Prepare the compute cluster
- When creating the compute cluster, select the Unrestricted or Shared Compute policy. To run this notebook on a Shared Compute cluster, you must select Databricks Runtime 11.3 LTS ML or above.
- After creating the cluster, follow these steps to install the latest Azure Cosmos DB connector for Spark 3.2:
- Navigate to the compute cluster and click the Libraries tab.
- Click Install new.
- Click Maven and enter the coordinates of the latest version. For example:
com.azure.cosmos.spark:azure-cosmos-spark_3-2_2-12:4.17.2
- Click Install.
- Attach this notebook to the cluster.
Prepare the feature table
Suppose you need to build an endpoint that predicts wine quality given just the wine_id. For that to work, there must be a feature table saved in Feature Store where the endpoint can look up the wine's features by wine_id. For the purposes of this demo, you first need to prepare this feature table yourself. The steps are (a sketch of the code follows the list):
- Load and clean the raw data.
- Separate features and labels.
- Save features into a feature table.
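A minimal sketch of these steps, assuming a Databricks notebook context (where spark is predefined); the dataset path, database name, and table name are illustrative assumptions, not part of the original notebook:
from databricks.feature_store import FeatureStoreClient
from pyspark.sql import functions as F

fs = FeatureStoreClient()

# Load the raw data (path and separator are assumptions based on the
# Databricks-hosted copy of the Wine Quality Data Set).
raw_df = spark.read.csv(
    "/databricks-datasets/wine-quality/winequality-red.csv",
    header=True, sep=";", inferSchema=True,
)

# Clean column names and add a wine_id primary key.
df = raw_df.toDF(*[c.replace(" ", "_") for c in raw_df.columns])
df = df.withColumn("wine_id", F.monotonically_increasing_id())

# Separate features and labels. The real-time feature (alcohol) is
# supplied in the request, so it is excluded from the feature table.
labels_df = df.select("wine_id", "quality")
features_df = df.drop("quality", "alcohol")

# Save the features into a feature table keyed by wine_id.
spark.sql("CREATE DATABASE IF NOT EXISTS online_feature_store_example")
fs.create_table(
    name="online_feature_store_example.feature_store_online_wine_features",
    primary_keys=["wine_id"],
    df=features_df,
    description="Static wine features keyed by wine_id",
)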
Set up Cosmos DB credentials
In this section, you need to take some manual steps to make Cosmos DB accessible to this notebook. Databricks needs permission to create and update Cosmos DB containers so that Cosmos DB can work with Feature Store. The following steps store the Cosmos DB keys in Databricks Secrets.
Look up the keys for Cosmos DB
- Go to the Azure portal at https://portal.azure.com/
- Search for and open "Cosmos DB", then create or select an account.
- Navigate to "Keys" to view the URI and credentials.
Provide online store credentials using Databricks secrets
Note: For simplicity, the commands below use predefined names for the scope and secrets. To choose your own scope and secret names, follow the process in the Databricks documentation.
Create two secret scopes in Databricks.
databricks secrets create-scope --scope feature-store-example-read
databricks secrets create-scope --scope feature-store-example-write
Create secrets in the scopes.
Note: the keys should follow the format <prefix>-authorization-key. For simplicity, these commands use predefined names. When the commands run, you will be prompted to copy your secrets into an editor.
databricks secrets put --scope feature-store-example-read --key cosmos-authorization-key
databricks secrets put --scope feature-store-example-write --key cosmos-authorization-key
Now the credentials are stored in Databricks Secrets. You will use them below to create the online store.
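As a quick sanity check, the secrets can be read back from the notebook; this sketch assumes it runs in Databricks, where dbutils is predefined:
# Verify that the secrets created above are readable (Databricks redacts
# the values if you try to print them).
read_key = dbutils.secrets.get(scope="feature-store-example-read", key="cosmos-authorization-key")
write_key = dbutils.secrets.get(scope="feature-store-example-write", key="cosmos-authorization-key")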
Publish the features to the online feature store
Publishing registers lineage information about the feature table and the online store with Feature Store, so when the model serves real-time queries, it can look up features from the online store for better performance.
Note: You must use publish_table() to create the table in the online store. Do not manually create a database or container inside Cosmos DB; publish_table() does that for you automatically. If you create a table without using publish_table(), the schema might be incompatible and the write command will fail.
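A minimal sketch of the publish call, assuming the feature table from the earlier sketch; the account URI is a placeholder for the URI shown on your Cosmos DB "Keys" page:
from databricks.feature_store import FeatureStoreClient
from databricks.feature_store.online_store_spec import AzureCosmosDBSpec

fs = FeatureStoreClient()

# The secret prefixes combine the scope name and the key prefix; the
# connector appends "-authorization-key" to the part after the "/",
# matching the keys created earlier.
online_store = AzureCosmosDBSpec(
    account_uri="https://<your-account>.documents.azure.com:443/",
    read_secret_prefix="feature-store-example-read/cosmos",
    write_secret_prefix="feature-store-example-write/cosmos",
)

# publish_table creates the Cosmos DB database and container automatically.
fs.publish_table(
    name="online_feature_store_example.feature_store_online_wine_features",
    online_store=online_store,
    mode="merge",
)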
Serve real-time queries with automatic feature lookup
After calling log_model, a new version of the model is saved; a minimal sketch of the call is shown below. To provision a serving endpoint, follow the steps after the sketch.
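This sketch assumes model and training_set were produced by earlier training steps, with training_set being the TrainingSet returned by fs.create_training_set:
from databricks.feature_store import FeatureStoreClient
import mlflow.sklearn

fs = FeatureStoreClient()

# Log the model together with feature-lookup metadata so that serving
# can fetch the static features from the online store automatically.
fs.log_model(
    model,                      # assumed: a trained scikit-learn model
    artifact_path="model",
    flavor=mlflow.sklearn,
    training_set=training_set,  # assumed: created with fs.create_training_set
    registered_model_name="wine_quality_classifier",
)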
- Click Serving under Machine Learning in the left sidebar.
- Create a serving endpoint with the model named "wine_quality_classifier". See the Databricks documentation for details.
Here is an example of the request format:
{"dataframe_split": {"index": [0, 1, 2], "columns": ["wine_id", "alcohol"], "data": [[25, 7.9], [25, 11.0], [25, 27.9]]}}
Learn more about Databricks Model Serving.
Clean up
Follow this checklist to clean up the resources created by this notebook:
- Azure Cosmos DB container
- Go to the Azure portal and navigate to Cosmos DB.
- Delete the container feature_store_online_wine_features.
- Secret scopes in Databricks Secrets
- Delete both scopes created above:
databricks secrets delete-scope --scope feature-store-example-read
databricks secrets delete-scope --scope feature-store-example-write
- Databricks access token
- Click your username at the top right. Then select User Settings > Access tokens and delete the token.
Online Feature Store example notebook
This notebook illustrates the use of Databricks Feature Store to publish features to an online store for real-time serving and automated feature lookup. The problem is to predict the quality of a wine using an ML model with a variety of static wine features and a real-time input.
This notebook creates an endpoint that predicts the quality of a bottle of wine, given an ID and the real-time feature alcohol by volume (ABV).
The notebook is structured as follows:
- Prepare the compute cluster
- Prepare the feature table
- Set up Cosmos DB credentials
- Publish the features to the online feature store
- Serve real-time queries with automatic feature lookup
- Clean up
Data Set
This example uses the Wine Quality Data Set.
Requirements
- A Databricks compute cluster with the Azure Cosmos DB connector for Spark installed (see "Prepare the compute cluster")
- Access to an Azure Cosmos DB account