Online Feature Store example with Unity Catalog

This notebook illustrates the use of Databricks Feature Engineering in Unity Catalog to publish features to an online store for real-time serving and automated feature lookup. The problem is to predict wine quality using an ML model that combines a variety of static wine features with a realtime input.

This notebook creates an endpoint to predict the quality of a bottle of wine, given an ID and the realtime feature alcohol by volume (ABV).
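The split between stored features and the realtime input can be pictured with a toy, in-memory sketch. This is not the Databricks API: the dictionary `online_store`, the feature values, and the function `assemble_model_input` are hypothetical stand-ins for what automatic feature lookup does at serving time.

```python
# Conceptual sketch only: a plain dict stands in for the online store.
# All names and feature values below are made up for illustration.

# Static wine features keyed by wine_id, as they might look once published.
online_store = {
    25: {"fixed_acidity": 7.4, "pH": 3.51},
    26: {"fixed_acidity": 7.8, "pH": 3.20},
}


def assemble_model_input(request_row, store):
    """Join the stored features for request_row['wine_id'] with the
    realtime features supplied in the request (here, alcohol)."""
    static = store[request_row["wine_id"]]
    realtime = {k: v for k, v in request_row.items() if k != "wine_id"}
    return {**static, **realtime}


row = assemble_model_input({"wine_id": 25, "alcohol": 7.9}, online_store)
print(row)
```

The caller sends only the ID and the ABV value; everything else the model needs is fetched by key.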

The notebook is structured as follows:

1. Prepare the feature table
2. Set up Cosmos DB
   - This notebook uses Cosmos DB. For a list of supported online stores, see the Databricks documentation.
3. Publish the features to the online feature store
4. Train and deploy the model
5. Serve realtime queries with automatic feature lookup
6. Clean up

Data Set

This example uses the Wine Quality Data Set.

Requirements

- Databricks Runtime 13.3 LTS for Machine Learning or above.
  - If you do not have access to Databricks Runtime for Machine Learning, you can run this notebook on Databricks Runtime 13.3 LTS or above. To do so, run %pip install databricks-feature-engineering at the start of this notebook.
- Access to Azure Cosmos DB.
  - This notebook uses Cosmos DB as the online store and guides you through how to generate secrets and register them with Databricks Secret Management.
- The cluster you are running must have the Azure Cosmos DB connector for Spark installed. See the instructions in the section Prepare the compute cluster.

Prepare the compute cluster

1. When creating the compute cluster, select the Unrestricted or Shared Compute policy. To run this notebook on a Shared Compute cluster, you must select Databricks Runtime for ML 13.2 or above.
2. After creating the cluster, follow these steps to install the latest Azure Cosmos DB connector for Spark 3.3:
   - Navigate to the compute cluster and click the Libraries tab.
   - Click Install new.
   - Click Maven and enter the coordinates of the latest version. For example: com.azure.cosmos.spark:azure-cosmos-spark_3-3_2-12:4.20.0
   - Click Install.
3. Attach this notebook to the cluster.

Set up Cosmos DB credentials

In this section, you take some manual steps to make Cosmos DB accessible to this notebook. Databricks needs permission to create and update Cosmos DB containers so that Cosmos DB can work with Feature Engineering in Unity Catalog. The following steps store the Cosmos DB keys in Databricks Secrets.

Look up the keys for Cosmos DB

1. Go to the Azure portal at https://portal.azure.com/
2. Search for and open "Cosmos DB", then create or select an account.
3. Navigate to "Keys" to view the URI and credentials.

Provide online store credentials using Databricks secrets

Note: For simplicity, the commands below use predefined names for the scope and secrets. To choose your own scope and secret names, follow the process in the Databricks documentation.

Create two secret scopes in Databricks.

```
databricks secrets create-scope --scope feature-store-example-read
databricks secrets create-scope --scope feature-store-example-write
```

Create secrets in the scopes. Note: The key names must follow the format <prefix>-authorization-key. For simplicity, these commands use the predefined prefix cosmos. When the commands run, you are prompted to copy your secrets into an editor.
```
databricks secrets put --scope feature-store-example-read --key cosmos-authorization-key
databricks secrets put --scope feature-store-example-write --key cosmos-authorization-key
```

Now the credentials are stored with Databricks Secrets. You will use them below to create the online store.
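The naming convention can be sketched in a few lines of Python. The scope names and the `cosmos` prefix come from the CLI commands above; the helper functions are hypothetical, and the `<scope>/<prefix>` form for referring to a secret is an assumption to check against the Databricks documentation.

```python
# Predefined names taken from the CLI commands above.
READ_SCOPE = "feature-store-example-read"
WRITE_SCOPE = "feature-store-example-write"
PREFIX = "cosmos"


def secret_key_name(prefix):
    """Key name stored inside a scope: <prefix>-authorization-key."""
    return f"{prefix}-authorization-key"


def secret_prefix(scope, prefix):
    """Assumed '<scope>/<prefix>' reference to a secret (hypothetical helper)."""
    return f"{scope}/{prefix}"


print(secret_key_name(PREFIX))
print(secret_prefix(READ_SCOPE, PREFIX))
print(secret_prefix(WRITE_SCOPE, PREFIX))

# Inside a Databricks notebook, where `dbutils` is predefined, a call like
# this could confirm that the read secret exists (the value is redacted):
# dbutils.secrets.get(scope=READ_SCOPE, key=secret_key_name(PREFIX))
```

Keeping separate read and write scopes lets you grant the serving side read-only access to the online store.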

Publish the features to the online feature store

Publishing the feature table lets Feature Engineering in Unity Catalog record lineage information that links the feature table to its online store. When the model serves real-time queries, it can then look up features from the online store for better performance.

Note: You must use publish_table() to create the table in the online store. Do not manually create a database or container inside Cosmos DB. publish_table() does that for you automatically. If you create a table without using publish_table(), the schema might be incompatible and the write command will fail.
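A publish step might look like the following minimal sketch. It assumes the `databricks-feature-engineering` package and the secret scopes created earlier with the `cosmos` prefix. The database name, container name, and the helper functions are hypothetical placeholders, and the table name and account URI are left for the caller to supply.

```python
def online_store_kwargs(account_uri):
    """Arguments for the Cosmos DB online store spec.
    database_name and container_name are hypothetical placeholders."""
    return {
        "account_uri": account_uri,
        "write_secret_prefix": "feature-store-example-write/cosmos",
        "read_secret_prefix": "feature-store-example-read/cosmos",
        "database_name": "online_feature_store_example",
        "container_name": "feature_store_online_wine_features",
    }


def publish_wine_features(table_name, account_uri):
    """Publish a Unity Catalog feature table to Cosmos DB.
    Runs only on a Databricks cluster with the Cosmos DB connector installed."""
    # Imported lazily so this file can still be loaded outside Databricks.
    from databricks.feature_engineering import FeatureEngineeringClient
    from databricks.feature_engineering.online_store_spec import AzureCosmosDBSpec

    fe = FeatureEngineeringClient()
    spec = AzureCosmosDBSpec(**online_store_kwargs(account_uri))
    # "merge" upserts rows into the online table; publish_table() creates
    # the table in the online store if it does not exist yet.
    fe.publish_table(name=table_name, online_store=spec, mode="merge")
```

Calling `publish_wine_features("<catalog>.<schema>.<table>", "<account-uri>")` from the notebook would then push the feature table to the online store.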

Here is an example of the request format:

```
{"dataframe_split": {"index": [0, 1, 2], "columns": ["wine_id", "alcohol"], "data": [[25, 7.9], [25, 11.0], [25, 27.9]]}}
```
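A request in this format could be built and sent as in the sketch below. The helpers `build_request` and `score` are hypothetical, and the endpoint URL and access token are assumptions that you must supply for your own workspace. Only the payload shape comes from the example above.

```python
import json


def build_request(wine_ids, alcohol_values):
    """Build a dataframe_split payload with one row per (wine_id, alcohol) pair."""
    if len(wine_ids) != len(alcohol_values):
        raise ValueError("wine_ids and alcohol_values must have the same length")
    return {
        "dataframe_split": {
            "index": list(range(len(wine_ids))),
            "columns": ["wine_id", "alcohol"],
            "data": [[w, a] for w, a in zip(wine_ids, alcohol_values)],
        }
    }


def score(endpoint_url, token, payload):
    """POST the payload to a serving endpoint and return the parsed JSON response.
    endpoint_url and token are caller-supplied assumptions."""
    import urllib.request

    request = urllib.request.Request(
        endpoint_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())


payload = build_request([25, 25, 25], [7.9, 11.0, 27.9])
print(json.dumps(payload))
```

The payload carries only the lookup key and the realtime feature; the remaining features are retrieved from the online store at serving time.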

Learn more about Databricks Model Serving.
