feature-store-with-uc-online-example-dynamodb(Python)

Online Feature Store example with Unity Catalog

This notebook illustrates how to use Databricks Feature Engineering in Unity Catalog to publish features to an online store for real-time serving and automated feature lookup. The problem is to predict the quality of a wine using an ML model with a variety of static wine features and a realtime input.

This notebook creates an endpoint to predict the quality of a bottle of wine, given an ID and the realtime feature alcohol by volume (ABV).

The notebook is structured as follows:

  1. Prepare the feature table
  2. Set up DynamoDB
    • This notebook uses DynamoDB. For a list of supported online stores, see the Databricks documentation.
  3. Publish the Unity Catalog features to online feature store
  4. Train and deploy the model
  5. Serve realtime queries with automatic feature lookup
  6. Clean up

Data Set

This example uses the Wine Quality Data Set.

Requirements

  • Databricks Runtime 13.3 LTS for Machine Learning or above.

    • If you do not have access to Databricks Runtime for Machine Learning, you can run this notebook on Databricks Runtime 13.3 LTS or above. To do so, run %pip install databricks-feature-engineering at the start of this notebook.
  • Access to AWS DynamoDB

    • This notebook uses DynamoDB as the online store and guides you through how to generate secrets and register them with Databricks Secret Management.

Prepare the feature table

Suppose you need to build an endpoint that predicts wine quality from just the wine_id. For the endpoint to look up features of the wine by wine_id, there must be a feature table saved in Feature Engineering in Unity Catalog. For the purposes of this demo, you first need to prepare this feature table yourself. The steps are:

  1. Load and clean the raw data.
  2. Separate features and labels.
  3. Save features into a feature table.

Load and clean the raw data

The raw data contains 12 columns including 11 features and the quality column. The quality column is an integer that ranges from 3 to 8. The goal is to build a model that predicts the quality value.

There are some problems with the raw data:

  1. The column names contain spaces (' '), which are not compatible with Feature Engineering in Unity Catalog.
  2. You need to add an ID column to the raw data so that rows can be looked up later by Feature Engineering in Unity Catalog.

The following cell addresses these issues.
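
As a rough sketch of what that cell might do (the small inline DataFrame here stands in for the real wine-quality data, and the helper name `clean_raw_data` is illustrative):

```python
import pandas as pd

# Stand-in for the raw wine-quality data; the real dataset has 12 columns.
raw_data = pd.DataFrame(
    {"fixed acidity": [7.4, 7.8], "volatile acidity": [0.70, 0.88], "quality": [5, 5]}
)

def clean_raw_data(df: pd.DataFrame) -> pd.DataFrame:
    """Replace spaces in column names and add a wine_id primary key."""
    renamed = df.rename(columns={c: c.replace(" ", "_") for c in df.columns})
    renamed["wine_id"] = range(len(renamed))  # monotonically increasing ID
    return renamed

data = clean_raw_data(raw_data)
print(list(data.columns))
```

The ID column becomes the primary key that Feature Engineering in Unity Catalog uses for lookups.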

Let's assume that the alcohol by volume (ABV) is a variable that changes over time after the wine is opened. The value will be provided as a realtime input in online inference.

Now, split the data into two parts and store only the part with static features to Feature Engineering in Unity Catalog.
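
The split might look like the following sketch, again using a small stand-in DataFrame; in the notebook, `data` is the cleaned wine DataFrame:

```python
import pandas as pd

# Stand-in for the cleaned data with a wine_id primary key.
data = pd.DataFrame(
    {"wine_id": [0, 1], "fixed_acidity": [7.4, 7.8],
     "alcohol": [9.4, 9.8], "quality": [5, 5]}
)

# ABV ("alcohol") arrives as a realtime input at inference time, so it is
# excluded from the static feature table; "quality" is the label.
id_static_features = data.drop(columns=["alcohol", "quality"])
id_rt_feature = data[["wine_id", "alcohol"]]
```

Only `id_static_features` is written to Feature Engineering in Unity Catalog; the realtime part is supplied by the caller at inference time.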

Create a feature table

Next, create a new catalog or reuse an existing one and create the schema where the feature tables will be stored.

  • To create a new catalog, you must have the CREATE CATALOG privilege on the metastore.
  • To use an existing catalog, you must have the USE CATALOG privilege on the catalog.
  • To create a new schema in the catalog, you must have the CREATE SCHEMA privilege on the catalog.
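
A minimal sketch of those DDL statements, using the catalog and schema names that appear later in this notebook (`ml`, `online_store_example`); `spark` is the notebook's SparkSession:

```python
# Create (or reuse) the catalog and schema that will hold the feature table.
spark.sql("CREATE CATALOG IF NOT EXISTS ml")
spark.sql("USE CATALOG ml")
spark.sql("CREATE SCHEMA IF NOT EXISTS online_store_example")
```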

Save the feature data id_static_features into a feature table.
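
This step might be sketched as follows, assuming `id_static_features` is a Spark DataFrame of the static features and using the table name referenced later in this notebook:

```python
from databricks.feature_engineering import FeatureEngineeringClient

fe = FeatureEngineeringClient()

# Create the feature table keyed by wine_id and populate it.
fe.create_table(
    name="ml.online_store_example.wine_static_features",
    primary_keys=["wine_id"],
    df=id_static_features,
    description="id and static features of wine",
)
```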

The feature data has been stored into the feature table. The next step is to set up access to AWS DynamoDB.

Set up DynamoDB Access Key

In this section, you need to take some manual steps to make DynamoDB accessible to this notebook. Databricks creates and updates DynamoDB tables so that DynamoDB can work with Feature Engineering in Unity Catalog. The following steps create a new AWS IAM user with the required permissions. You can also choose to use your existing users or roles.

Create an AWS IAM user and download secrets

  1. Go to AWS console http://console.aws.amazon.com, navigate to IAM, and click "Users".
  2. Click "Add users" and create a new user with "Access Key".
  3. Click Next and select policy AmazonDynamoDBFullAccess.
  4. Click Next until the user is created.
  5. Download the "Access key ID" and "Secret access key".

Provide online store credentials using Databricks secrets

Note: For simplicity, the commands below use predefined names for the scope and secrets. To choose your own scope and secret names, follow the process in the Databricks documentation.

  1. Create two secret scopes in Databricks.

    databricks secrets create-scope --scope feature-store-example-read
    databricks secrets create-scope --scope feature-store-example-write
    
  2. Create secrets in the scopes. Note: the keys should follow the format <prefix>-access-key-id and <prefix>-secret-access-key respectively. Again, for simplicity, these commands use predefined names here. When the commands run, you will be prompted to copy your secrets into an editor.

    databricks secrets put --scope feature-store-example-read --key dynamo-access-key-id
    databricks secrets put --scope feature-store-example-read --key dynamo-secret-access-key
    
    databricks secrets put --scope feature-store-example-write --key dynamo-access-key-id
    databricks secrets put --scope feature-store-example-write --key dynamo-secret-access-key
    

Now the credentials are stored with Databricks Secrets. You will use them below to create the online store.

Publish the features to the online feature store

This allows Feature Engineering in Unity Catalog to add lineage information about the feature table and the online store. When the model serves real-time queries, it can then look up features from the online store for better performance.

Note: You must use publish_table() to create the table in the online store; publish_table() creates the DynamoDB table with a compatible schema. If you create the table yourself without publish_table(), the schema might be incompatible and the write command will fail.
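
A sketch of the publish step, using the secret scopes and key prefix ("dynamo") created above; the AWS region here is an assumption and should match where you want the DynamoDB table:

```python
from databricks.feature_engineering import FeatureEngineeringClient
from databricks.feature_engineering.online_store_spec import AmazonDynamoDBSpec

fe = FeatureEngineeringClient()

# Credentials are resolved from Databricks Secrets via the scope/prefix pairs.
online_store = AmazonDynamoDBSpec(
    region="us-west-2",
    read_secret_prefix="feature-store-example-read/dynamo",
    write_secret_prefix="feature-store-example-write/dynamo",
)

# Creates the DynamoDB table and writes the feature data to it.
fe.publish_table(
    name="ml.online_store_example.wine_static_features",
    online_store=online_store,
)
```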

Train and deploy the model

Now, you will train a classifier using features in Feature Engineering in Unity Catalog. You only need to specify the primary key, and Feature Engineering in Unity Catalog will fetch the required features.

First, define a TrainingSet. The training set accepts a feature_lookups list, where each item represents some features from a feature table in Feature Engineering in Unity Catalog. This example uses wine_id as the lookup key to fetch all the features from table ml.online_store_example.wine_static_features.
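
This might be sketched as follows, assuming `inference_data_df` is a Spark DataFrame containing wine_id, the realtime "alcohol" column, and the "quality" label:

```python
from databricks.feature_engineering import FeatureEngineeringClient, FeatureLookup

fe = FeatureEngineeringClient()

# Omitting feature_names fetches all features from the table by wine_id.
feature_lookups = [
    FeatureLookup(
        table_name="ml.online_store_example.wine_static_features",
        lookup_key="wine_id",
    )
]

training_set = fe.create_training_set(
    df=inference_data_df,
    feature_lookups=feature_lookups,
    label="quality",
    exclude_columns=["wine_id"],  # the key is not itself a feature
)
```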

The next cell trains a RandomForestClassifier model.

Save the trained model using log_model. log_model also saves lineage information between the model and the features (through training_set). So, during serving, the model automatically knows where to fetch the features by just the lookup keys.
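
The training and logging steps might be sketched as follows, assuming `training_set` was created above; the registered model name matches the one used for serving later in this notebook:

```python
import mlflow
from sklearn.ensemble import RandomForestClassifier
from databricks.feature_engineering import FeatureEngineeringClient

fe = FeatureEngineeringClient()

# Materialize the training set (looked-up features + realtime column + label).
training_df = training_set.load_df().toPandas()
X = training_df.drop(columns=["quality"])
y = training_df["quality"]

model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X, y)

# log_model records feature lineage via training_set, so the serving
# endpoint knows where to fetch features given only the lookup keys.
fe.log_model(
    model=model,
    artifact_path="model",
    flavor=mlflow.sklearn,
    training_set=training_set,
    registered_model_name="wine_quality_classifier",
)
```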

Serve realtime queries with automatic feature lookup

After calling log_model, a new version of the model is saved. To provision a serving endpoint, follow the steps below.

  1. Click Serving under Machine Learning in the left sidebar.
  2. Create a serving endpoint with the model named "wine_quality_classifier". See the Databricks documentation for details.

Send a query

On the Serving page, there are three approaches for calling the model. You can try the "Browser" approach with a JSON-format request, as shown below. This notebook copies the Python approach to illustrate a programmatic way.

Now, suppose you opened a bottle of wine and you have a sensor to measure the current ABV from the bottle. Using the model and automated feature lookup with realtime serving, you can predict the quality of the wine using the measured ABV value as the realtime input "alcohol".

Notes on request format and API versions

Here is an example of the request format:

{"dataframe_split": {"index": [0, 1, 2], "columns": ["wine_id", "alcohol"], "data": [[25, 7.9], [25, 11.0], [25, 27.9]]}}
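
A stdlib-only sketch of building and sending such a request; the workspace URL, endpoint name, and token are placeholders for your own values:

```python
import json
import urllib.request

def build_payload(wine_ids, abv_values):
    """Build a dataframe_split body pairing each wine_id with a measured ABV."""
    return {
        "dataframe_split": {
            "index": list(range(len(wine_ids))),
            "columns": ["wine_id", "alcohol"],
            "data": [[w, a] for w, a in zip(wine_ids, abv_values)],
        }
    }

payload = build_payload([25, 25, 25], [7.9, 11.0, 27.9])

def score(host, endpoint, token, body):
    """POST the payload to the serving endpoint's invocations URL."""
    req = urllib.request.Request(
        f"{host}/serving-endpoints/{endpoint}/invocations",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# score("https://<workspace-url>", "wine_quality_classifier", "<token>", payload)
```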

Learn more about Databricks Model Serving.

Clean up

Follow this checklist to clean up the resources created by this notebook:

  1. AWS DynamoDB Table
    • Go to AWS console and navigate to DynamoDB.
    • Delete the table feature_store_online_wine_features
  2. AWS user and access key
    • Go to AWS console and navigate to IAM.
    • Search and click on the newly created user.
    • Delete the user, or click "Make Inactive" on the access key to disable access.
  3. Secrets stored in Databricks Secrets
    • Run databricks secrets delete-scope --scope <scope-name> for each of the two scopes.
  4. Databricks access token
    • From the Databricks left sidebar, go to "Settings" > "User Settings" > "Access Tokens" and delete the token.