feature-store-multi-workspace(Python)

Centralized Feature Store example

In this notebook, you create a feature table in a remote Feature Store workspace (Workspace B). Then, working with this notebook in your local workspace, you use the feature table in Workspace B to train a model and register the model to a Model Registry in a different remote workspace (Workspace C).

Notebook setup

  1. In the remote workspace where the feature table will be created (Workspace B), create an access token.
  2. In the current workspace, create secrets to store the access token and the remote workspace information. The easiest way is to use the Databricks CLI, but you can also use the Secrets REST API.
  3. Create a secret scope: `databricks secrets create-scope --scope <scope>`.
  4. Pick a unique name to identify the remote workspace (Workspace B), shown here as <prefix>. Then create three secrets:
* `databricks secrets put --scope <scope> --key <prefix>-host`. Enter the hostname of the feature store workspace (Workspace B).
* `databricks secrets put --scope <scope> --key <prefix>-token`. Enter the access token from the feature store workspace (Workspace B).
* `databricks secrets put --scope <scope> --key <prefix>-workspace-id`. Enter the workspace ID for the feature store workspace (Workspace B), which you can find in the URL of any page in the workspace.

To register the model to a Model Registry in a different workspace (Workspace C), repeat these steps for that workspace, storing its secrets under a separate prefix. Before you run this notebook, enter the secret scopes and key prefixes for the remote feature store workspace (Workspace B) and the remote model registry workspace (Workspace C) in the notebook parameter fields above.
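
Optionally, you can confirm that the secrets are visible from the current workspace before running the rest of the notebook. This is a minimal sketch; the scope and prefix values are placeholders for the names you chose above, and Databricks redacts secret values in notebook output.

# Placeholder scope and key prefix -- replace with the values you created above
scope = "<scope>"
prefix = "<prefix>"

# dbutils.secrets.get raises an error if a secret does not exist; printed values appear as [REDACTED]
for suffix in ("host", "token", "workspace-id"):
    dbutils.secrets.get(scope=scope, key=f"{prefix}-{suffix}")
    print(f"Found secret {prefix}-{suffix} in scope {scope}")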

# Create the widgets
dbutils.widgets.text('feature_store_secret_scope', '')
dbutils.widgets.text('feature_store_secret_key_prefix', '')
 
dbutils.widgets.text('model_registry_secret_scope', '')
dbutils.widgets.text('model_registry_secret_key_prefix', '')
 
# Get the secret scope and key prefix for the remote Feature Store and Model Registry
fs_scope = str(dbutils.widgets.get('feature_store_secret_scope'))
fs_key = str(dbutils.widgets.get('feature_store_secret_key_prefix'))
 
mr_scope = str(dbutils.widgets.get('model_registry_secret_scope'))
mr_key = str(dbutils.widgets.get('model_registry_secret_key_prefix'))
 
# Create the URIs to use to work with the remote Feature Store and Model Registry
feature_store_uri = f'databricks://{fs_scope}:{fs_key}' if fs_scope and fs_key else None
model_registry_uri = f'databricks://{mr_scope}:{mr_key}' if mr_scope and mr_key else None

Feature Table setup

In this step, you create the database for the feature table and a DataFrame, features_df, that is used to create the remote feature table.

%sql
CREATE DATABASE IF NOT EXISTS feature_store_multi_workspace;
from pyspark.sql.types import (
    StructType,
    StructField,
    IntegerType,
    FloatType,
)
 
feature_table_name = "feature_store_multi_workspace.feature_table"
 
feature_table_schema = StructType(
    [
        StructField("user_id", IntegerType(), False),
        StructField("user_feature", FloatType(), True),
    ]
)
 
features_df = spark.createDataFrame(
    [
        (123, 100.2),
        (456, 12.4),
    ],
    feature_table_schema,
)

Create a remote feature table

In this step, you create the feature table in the remote workspace (Workspace B).

The API call to create a remote feature table depends on the version of Databricks Runtime for ML on your cluster.

  • With Databricks Runtime 10.2 ML or above, use FeatureStoreClient.create_table.
  • With Databricks Runtime 10.1 ML or below, use FeatureStoreClient.create_feature_table.
from databricks.feature_store import FeatureStoreClient
 
# When you create the FeatureStoreClient, specify the remote workspaces with the arguments feature_store_uri and model_registry_uri.
fs = FeatureStoreClient(feature_store_uri=feature_store_uri, model_registry_uri=model_registry_uri)
# Use this command with Databricks Runtime 10.2 ML or above
fs.create_table(
    feature_table_name,
    primary_keys="user_id",
    df=features_df,
    description="Sample feature table",
)
# To run this notebook with Databricks Runtime 10.1 ML or below, uncomment and run this cell
 
#fs.create_feature_table(
#    feature_table_name,
#    "user_id",
#    features_df=features_df,
#    description="Sample feature table",
#)

You should now be able to see the new feature table in the remote feature store workspace (Workspace B).
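
You can also sanity-check the table from this notebook by reading it back through the same client. This is a minimal sketch; it assumes read_table resolves the newly created table through the client configured with feature_store_uri above.

# Read the feature table back to confirm it exists and is readable from this workspace
remote_features_df = fs.read_table(name=feature_table_name)
display(remote_features_df)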

Read from a remote feature table to train a model

import mlflow
 
# A simple pyfunc model whose prediction is the row-wise sum of the input features (NaN if any feature is missing)
class SampleModel(mlflow.pyfunc.PythonModel):
    def predict(self, context, model_input):
        return model_input.sum(axis=1, skipna=False)
record_table_schema = StructType(
    [
        StructField("id", IntegerType(), False),
        StructField("income", IntegerType(), False),
    ]
)
 
record_table = spark.createDataFrame(
    [
        (123, 10000),
        (456, 20000),
        (789, 30000),
    ],
    record_table_schema,
)
from databricks.feature_store import FeatureLookup
 
# Look up user_feature from the feature table, joining on the record table's id column (which maps to the feature table's primary key, user_id)
feature_lookups = [
    FeatureLookup(
        table_name=feature_table_name,
        feature_name="user_feature",
        lookup_key="id",
    ),
]
# Create a training set that joins the looked-up features onto record_table; "income" is the label and the raw "id" key is excluded from the training data
training_set = fs.create_training_set(
    record_table,
    feature_lookups=feature_lookups,
    exclude_columns=["id"],
    label="income",
)
 
# Load the TrainingSet. load_df() returns a dataframe that can be passed into scikit-learn to train a model
training_df = training_set.load_df()
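
The SampleModel in this notebook needs no fitting, but as the comment above notes, the loaded DataFrame can be handed to scikit-learn. The following is a minimal sketch, assuming scikit-learn is available; LinearRegression is purely illustrative.

from sklearn.linear_model import LinearRegression

# Convert to pandas and drop rows whose feature lookup found no match
training_pdf = training_df.toPandas().dropna()

# "income" is the label; the remaining columns are the looked-up features
X = training_pdf.drop("income", axis=1)
y = training_pdf["income"]

# Fit an illustrative estimator on the joined features
estimator = LinearRegression().fit(X, y)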

Register the model with a remote Model Registry

In this step, you register the model in a remote Model Registry (Workspace C).

with mlflow.start_run() as new_run:
    fs.log_model(
        SampleModel(),
        artifact_path="model",
        flavor=mlflow.pyfunc,
        training_set=training_set,
        registered_model_name="multi_workspace_fs_model",
    )

At this point, you should be able to see the new model version in the remote model registry workspace (Workspace C).
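
To confirm this programmatically from the current workspace, one option is to query the remote registry with the MLflow client. This is a minimal sketch that assumes the model_registry_uri built earlier in this notebook.

from mlflow.tracking import MlflowClient

# Point an MLflow client at the remote Model Registry (Workspace C)
client = MlflowClient(registry_uri=model_registry_uri)

# List the registered versions of the model logged above
for mv in client.search_model_versions("name='multi_workspace_fs_model'"):
    print(mv.version, mv.current_stage)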

Use model in remote Model Registry for batch inference

# Get the model URI
model_uri = "models:/multi_workspace_fs_model/1"
 
# Call score_batch to get the predictions from the model
with_predictions = fs.score_batch(model_uri, record_table.drop("income"))
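
score_batch returns a DataFrame containing the input columns along with the model's predictions, which you can inspect directly in the notebook:

display(with_predictions)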