Work with online stores

Databricks Feature Store supports publishing features to online feature stores for real-time serving and automated feature lookup.

Databricks Feature Store supports these online stores:

Online store provider                                       | Publish | Feature lookup in model serving
Amazon DynamoDB (Databricks Runtime 10.4 LTS ML and above)  | X       | X
Amazon Aurora (MySQL-compatible)                            | X       | X
Amazon RDS MySQL                                            | X       | X

Publish features to an online feature store

To publish features to an online feature store for real-time serving, use publish_table.

Authentication for publishing feature table to online stores

To publish feature tables to an online store, you must provide write authentication to the online store.

Databricks recommends that you provide write authentication through an instance profile attached to a Databricks cluster. Alternatively, you can store credentials in Databricks secrets and reference them through the write_secret_prefix argument when publishing.

Provide write authentication through an instance profile attached to a Databricks cluster

Note

Use these steps only to provide write authentication for publishing to DynamoDB online stores.

  1. Create an instance profile that has write permission to the online store.

  2. Attach the instance profile to a Databricks cluster by following these two steps in Secure access to S3 buckets using instance profiles:

    1. Add the instance profile to Databricks.

    2. Launch a cluster with the instance profile.

  3. Select the cluster with the attached instance profile to run the code to publish to the online store. You do not need to provide explicit secret credentials or write_secret_prefix to the online store spec.
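
With the instance profile attached to the cluster, the online store spec needs no write credentials. The following is a minimal sketch of the resulting publish call, reusing the example table name from this page; the region, secret scope, and prefix are placeholders for your own values:

from databricks.feature_store import FeatureStoreClient
from databricks.feature_store.online_store_spec import AmazonDynamoDBSpec

fs = FeatureStoreClient()

# Writes are authorized by the instance profile attached to the cluster, so no
# write_secret_prefix is passed. read_secret_prefix is still needed if served
# models will look up features from this store.
online_store = AmazonDynamoDBSpec(
  region='<region>',
  read_secret_prefix='<read_scope>/<prefix>'
)

fs.publish_table(
  name='recommender_system.customer_features',
  online_store=online_store,
  mode='merge'
)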

Provide write credentials using Databricks secrets

See Provide online store credentials using Databricks secrets.

Authentication for looking up features from online stores with served MLflow models

To enable Databricks-hosted MLflow models to connect to online stores and look up feature values, you must provide read authentication.

These credentials must be kept in Databricks secrets, and you must pass a read_secret_prefix when publishing.

Provide online store credentials using Databricks secrets

  1. Create two secret scopes that contain credentials for the online store: one for read-only access (shown here as <read_scope>) and one for read-write access (shown here as <write_scope>). Alternatively, you can reuse existing secret scopes.

    If you intend to use an instance profile for write authentication, you only need to create the <read_scope>.

  2. Pick a unique name for the target online store, shown here as <prefix>.

    For DynamoDB (requires Databricks Runtime 10.4 LTS ML or above), create the following secrets:

    • Access key ID for the IAM user with read-only access to the target online store: databricks secrets put --scope <read_scope> --key <prefix>-access-key-id

    • Secret access key for the IAM user with read-only access to the target online store: databricks secrets put --scope <read_scope> --key <prefix>-secret-access-key

    • Access key ID for the IAM user with read-write access to the target online store: databricks secrets put --scope <write_scope> --key <prefix>-access-key-id

    • Secret access key for the IAM user with read-write access to the target online store: databricks secrets put --scope <write_scope> --key <prefix>-secret-access-key

    For SQL stores, create the following secrets:

    • User with read-only access to the target online store: databricks secrets put --scope <read_scope> --key <prefix>-user

    • Password for user with read-only access to the target online store: databricks secrets put --scope <read_scope> --key <prefix>-password

    • User with read-write access to the target online store: databricks secrets put --scope <write_scope> --key <prefix>-user

    • Password for user with read-write access to the target online store: databricks secrets put --scope <write_scope> --key <prefix>-password

Note

There is a limit on the number of secret scopes per workspace. To avoid hitting this limit, you can define and share a single secret scope for accessing all online stores.
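
For reference, the secret prefixes map onto the online store spec as follows. This is only an illustrative sketch; the scope and prefix names shown (feature-store-read, feature-store-write, dynamo-prod) are examples, not required values:

from databricks.feature_store.online_store_spec import AmazonDynamoDBSpec

# read_secret_prefix='<scope>/<prefix>' tells the client to resolve the secrets
# '<prefix>-access-key-id' and '<prefix>-secret-access-key' in scope '<scope>'.
online_store = AmazonDynamoDBSpec(
  region='<region>',
  read_secret_prefix='feature-store-read/dynamo-prod',
  write_secret_prefix='feature-store-write/dynamo-prod'
)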

Publish batch-computed features to an online store

You can create and schedule a Databricks job to regularly publish updated features. This job can also include the code to calculate the updated features, or you can create and run separate jobs to calculate and publish feature updates.

For SQL stores, the following code assumes that an online database named “recommender_system” already exists in the online store and matches the name of the offline database. If there is no table named “customer_features” in the database, this code creates one. It also assumes that features are computed each day and stored in a partition column, _dt.

The following code assumes that you have created secrets to access this online store.

DynamoDB support requires Databricks Runtime 10.4 LTS ML or above.

import datetime
from databricks.feature_store import FeatureStoreClient
from databricks.feature_store.online_store_spec import AmazonDynamoDBSpec

fs = FeatureStoreClient()

# Do not pass `write_secret_prefix` if you intend to use the instance profile attached to the cluster.
online_store = AmazonDynamoDBSpec(
  region='<region>',
  read_secret_prefix='<read_scope>/<prefix>',
  write_secret_prefix='<write_scope>/<prefix>'
)

fs.publish_table(
  name='recommender_system.customer_features',
  online_store=online_store,
  filter_condition=f"_dt = '{str(datetime.date.today())}'",
  mode='merge'
)

For SQL stores (Amazon Aurora or Amazon RDS MySQL), use AmazonRdsMySqlSpec:

import datetime
from databricks.feature_store.online_store_spec import AmazonRdsMySqlSpec

online_store = AmazonRdsMySqlSpec(
  hostname='<hostname>',
  port='<port>',
  read_secret_prefix='<read_scope>/<prefix>',
  write_secret_prefix='<write_scope>/<prefix>'
)

fs.publish_table(
  name='recommender_system.customer_features',
  online_store=online_store,
  filter_condition=f"_dt = '{str(datetime.date.today())}'",
  mode='merge'
)
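
As noted above, the same scheduled job can also recompute the features before publishing them. The following is a minimal sketch of such a job, assuming a Databricks notebook (where spark is predefined), a hypothetical source table recommender_system.purchases, and the online_store spec defined above:

import datetime
from databricks.feature_store import FeatureStoreClient
from pyspark.sql import functions as F

fs = FeatureStoreClient()

# Recompute today's features; the source table and aggregation are placeholders.
customer_features_df = (
  spark.table('recommender_system.purchases')
    .where(F.col('purchase_date') >= F.date_sub(F.current_date(), 30))
    .groupBy('customer_id')
    .agg(F.sum('amount').alias('total_purchases_30d'))
    .withColumn('_dt', F.lit(str(datetime.date.today())))
)

# Merge the new rows into the offline feature table ...
fs.write_table(
  name='recommender_system.customer_features',
  df=customer_features_df,
  mode='merge'
)

# ... then publish today's partition to the online store.
fs.publish_table(
  name='recommender_system.customer_features',
  online_store=online_store,
  filter_condition=f"_dt = '{str(datetime.date.today())}'",
  mode='merge'
)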

Publish streaming features to an online store

To continuously stream features to the online store, set streaming=True.

fs.publish_table(
  name='recommender_system.customer_features',
  online_store=online_store,
  streaming=True
)

Publish selected features to an online store

To publish only selected features to the online store, use the features argument to specify the feature name(s) to publish. Primary keys and timestamp keys are always published. If you do not specify the features argument, or if the value is None, all features from the offline feature table are published.

fs.publish_table(
  name='recommender_system.customer_features',
  online_store=online_store,
  features=["total_purchases_30d"]
)

Publish a feature table to a specific database

In the online store spec, specify the database name (database_name) and the table name (table_name). If you do not specify these parameters, the offline database name and feature table name are used. database_name must already exist in the online store.

online_store = AmazonRdsMySqlSpec(
  hostname='<hostname>',
  port='<port>',
  database_name='<database_name>',
  table_name='<table_name>',
  read_secret_prefix='<read_scope>/<prefix>',
  write_secret_prefix='<write_scope>/<prefix>'
)

Overwrite an existing online feature table or specific rows

Use mode='overwrite' in the publish_table call. The online table is completely overwritten by the data in the offline table.

Note

Amazon DynamoDB does not support overwrite mode.

fs.publish_table(
  name='recommender_system.customer_features',
  online_store=online_store,
  mode='overwrite'
)

To overwrite only certain rows, use the filter_condition argument:

fs.publish_table(
  name='recommender_system.customer_features',
  online_store=online_store,
  filter_condition=f"_dt = '{str(datetime.date.today())}'",
  mode='merge'
)

Enable Databricks-hosted MLflow models to look up features from online stores

Preview

This feature is in Public Preview.

MLflow Model Serving on Databricks can automatically look up feature values from published online stores.
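
The lookup is driven by the feature metadata that FeatureStoreClient.log_model packages with the model. The following sketch shows one way to log a scikit-learn model with a feature lookup so that serving can fetch the feature automatically; the table, feature, and key names reuse the examples on this page, and the toy label data is purely illustrative:

import mlflow
from sklearn.linear_model import LogisticRegression
from databricks.feature_store import FeatureLookup, FeatureStoreClient

fs = FeatureStoreClient()

# Feature to look up automatically at serving time; names are illustrative.
feature_lookups = [
  FeatureLookup(
    table_name='recommender_system.customer_features',
    feature_names=['total_purchases_30d'],
    lookup_key='customer_id'
  )
]

# Toy labels; in practice this DataFrame comes from your training data.
label_df = spark.createDataFrame(
  [('c1', 1), ('c2', 0)],
  ['customer_id', 'label']
)

training_set = fs.create_training_set(
  label_df,
  feature_lookups=feature_lookups,
  label='label'
)
# fillna guards against customers missing from the feature table in this toy example.
training_df = training_set.load_df().toPandas().fillna(0)

model = LogisticRegression().fit(
  training_df[['total_purchases_30d']], training_df['label']
)

# Logging with the training set records the lookup metadata, so the served model
# can fetch total_purchases_30d from the online store using customer_id.
fs.log_model(
  model,
  artifact_path='model',
  flavor=mlflow.sklearn,
  training_set=training_set,
  registered_model_name='recommender_model'
)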

Requirements

Note

You can publish the feature table at any time prior to model deployment, including after model training.

Automatic feature lookup

Databricks Model Serving supports automatic feature lookup from these online stores:

  • Amazon DynamoDB (Databricks Runtime 10.4 LTS ML and above)

  • Amazon Aurora (MySQL-compatible)

  • Amazon RDS MySQL

Automatic feature lookup is supported for the following data types:

  • IntegerType

  • FloatType

  • StringType

  • DoubleType

  • LongType

  • TimestampType

  • DateType

  • ShortType

  • DecimalType

  • ArrayType

  • MapType

Override feature values in online model scoring

All features required by the model (logged with FeatureStoreClient.log_model) are automatically looked up from online stores for model scoring. To override feature values when scoring a model using a REST API, include the feature values as a part of the API payload.

Note

The new feature values must conform to the feature’s data type as expected by the underlying model.
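
For illustration, a request that overrides total_purchases_30d while letting customer_id drive the automatic lookup might look like the following sketch. The workspace URL, model name, token, and payload orientation are placeholders and depend on your workspace and MLflow version:

import requests

# Placeholder workspace URL, model name, and token.
url = 'https://<databricks-instance>/model/recommender_model/1/invocations'
headers = {'Authorization': 'Bearer <personal-access-token>'}

# customer_id drives the automatic lookup; total_purchases_30d is also provided,
# so this value overrides the one stored in the online store.
records = [{'customer_id': 'c1', 'total_purchases_30d': 120.0}]

# Older MLflow serving versions accept a bare list of records; MLflow 2.x
# endpoints expect the records wrapped as {"dataframe_records": records}.
response = requests.post(url, headers=headers, json=records)
print(response.json())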