Publish features to an online feature store

This article describes how to publish features to an online feature store for real-time serving.

Databricks Feature Store supports these online stores:

| Online store provider | Publish | Feature lookup in classic model serving | Feature lookup in Serverless Real-Time Inference |
|---|---|---|---|
| Amazon DynamoDB (Feature Store client v0.3.8 and above) | X | X | X |
| Amazon Aurora (MySQL-compatible) | X | X | |
| Amazon RDS MySQL | X | X | |

Note

The DynamoDB online store uses a different schema than the offline store. Specifically, in the online store, primary keys are stored as a combined key in the column _feature_store_internal__primary_keys.

To ensure that Feature Store can access the DynamoDB online store, you must create the table in the online store by using publish_table(). publish_table() validates the schema of the online store and creates the required column. If you try to write to a table in the online store that was not created with publish_table(), the schema might be incompatible and the write command will fail.

Publish batch-computed features to an online store

You can create and schedule a Databricks job to regularly publish updated features. This job can also include the code to calculate the updated features, or you can create and run separate jobs to calculate and publish feature updates.

For SQL stores, the following code assumes that a database named “recommender_system” already exists in the online store and that its name matches the name of the offline database. If the database has no table named “customer_features”, this code creates one. The code also assumes that features are computed each day and that the feature table is partitioned by the column _dt.

The following code assumes that you have created secrets to access the online store.

Amazon DynamoDB (requires Feature Store client v0.3.8 and above):

import datetime
from databricks.feature_store import FeatureStoreClient
from databricks.feature_store.online_store_spec import AmazonDynamoDBSpec

fs = FeatureStoreClient()

# Do not pass `write_secret_prefix` if you intend to use the instance profile attached to the cluster.
online_store = AmazonDynamoDBSpec(
  region='<region>',
  read_secret_prefix='<read_scope>/<prefix>',
  write_secret_prefix='<write_scope>/<prefix>'
)

fs.publish_table(
  name='recommender_system.customer_features',
  online_store=online_store,
  filter_condition=f"_dt = '{str(datetime.date.today())}'",
  mode='merge'
)
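
Amazon RDS MySQL:

import datetime
from databricks.feature_store import FeatureStoreClient
from databricks.feature_store.online_store_spec import AmazonRdsMySqlSpec

fs = FeatureStoreClient()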

online_store = AmazonRdsMySqlSpec(
  hostname='<hostname>',
  port='<port>',
  read_secret_prefix='<read_scope>/<prefix>',
  write_secret_prefix='<write_scope>/<prefix>'
)

fs.publish_table(
  name='recommender_system.customer_features',
  online_store=online_store,
  filter_condition=f"_dt = '{str(datetime.date.today())}'",
  mode='merge'
)
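
If a scheduled run is missed, the same filter_condition pattern can be reused to republish earlier partitions. The following is a minimal sketch, not part of the Feature Store API: the helper names `partition_filter` and `backfill_filters` are hypothetical, and the code assumes that the `_dt` column holds ISO-formatted dates, as in the examples above.

```python
import datetime
from typing import List


def partition_filter(day: datetime.date, column: str = "_dt") -> str:
    """Return a filter_condition string that selects one daily partition."""
    return f"{column} = '{day.isoformat()}'"


def backfill_filters(start: datetime.date, end: datetime.date) -> List[str]:
    """Return one filter string per day from start to end, inclusive."""
    days = (end - start).days
    return [
        partition_filter(start + datetime.timedelta(days=offset))
        for offset in range(days + 1)
    ]


# Build the filters for a three-day backfill. Each string could be passed
# as filter_condition to fs.publish_table(..., mode='merge').
for condition in backfill_filters(datetime.date(2023, 1, 1), datetime.date(2023, 1, 3)):
    print(condition)
```

Publishing one partition per call keeps each merge small and makes it easy to retry a single failed day.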

Publish streaming features to an online store

To continuously stream features to the online store, set streaming=True.

fs.publish_table(
  name='recommender_system.customer_features',
  online_store=online_store,
  streaming=True
)
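
A streaming publish runs until it is stopped, so it is usually worth setting a checkpoint location so that the stream can resume after a restart. The sketch below wraps the call in a hypothetical helper, `start_streaming_publish`. It assumes that publish_table() in your client version accepts a `checkpoint_location` keyword argument and returns a streaming query handle when streaming=True; confirm both against the API reference for your version. The checkpoint path is a placeholder.

```python
from typing import Any, Optional


def start_streaming_publish(
    fs: Any,
    online_store: Any,
    table_name: str,
    checkpoint_location: Optional[str] = None,
) -> Any:
    """Call fs.publish_table with streaming=True and return whatever it returns.

    checkpoint_location is forwarded only when provided (assumed keyword
    argument of publish_table; verify for your client version).
    """
    kwargs = {"name": table_name, "online_store": online_store, "streaming": True}
    if checkpoint_location is not None:
        kwargs["checkpoint_location"] = checkpoint_location
    return fs.publish_table(**kwargs)


# Usage on a cluster where `fs` and `online_store` are already defined:
# query = start_streaming_publish(
#     fs, online_store, "recommender_system.customer_features",
#     checkpoint_location="<checkpoint_path>",
# )
```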

Publish selected features to an online store

To publish only selected features to the online store, use the features argument to specify the feature name(s) to publish. Primary keys and timestamp keys are always published. If you do not specify the features argument or if the value is None, all features from the offline feature table are published.

fs.publish_table(
  name='recommender_system.customer_features',
  online_store=online_store,
  features=["total_purchases_30d"]
)
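
The selection rule described above can be modeled in a few lines of plain Python. This is only an illustration of the documented behavior, not the library's implementation; the function `published_columns` and every column name other than `total_purchases_30d` are hypothetical.

```python
from typing import List, Optional, Sequence


def published_columns(
    all_columns: Sequence[str],
    primary_keys: Sequence[str],
    timestamp_keys: Sequence[str],
    features: Optional[Sequence[str]] = None,
) -> List[str]:
    """Return the columns that end up in the online table.

    With features=None every column is published. Otherwise the primary
    keys and timestamp keys are kept in addition to the selected features.
    """
    if features is None:
        return list(all_columns)
    keep = set(primary_keys) | set(timestamp_keys) | set(features)
    # Preserve the offline column order.
    return [column for column in all_columns if column in keep]


columns = ["customer_id", "event_ts", "total_purchases_30d", "total_purchases_7d"]
print(published_columns(columns, ["customer_id"], ["event_ts"], ["total_purchases_30d"]))
```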

Publish a feature table to a specific database

In the online store spec, specify the database name (database_name) and the table name (table_name). If you do not specify these parameters, the offline database name and feature table name are used. The database specified by database_name must already exist in the online store.

online_store = AmazonRdsMySqlSpec(
  hostname='<hostname>',
  port='<port>',
  database_name='<database_name>',
  table_name='<table_name>',
  read_secret_prefix='<read_scope>/<prefix>',
  write_secret_prefix='<write_scope>/<prefix>'
)
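
The defaulting rule for database_name and table_name can be sketched as follows. The helper `resolve_online_location` is hypothetical and only models the behavior described above; it assumes a two-part offline name of the form `<database>.<table>`.

```python
from typing import Optional, Tuple


def resolve_online_location(
    offline_name: str,
    database_name: Optional[str] = None,
    table_name: Optional[str] = None,
) -> Tuple[str, str]:
    """Return the (database, table) pair used in the online store.

    Values that are not given fall back to the offline database name and
    feature table name.
    """
    offline_database, offline_table = offline_name.split(".", 1)
    return (database_name or offline_database, table_name or offline_table)


# No overrides: the offline names are reused.
print(resolve_online_location("recommender_system.customer_features"))
# Override only the database: the table name still comes from the offline table.
print(resolve_online_location("recommender_system.customer_features", database_name="online_db"))
```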

Overwrite an existing online feature table or specific rows

Use mode='overwrite' in the publish_table call. The online table is completely overwritten by the data in the offline table.

Note

Amazon DynamoDB does not support overwrite mode.

fs.publish_table(
  name='recommender_system.customer_features',
  online_store=online_store,
  mode='overwrite'
)

To overwrite only certain rows, use the filter_condition argument with mode='merge':

fs.publish_table(
  name='recommender_system.customer_features',
  online_store=online_store,
  filter_condition=f"_dt = '{str(datetime.date.today())}'",
  mode='merge'
)
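
The difference between the two modes can be illustrated with a simplified in-memory model. This sketch assumes that merge upserts whole rows by primary key and that overwrite replaces the entire table; it does not call the Feature Store API, and the function `simulate_publish` is hypothetical.

```python
from typing import Any, Dict

Row = Dict[str, Any]


def simulate_publish(
    online: Dict[Any, Row], offline: Dict[Any, Row], mode: str = "merge"
) -> Dict[Any, Row]:
    """Return the online table contents after a publish.

    Both tables map a primary key to a row of feature values.
    """
    if mode == "overwrite":
        # The online table is replaced by the offline rows.
        return dict(offline)
    if mode == "merge":
        # Existing rows are kept; rows with matching keys are replaced.
        merged = dict(online)
        merged.update(offline)
        return merged
    raise ValueError(f"unsupported mode: {mode!r}")


online = {1: {"total_purchases_30d": 5}, 2: {"total_purchases_30d": 7}}
offline = {2: {"total_purchases_30d": 9}, 3: {"total_purchases_30d": 1}}

print(sorted(simulate_publish(online, offline, mode="merge")))      # keys 1, 2, 3
print(sorted(simulate_publish(online, offline, mode="overwrite")))  # keys 2, 3
```

In this model, a filter_condition simply limits which offline rows are passed in, which is why a filtered merge updates only the selected rows and leaves the rest of the online table unchanged.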