Databricks Online Feature Stores
This feature is in beta and is available in the following regions: us-east-1, us-west-2, eu-west-1, ap-southeast-1, ap-southeast-2, eu-central-1, us-east-2, and ap-south-1.
Databricks Online Feature Stores are a high-performance, scalable solution for serving feature data to online applications and real-time machine learning models. Powered by Databricks Lakebase, they provide low-latency access to feature data at high scale while maintaining consistency with your offline feature tables.
The primary use cases for Online Feature Stores include:
- Serving features to real-time applications like recommendation systems, fraud detection, and personalization engines using Feature Serving Endpoints.
- Automatic feature lookup for real-time inference in model serving endpoints.
Requirements
Databricks Online Feature Stores require Databricks Runtime 16.4 LTS ML or above. You can also use serverless compute.
To use Databricks Online Feature Stores, you must first install the databricks-feature-engineering package. Run the following commands each time a notebook is started:
%pip install --pre "databricks-feature-engineering>=0.13.0a4"
dbutils.library.restartPython()
Create an online store
To create a new online feature store:
from databricks.feature_engineering import FeatureEngineeringClient

# Initialize the client
fe = FeatureEngineeringClient()

# Create an online store with the specified capacity
fe.create_online_store(
    name="my-online-store",
    capacity="CU_2",  # Valid options: "CU_1", "CU_2", "CU_4", "CU_8"
)
The capacity options correspond to the performance tiers "CU_1", "CU_2", "CU_4", and "CU_8". Each capacity unit allocates approximately 16 GB of RAM to the database instance, along with the associated CPU and local SSD resources. Scaling up increases these resources linearly. For more details, see Manage instance capacity.
Manage online stores
The following code shows how to retrieve and update online stores:
# Get information about an existing online store
store = fe.get_online_store(name="my-online-store")
if store:
    print(f"Store: {store.name}, State: {store.state}, Capacity: {store.capacity}")

# Update the capacity of an online store
updated_store = fe.update_online_store(
    name="my-online-store",
    capacity="CU_4",  # Upgrade to a higher capacity
)
Add read replicas to an online store
When creating or updating an online store, you can add read replicas by specifying the read_replica_count parameter. Read traffic is automatically distributed across the replicas, reducing latency and improving performance and scalability for high-concurrency workloads.
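For example, assuming the read_replica_count parameter described above is accepted by both create_online_store and update_online_store, a sketch:

```python
from databricks.feature_engineering import FeatureEngineeringClient

fe = FeatureEngineeringClient()

# Create a store with one read replica
fe.create_online_store(
    name="my-online-store",
    capacity="CU_2",
    read_replica_count=1,
)

# Later, add a second replica to the existing store
fe.update_online_store(
    name="my-online-store",
    read_replica_count=2,  # Maximum of 2 replicas; see Limitations
)
```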
Publish a feature table to an online store
After your online store is in the AVAILABLE state, you can publish feature tables to make them available for low-latency access. Review the table below to ensure that your source offline table was created correctly for the real-time use case.
| Use case | Create the offline feature table using this method |
|---|---|
| Only the latest feature values for each entity ID are available in the online store for real-time applications. Multiple rows with the same primary key value but different time series key values may exist in the offline data source, and are deduplicated in the publish pipeline. This case is most frequently used for online model or feature serving endpoints. | |
| The latest and all previous time series feature values from the offline table are available in the online store for access by real-time applications. All rows from the source (offline) table are published without deduplication. This is infrequently used, but may be required where endpoints query features by entity ID and exact date/timestamp for data verification or back-testing. | |
Prerequisites for publishing to online stores
All feature tables (with or without time series) must meet these requirements before publishing:
- Primary key constraint: Required for online store publishing
- Non-nullable primary keys: Primary key columns cannot contain NULL values
- Change Data Feed enabled: Required for online store sync. See Enable change data feed
-- Enable CDF if not already enabled
ALTER TABLE catalog.schema.your_feature_table
SET TBLPROPERTIES ('delta.enableChangeDataFeed' = 'true');
-- Ensure primary key columns are not nullable
ALTER TABLE catalog.schema.your_feature_table
ALTER COLUMN user_id SET NOT NULL;
To publish a feature table to an online store:
from databricks.ml_features.entities.online_store import DatabricksOnlineStore

# Get the online store instance
online_store = fe.get_online_store(name="my-online-store")

# Publish the feature table to the online store
fe.publish_table(
    online_store=online_store,
    source_table_name="catalog_name.schema_name.feature_table_name",
    online_table_name="catalog_name.schema_name.online_feature_table_name",
)
The publish_table operation does the following:
- Creates a table in the online store if it doesn't exist.
- Syncs the feature data from the offline feature table to the online store.
- Sets up the infrastructure needed to keep the online store in sync with the offline table.
Continuously update online features
If publish_table is called with streaming=True, the online table is set up with a streaming pipeline that continuously updates the online store as new data arrives in the offline feature table.
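A minimal sketch of a streaming publish, reusing the placeholder store and table names from the example above:

```python
from databricks.feature_engineering import FeatureEngineeringClient

fe = FeatureEngineeringClient()
online_store = fe.get_online_store(name="my-online-store")

# Publish with a continuously running streaming sync
fe.publish_table(
    online_store=online_store,
    source_table_name="catalog_name.schema_name.feature_table_name",
    online_table_name="catalog_name.schema_name.online_feature_table_name",
    streaming=True,  # Keep the online store in sync as new offline data arrives
)
```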
Schedule updates to online features
To periodically update features in an online table, create a scheduled Lakeflow Job that runs publish_table. The job automatically refreshes the table and incrementally updates the online features. See Lakeflow Jobs.
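One way to set up such a schedule is with the Databricks SDK for Python, creating a job that runs a notebook containing the publish_table call. This is a sketch under assumptions: the notebook path, cluster ID, and job name are hypothetical placeholders, and the schedule can equally be configured in the Jobs UI.

```python
from databricks.sdk import WorkspaceClient
from databricks.sdk.service import jobs

w = WorkspaceClient()

# Schedule a notebook (which calls fe.publish_table) to run hourly
job = w.jobs.create(
    name="refresh-online-features",  # hypothetical job name
    tasks=[
        jobs.Task(
            task_key="publish_features",
            notebook_task=jobs.NotebookTask(
                notebook_path="/Workspace/path/to/publish_notebook",  # placeholder
            ),
            existing_cluster_id="<cluster-id>",  # placeholder: an ML Runtime cluster
        )
    ],
    schedule=jobs.CronSchedule(
        quartz_cron_expression="0 0 * * * ?",  # hourly, Quartz syntax
        timezone_id="UTC",
    ),
)
```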
Explore and query online features
After your published table status shows as "AVAILABLE", you can explore and query the feature data in several ways:
Unity Catalog UI: Navigate to the online table in Unity Catalog to view sample data and explore the schema directly in the UI. This provides a convenient way to inspect your feature data and verify that the publishing process completed successfully.
SQL Editor: For more advanced querying and data exploration, you can use the SQL editor to run PostgreSQL queries against your online feature tables. This allows you to perform complex queries, joins, and analysis on your feature data. For detailed instructions on using the SQL editor with online stores, see Access a database instance from the SQL editor.
Use online features in real-time applications
To serve features to real-time applications and services, create a feature serving endpoint. See Feature Serving endpoints.
Models that are trained using features from Databricks automatically track lineage to the features they were trained on. When deployed as endpoints, these models use Unity Catalog to find appropriate features in online stores. For details, see Use features in online workflows.
Delete an online store
To delete an online store:
fe.delete_online_store(name="my-online-store")
Deleting an online published table can cause unexpected failures in downstream dependencies. Before you delete a table, ensure that its online features are no longer used by model serving or feature serving endpoints.
Limitations
- The maximum number of read replicas for a Databricks online feature store is 2. Contact your Databricks account team to increase the limit.
- The following parameters are not supported when publishing to a Databricks online feature store: filter_condition, checkpoint_location, mode, trigger, and features.
- Only feature tables in Unity Catalog are supported.
- The only supported publish mode is "merge".
Example notebook
The following notebook shows an example of how to set up and access a Databricks Online Feature Store using Databricks Lakebase.
Online feature store with Lakebase notebook
Additional resources
- Learn more about Feature Engineering in Databricks.
- Explore data governance and lineage in Unity Catalog.
- Understand Lakebase architecture and capabilities.