Databricks Online Feature Stores
This feature is in Beta and is available in the following regions: us-east-1, us-west-2, eu-west-1, ap-southeast-1, ap-southeast-2, eu-central-1, us-east-2, ap-south-1.
Databricks Online Feature Stores are a high-performance, scalable solution for serving feature data to online applications and real-time machine learning models. Powered by Databricks Lakebase, they provide low-latency access to feature data at high scale while maintaining consistency with your offline feature tables.
The primary use cases for the Online Feature Store include:
- Serving features to real-time applications like recommendation systems, fraud detection, and personalization engines using Feature Serving Endpoints.
- Automatic feature lookup for real-time inference in model serving endpoints.
Requirements
Databricks Online Feature Stores require Databricks Runtime 16.4 LTS ML or above. You can also use serverless compute.
To use Databricks Online Feature Stores, you must first install the databricks-feature-engineering package. Run the following commands each time the notebook is started:
%pip install databricks-feature-engineering==0.13.0a3
dbutils.library.restartPython()
Create an online store
To create a new online feature store:
from databricks.feature_engineering import FeatureEngineeringClient
# Initialize the client
fe = FeatureEngineeringClient()
# Create an online store with specified capacity
fe.create_online_store(
    name="my-online-store",
    capacity="CU_2"  # Valid options: "CU_1", "CU_2", "CU_4", "CU_8"
)
The capacity options "CU_1", "CU_2", "CU_4", and "CU_8" correspond to different performance tiers. Each capacity unit allocates about 16 GB of RAM to the database instance, along with the associated CPU and local SSD resources. Scaling up increases these resources linearly. For more details, see Manage instance capacity.
Manage online stores
The following code shows how to retrieve and update online stores:
# Get information about an existing online store
store = fe.get_online_store(name="my-online-store")
if store:
    print(f"Store: {store.name}, State: {store.state}, Capacity: {store.capacity}")

# Update the capacity of an online store
updated_store = fe.update_online_store(
    name="my-online-store",
    capacity="CU_4"  # Upgrade to higher capacity
)
Publish a feature table to an online store
After your online store is in the AVAILABLE state, you can publish feature tables to make them available for low-latency access.
Change data feed must be enabled on the table before it can be published to an online store.
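If change data feed is not already enabled on the offline table, you can turn it on with a Delta table property. The following is a minimal sketch using a placeholder table name:
# Enable change data feed on the offline feature table (placeholder name).
spark.sql("""
ALTER TABLE catalog_name.schema_name.feature_table_name
SET TBLPROPERTIES (delta.enableChangeDataFeed = true)
""")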
To publish a feature table to an online store:
from databricks.ml_features.entities.online_store import DatabricksOnlineStore
# Get the online store instance
online_store = fe.get_online_store(name="my-online-store")
# Publish the feature table to the online store
fe.publish_table(
    online_store=online_store,
    source_table_name="catalog_name.schema_name.feature_table_name",
    online_table_name="catalog_name.schema_name.online_feature_table_name"
)
The publish_table operation does the following:
- Creates a table in the online store if it doesn't exist.
- Syncs the feature data from the offline feature table to the online store.
- Sets up the necessary infrastructure for keeping the online store in sync with the offline table.
Continuously update online features
If publish_table is called with streaming=True, the online table is set up with a streaming pipeline that continuously updates the online store as new data arrives in the offline feature table.
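A streaming publish looks the same as the batch call with streaming=True added; the names below are the same placeholders used above:
# Publish with a streaming pipeline that keeps the online table continuously in sync
fe.publish_table(
    online_store=online_store,
    source_table_name="catalog_name.schema_name.feature_table_name",
    online_table_name="catalog_name.schema_name.online_feature_table_name",
    streaming=True
)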
Schedule updates to online features
To periodically update features in an online table, create a scheduled Lakeflow Job that runs publish_table. The job automatically refreshes the table and incrementally updates the online features. See Lakeflow Jobs.
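As a minimal sketch, the notebook that the scheduled job runs can be as simple as the following; the store and table names are placeholders, and the schedule itself is configured through Lakeflow Jobs:
from databricks.feature_engineering import FeatureEngineeringClient

# Notebook cell run by the scheduled Lakeflow Job (placeholder names).
fe = FeatureEngineeringClient()
online_store = fe.get_online_store(name="my-online-store")

# Re-running publish_table incrementally refreshes the online features.
fe.publish_table(
    online_store=online_store,
    source_table_name="catalog_name.schema_name.feature_table_name",
    online_table_name="catalog_name.schema_name.online_feature_table_name"
)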
Explore and query online features
After your published table status shows as "AVAILABLE", you can explore and query the feature data in several ways:
Unity Catalog UI: Navigate to the online table in Unity Catalog to view sample data and explore the schema directly in the UI. This provides a convenient way to inspect your feature data and verify that the publishing process completed successfully.
SQL Editor: For more advanced querying and data exploration, you can use the SQL editor to run PostgreSQL queries against your online feature tables. This allows you to perform complex queries, joins, and analysis on your feature data. For detailed instructions on using the SQL editor with online stores, see Access a database instance from the SQL editor.
Use online features in real-time applications
To serve features to real-time applications and services, create a feature serving endpoint. See Feature Serving endpoints.
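As a minimal sketch, a feature spec that groups the lookups an endpoint serves can be created with the client; the spec name, table name, and lookup key below are placeholders, and the endpoint itself is created separately as described in Feature Serving endpoints:
from databricks.feature_engineering import FeatureLookup

# Define a feature spec that a feature serving endpoint can serve.
# Names are placeholders; replace with your catalog, schema, and tables.
fe.create_feature_spec(
    name="catalog_name.schema_name.feature_spec_name",
    features=[
        FeatureLookup(
            table_name="catalog_name.schema_name.feature_table_name",
            lookup_key="customer_id"
        )
    ]
)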
Models that are trained using features from Databricks automatically track lineage to the features they were trained on. When deployed as endpoints, these models use Unity Catalog to find appropriate features in online stores. For details, see Use features in online workflows.
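For example, a model trained on a training set built from feature lookups records this lineage when it is logged. The following is a minimal sketch that assumes a label DataFrame training_df with a customer_id lookup key, a fitted scikit-learn model sklearn_model, and placeholder table names:
from databricks.feature_engineering import FeatureLookup
import mlflow

# Build a training set that looks up features from the offline feature table.
training_set = fe.create_training_set(
    df=training_df,  # DataFrame with the label and lookup keys (assumed to exist)
    feature_lookups=[
        FeatureLookup(
            table_name="catalog_name.schema_name.feature_table_name",
            lookup_key="customer_id"
        )
    ],
    label="label"
)

# Log the model with its training set so feature lineage is tracked; when the
# model is served, the same features are looked up from the online store.
fe.log_model(
    model=sklearn_model,  # a fitted model (assumed to exist)
    artifact_path="model",
    flavor=mlflow.sklearn,
    training_set=training_set,
    registered_model_name="catalog_name.schema_name.my_model"
)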
Delete an online store
To delete an online store:
fe.delete_online_store(name="my-online-store")
Deleting a published online table can lead to unexpected failures in downstream dependencies. Before you delete a table, ensure that its online features are no longer used by model serving or feature serving endpoints.
Limitations
- The following parameters are not supported when publishing to a Databricks online feature store: filter_condition, checkpoint_location, mode, trigger, and features.
- Only feature tables in Unity Catalog are supported.
- The only supported publish mode is "merge".
Additional resources
- Learn more about Feature Engineering in Databricks.
- Explore data governance and lineage in Unity Catalog.
- Understand Lakebase architecture and capabilities.