Databricks Online Feature Stores
This feature is in beta and is available in the following regions: us-east-1, us-west-2, eu-west-1, ap-southeast-1, ap-southeast-2, eu-central-1, us-east-2, and ap-south-1.
Databricks Online Feature Stores are a high-performance, scalable solution for serving feature data to online applications and real-time machine learning models. Powered by Databricks Lakebase, they provide low-latency access to feature data at high scale while maintaining consistency with your offline feature tables.
The primary use cases for Online Feature Stores include:
- Serving features to real-time applications like recommendation systems, fraud detection, and personalization engines using Feature Serving Endpoints.
- Automatic feature lookup for real-time inference in model serving endpoints.
Requirements
Databricks Online Feature Stores require Databricks Runtime 16.4 LTS ML or above. You can also use serverless compute.
To use Databricks Online Feature Stores, you must first install the databricks-feature-engineering package. Run the following commands each time a notebook is started:
%pip install --pre "databricks-feature-engineering>=0.13.0a4"
dbutils.library.restartPython()
Create an online store
To create a new online feature store:
from databricks.feature_engineering import FeatureEngineeringClient

# Initialize the client
fe = FeatureEngineeringClient()

# Create an online store with the specified capacity
fe.create_online_store(
    name="my-online-store",
    capacity="CU_2",  # Valid options: "CU_1", "CU_2", "CU_4", "CU_8"
)
The capacity options correspond to the performance tiers "CU_1", "CU_2", "CU_4", and "CU_8". Each capacity unit allocates approximately 16 GB of RAM to the database instance, along with the associated CPU and local SSD resources. Scaling up increases these resources linearly. For more details, see Manage instance capacity.
Manage online stores
The following code shows how to retrieve and update online stores:
# Get information about an existing online store
store = fe.get_online_store(name="my-online-store")
if store:
    print(f"Store: {store.name}, State: {store.state}, Capacity: {store.capacity}")

# Update the capacity of an online store
updated_store = fe.update_online_store(
    name="my-online-store",
    capacity="CU_4",  # Upgrade to a higher capacity
)
Add read replicas to an online store
When creating or updating an online store, you can add read replicas by specifying the read_replica_count parameter. Read traffic is automatically distributed across the replicas, reducing latency and improving performance and scalability for high-concurrency workloads.
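For example, assuming the read_replica_count parameter described above is accepted by both create_online_store and update_online_store, a sketch:

```python
from databricks.feature_engineering import FeatureEngineeringClient

fe = FeatureEngineeringClient()

# Create a store with one read replica
fe.create_online_store(
    name="my-online-store",
    capacity="CU_2",
    read_replica_count=1,
)

# Later, add a second replica to the existing store
fe.update_online_store(
    name="my-online-store",
    read_replica_count=2,  # Maximum of 2 replicas; see Limitations
)
```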
Publish a feature table to an online store
After your online store is in the AVAILABLE state, you can publish feature tables to make them available for low-latency access. Review the table below to ensure that your source offline table was created correctly for the real-time use case.
| Use case | Create the offline feature table using this method |
|---|---|
| Only the latest feature values for each entity ID are available in the online store for real-time applications. Multiple rows with the same primary key value but different time series key values may exist in the offline data source, and are deduplicated in the publish pipeline. This case is most frequently used for online model or feature serving endpoints. | |
| The latest and all previous time series feature values from the offline table are available in the online store for access by real-time applications. All rows from the source (offline) table are published without deduplication. This is infrequently used, but may be required where endpoints query features by entity ID and exact date/timestamp for data verification or back-testing. | |
Prerequisites for publishing to online stores
All feature tables (with or without time series) must meet these requirements before publishing:
- Primary key constraint: Required for online store publishing
- Non-nullable primary keys: Primary key columns cannot contain NULL values
- Change Data Feed enabled: Required for online store sync. See Enable change data feed
-- Enable CDF if not already enabled
ALTER TABLE catalog.schema.your_feature_table
SET TBLPROPERTIES ('delta.enableChangeDataFeed' = 'true');
-- Ensure primary key columns are not nullable
ALTER TABLE catalog.schema.your_feature_table
ALTER COLUMN user_id SET NOT NULL;
To publish a feature table to an online store:
from databricks.ml_features.entities.online_store import DatabricksOnlineStore

# Get the online store instance
online_store = fe.get_online_store(name="my-online-store")

# Publish the feature table to the online store
fe.publish_table(
    online_store=online_store,
    source_table_name="catalog_name.schema_name.feature_table_name",
    online_table_name="catalog_name.schema_name.online_feature_table_name",
)
The publish_table operation does the following:
- Creates a table in the online store if it doesn't exist.
- Syncs the feature data from the offline feature table to the online store.
- Sets up the infrastructure needed to keep the online store in sync with the offline table.
Continuously update online features
If publish_table is called with streaming=True, the online table is set up with a streaming pipeline that continuously updates the online store as new data arrives in the offline feature table.
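A minimal sketch of a streaming publish, reusing the placeholder store and table names from the example above:

```python
from databricks.feature_engineering import FeatureEngineeringClient

fe = FeatureEngineeringClient()
online_store = fe.get_online_store(name="my-online-store")

# Publish with a continuously running streaming sync
fe.publish_table(
    online_store=online_store,
    source_table_name="catalog_name.schema_name.feature_table_name",
    online_table_name="catalog_name.schema_name.online_feature_table_name",
    streaming=True,  # Keep the online store in sync as new offline data arrives
)
```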
Schedule updates to online features
To periodically update features in an online table, create a scheduled Lakeflow Job that runs publish_table. The job automatically refreshes the table and incrementally updates the online features. See Lakeflow Jobs.
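One way to set up such a schedule is with the Databricks SDK for Python, creating a job that runs a notebook containing the publish_table call. This is a sketch under assumptions: the notebook path, cluster ID, and job name are hypothetical placeholders, and the schedule can equally be configured in the Jobs UI.

```python
from databricks.sdk import WorkspaceClient
from databricks.sdk.service import jobs

w = WorkspaceClient()

# Schedule a notebook (which calls fe.publish_table) to run hourly
job = w.jobs.create(
    name="refresh-online-features",  # hypothetical job name
    tasks=[
        jobs.Task(
            task_key="publish_features",
            notebook_task=jobs.NotebookTask(
                notebook_path="/Workspace/path/to/publish_notebook",  # placeholder
            ),
            existing_cluster_id="<cluster-id>",  # placeholder: an ML Runtime cluster
        )
    ],
    schedule=jobs.CronSchedule(
        quartz_cron_expression="0 0 * * * ?",  # hourly, Quartz syntax
        timezone_id="UTC",
    ),
)
```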
Explore and query online features
After your published table status shows as "AVAILABLE", you can explore and query the feature data in several ways:
Unity Catalog UI: Navigate to the online table in Unity Catalog to view sample data and explore the schema directly in the UI. This provides a convenient way to inspect your feature data and verify that the publishing process completed successfully.
SQL Editor: For more advanced querying and data exploration, you can use the SQL editor to run PostgreSQL queries against your online feature tables. This allows you to perform complex queries, joins, and analysis on your feature data. For detailed instructions on using the SQL editor with online stores, see Access a database instance from the SQL editor.
Use online features in real-time applications
To serve features to real-time applications and services, create a feature serving endpoint. See Feature Serving endpoints.
Models that are trained using features from Databricks automatically track lineage to the features they were trained on. When deployed as endpoints, these models use Unity Catalog to find appropriate features in online stores. For details, see Use features in online workflows.
Delete an online store
To delete an online store:
fe.delete_online_store(name="my-online-store")
Deleting an online published table can cause unexpected failures in downstream dependencies. Before you delete a table, ensure that its online features are no longer used by model serving or feature serving endpoints.
Limitations
- The maximum number of read replicas for a Databricks online feature store is 2. Contact your Databricks account team to increase the limit.
- The following parameters are not supported when publishing to a Databricks online feature store: filter_condition, checkpoint_location, mode, trigger, and features.
- Only feature tables in Unity Catalog are supported.
- The only supported publish mode is "merge".
Example notebook
The following notebook shows an example of how to set up and access a Databricks Online Feature Store using Databricks Lakebase.
Online feature store with Lakebase notebook
Additional resources
- Learn more about Feature Engineering in Databricks.
- Explore data governance and lineage in Unity Catalog.
- Understand Lakebase architecture and capabilities.