Databricks Feature Store

This page is an overview of capabilities available when you use Databricks Feature Store with Unity Catalog.

The Databricks Feature Store provides a central registry for features used in your AI and ML models. Feature tables and models are registered in Unity Catalog, providing built-in governance, lineage, and cross-workspace feature sharing and discovery. With Databricks, the entire model training workflow takes place on a single platform, including:

Data pipelines that ingest raw data, create feature tables, train models, and perform batch inference.
Model and feature serving endpoints that you can create with a single click and that respond with millisecond-scale latency.
Data and model monitoring.

When you use features from the feature store to train models, the model automatically tracks lineage to the features that were used in training. At inference time, the model automatically looks up the latest feature values. The feature store also provides on-demand computation of features for real-time applications. The feature store handles all of the feature computation tasks. This eliminates training/serving skew, ensuring that the feature computations used at inference are the same as those used during model training. It also significantly simplifies the client-side code, as all feature lookups and computation are handled by the feature store.
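The idea behind eliminating training/serving skew can be sketched in plain Python. This is only an illustration, not the Databricks API: `FEATURE_TABLE`, `avg_order_value`, and `build_feature_vector` are hypothetical names, and the in-memory dictionary stands in for a feature table. The point is that training and inference go through one lookup-and-compute path instead of two separately maintained ones.

```python
from typing import Dict, List

# Hypothetical stand-in for a feature table keyed by customer_id.
FEATURE_TABLE: Dict[int, Dict[str, float]] = {
    1: {"total_spend": 120.0, "num_orders": 4.0},
    2: {"total_spend": 30.0, "num_orders": 1.0},
}


def avg_order_value(total_spend: float, num_orders: float) -> float:
    """On-demand feature: derived from looked-up values at request time."""
    return total_spend / num_orders if num_orders else 0.0


def build_feature_vector(customer_id: int) -> List[float]:
    """Single code path shared by training and inference."""
    row = FEATURE_TABLE[customer_id]
    return [
        row["total_spend"],
        row["num_orders"],
        avg_order_value(row["total_spend"], row["num_orders"]),
    ]


# Training: build rows for every known key.
training_rows = [build_feature_vector(cid) for cid in (1, 2)]

# Inference: the client sends only the key; features are resolved the same way.
inference_row = build_feature_vector(1)
```

Because both paths call `build_feature_vector`, a change to a feature definition cannot apply to training without also applying to inference.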

Note: This page covers feature engineering and serving capabilities for workspaces that are enabled for Unity Catalog. If your workspace is not enabled for Unity Catalog, see Workspace Feature Store (legacy).

Conceptual overview

For an overview of how Databricks Feature Store works and a glossary of terms, see Feature store overview and glossary.

Feature engineering

Feature	Description
Work with feature tables in Unity Catalog	Create and work with feature tables.

Discover and share features

Feature	Description
Explore features in Unity Catalog	Explore and manage feature tables using Catalog Explorer and the Features UI.
Use tags with feature tables and features in Unity Catalog	Use simple key-value pairs to categorize and manage your feature tables and features.

Use features in training workflows

Feature	Description
Use features to train models	Build training sets from feature tables and use them to train models.
Point-in-time feature joins	Use point-in-time correctness to create a training dataset that reflects feature values as of the time a label observation was recorded.
Python API	Python API reference
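A point-in-time join, as described in the table above, can be sketched in plain Python. This is a minimal illustration and not the Feature Engineering API: `feature_history`, `labels`, and `as_of` are hypothetical names, timestamps are plain integers, and "as of" is assumed to mean the latest feature value recorded at or before the label's timestamp.

```python
from bisect import bisect_right

# Hypothetical time series of one feature for a single key,
# as (timestamp, value) pairs sorted by timestamp.
feature_history = [
    (1, 10.0),
    (5, 12.5),
    (9, 20.0),
]

# Label observations as (event timestamp, label).
labels = [(0, 0), (5, 1), (7, 0), (12, 1)]


def as_of(history, ts):
    """Return the latest feature value with timestamp <= ts, or None."""
    timestamps = [t for t, _ in history]
    i = bisect_right(timestamps, ts)
    return history[i - 1][1] if i else None


# Each training row sees only feature values known at label time,
# so later values never leak into earlier observations.
training_rows = [(ts, as_of(feature_history, ts), label) for ts, label in labels]
```

The label at timestamp 7 is paired with the value recorded at timestamp 5, not the later value from timestamp 9. The label at timestamp 0 has no feature value yet, so it gets `None`.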

Serve features

Feature	Description
Databricks Online Feature Stores	Serve feature data to online applications and real-time machine learning models. Powered by Databricks Lakebase.
Model Serving with automatic feature lookup	Automatically look up feature values from an online store.
Feature Serving endpoints	Serve features to models and applications outside of Databricks.
On-demand feature computation	Calculate feature values at the time of inference.

Feature governance and lineage

Feature	Description
Feature governance and lineage	Use Unity Catalog to control access to feature tables and view the lineage of a feature table, model, or function.

Tutorials

Tutorial	Description
Example notebooks to get started	Basic notebook: shows how to create a feature table, use it to train a model, and run batch scoring using automatic feature lookup. Also shows the Feature Engineering UI to search for features and view lineage. Taxi example notebook: shows the process of creating features, updating them, and using them for model training and batch inference.
Example: Deploy and query a feature serving endpoint	Tutorial and example notebook showing how to deploy and query a feature serving endpoint.
Example: Use features with structured RAG applications	Tutorial showing how to use Databricks online tables and feature serving endpoints for retrieval-augmented generation (RAG) applications.

Requirements

Your workspace must be enabled for Unity Catalog.
Feature engineering in Unity Catalog requires Databricks Runtime 13.3 LTS or above.

If your workspace does not meet these requirements, see Workspace Feature Store (legacy) for how to use the legacy Workspace Feature Store.

Supported data types

Feature engineering in Unity Catalog and legacy Workspace Feature Store support the following PySpark data types:

IntegerType
FloatType
BooleanType
StringType
DoubleType
LongType
TimestampType
DateType
ShortType
ArrayType
BinaryType [1]
DecimalType [1]
MapType [1]
StructType [2]

[1] BinaryType, DecimalType, and MapType are supported in all versions of Feature Engineering in Unity Catalog and in Workspace Feature Store v0.3.5 or above.
[2] StructType is supported in Feature Engineering v0.6.0 or above.

The data types listed above support feature types that are common in machine learning applications. For example:

You can store dense vectors, tensors, and embeddings as ArrayType.
You can store sparse vectors, tensors, and embeddings as MapType.
You can store text as StringType.

When published to online stores, ArrayType and MapType features are stored in JSON format.
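The dense and sparse representations above, and what a JSON encoding of them looks like, can be illustrated with the Python standard library. This is a sketch under assumptions: the variable names are hypothetical, and this page says only that the online store uses "JSON format", so the exact encoding may differ from what `json.dumps` produces here.

```python
import json

# Dense embedding: every position is stored (maps naturally to ArrayType).
dense_embedding = [0.12, 0.0, 0.87, 0.0]

# Sparse embedding: only non-zero positions are stored as index -> value
# (maps naturally to MapType).
sparse_embedding = {i: v for i, v in enumerate(dense_embedding) if v != 0.0}

# JSON-encode both. JSON object keys are always strings, so the
# integer indices of the sparse map become "0" and "2".
dense_json = json.dumps(dense_embedding)
sparse_json = json.dumps(sparse_embedding)

# Decoding the sparse map requires converting keys back to integers.
restored_sparse = {int(k): v for k, v in json.loads(sparse_json).items()}
```

A client that reads a map-valued feature back from JSON should expect string keys and convert them if it needs integer indices.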

The Feature Store UI displays metadata on feature data types:

[Image: Complex data types example]

More information

For more information on best practices, download The Comprehensive Guide to Feature Stores.
