Reverse ETL with Lakebase

Beta

Lakebase Postgres (Autoscaling Beta) is the next version of Lakebase, available for evaluation only. For production workloads, use Lakebase Public Preview. See choosing between versions to understand which version is right for you.

Reverse ETL in Lakebase syncs Unity Catalog tables into Postgres so applications can use curated lakehouse data directly. The lakehouse is optimized for analytics and enrichment, while Lakebase is designed for operational workloads that require fast queries and transactional consistency.

Architecture diagram showing data flow from lakehouse to Lakebase to applications

What is reverse ETL?

Reverse ETL lets you move analytics-grade data from Unity Catalog into Lakebase Postgres, where you can make it available to applications that need low-latency queries (sub-10ms) and full ACID transactions. It bridges the gap between analytical storage and operational systems by keeping curated data usable in real-time applications.

How it works

Databricks synced tables create a managed copy of your Unity Catalog data in Lakebase. When you create a synced table, you get:

  1. A new Unity Catalog table (read-only, managed by the sync pipeline)
  2. A Postgres table in Lakebase (queryable by your applications)

Diagram showing the three-table relationship in reverse ETL

For example, you can sync curated gold tables, engineered features, or ML outputs. Syncing analytics.gold.user_profiles creates a new synced table, analytics.gold.user_profiles_synced. In Postgres, the Unity Catalog schema name becomes the Postgres schema name, so this table appears as "gold"."user_profiles_synced":

SQL
SELECT * FROM "gold"."user_profiles_synced" WHERE "user_id" = 12345;

Applications connect with standard Postgres drivers and query the synced data alongside their own operational state.

Sync pipelines use managed Lakeflow Declarative Pipelines to continuously update both the Unity Catalog synced table and the Postgres table with changes from the source table. Each sync can use up to 16 connections to your Lakebase database and supports continuous writes at approximately 1,200 rows per second per Capacity Unit (CU) and bulk writes at up to 15,000 rows per second per CU.
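As a rough illustration of those figures, an initial bulk load of 9 million rows into a 2 CU instance would take about 9,000,000 ÷ (15,000 × 2) ≈ 300 seconds, while continuous writes on the same instance top out around 2,400 rows per second; actual throughput depends on row size and workload.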

Lakebase Postgres supports up to 10,000 concurrent connections with transactional guarantees, so applications can read enriched data while also handling inserts, updates, and deletes in the same database.
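For example, here is a minimal sketch of an application reading the synced table and recording its own operational state in one transaction, using psycopg2 as one such standard driver. The connection details, the churn_score column, and the app.offers table are illustrative placeholders, not part of the schema above:

Python
import psycopg2

# Placeholder connection details; use your Lakebase host, database, and credentials.
conn = psycopg2.connect(
    host="<your-lakebase-host>",
    dbname="<your-database>",
    user="<your-user>",
    password="<your-password-or-token>",
    sslmode="require",
)

with conn:  # commits on success, rolls back on error
    with conn.cursor() as cur:
        # Read enriched data from the synced table
        cur.execute(
            'SELECT "churn_score" FROM "gold"."user_profiles_synced" WHERE "user_id" = %s',
            (12345,),
        )
        row = cur.fetchone()

        # Write the application's own operational state in the same transaction
        offer = "retention_discount" if row and row[0] > 0.8 else "standard"
        cur.execute(
            'INSERT INTO "app"."offers" ("user_id", "offer") VALUES (%s, %s)',
            (12345, offer),
        )

conn.close()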

Sync modes

Choose the right sync mode based on your application needs:

| Mode | Description | Best for | Performance |
| --- | --- | --- | --- |
| Snapshot | One-time copy of all data | Initial setup or historical analysis | 10x more efficient if modifying >10% of source data |
| Triggered | Scheduled updates that run on demand or at intervals | Dashboards updated hourly or daily | Good cost/lag balance; expensive if run at intervals under 5 minutes |
| Continuous | Real-time streaming with seconds of latency | Live applications (higher cost due to dedicated compute) | Lowest lag, highest cost; minimum 15-second intervals |

Triggered and Continuous modes require Change Data Feed (CDF) to be enabled on your source table. If CDF isn't enabled, you'll see a warning in the UI with the exact ALTER TABLE command to run. For more details on Change Data Feed, see Use Delta Lake change data feed on Databricks.

Example use cases

Reverse ETL with Lakebase supports common operational scenarios:

  • Personalization engines that need fresh user profiles synced into Databricks Apps
  • Applications that serve model predictions or feature values computed in the lakehouse
  • Customer-facing dashboards that display KPIs in real time
  • Fraud detection services that need risk scores available for immediate action
  • Support tools that enrich customer records with curated data from the lakehouse

Create a synced table (UI)

You can create synced tables in the Databricks UI or programmatically with the SDK. The UI workflow is outlined below.

Prerequisites

You need:

  • A Databricks workspace with Lakebase enabled.
  • A Lakebase database project (see Create a database project).
  • A Unity Catalog table with curated data.
  • Permissions to create synced tables.

For capacity planning and data type compatibility, see Data types and compatibility and Capacity planning.

Step 1: Select your source table

Go to Catalog in the workspace sidebar and select the Unity Catalog table you want to sync.

Catalog Explorer showing a selected table

Step 2: Enable Change Data Feed (if needed)

If you plan to use Triggered or Continuous sync modes, your source table needs Change Data Feed enabled. Check if your table already has CDF enabled, or run this command in a SQL editor or notebook:

SQL
ALTER TABLE your_catalog.your_schema.your_table
SET TBLPROPERTIES (delta.enableChangeDataFeed = true)

Replace your_catalog.your_schema.your_table with your actual table name.
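To check whether CDF is already enabled before running the ALTER, one option (assuming you can query the table) is to list its properties and look for delta.enableChangeDataFeed set to true:

SQL
SHOW TBLPROPERTIES your_catalog.your_schema.your_table;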

Step 3: Create synced table

Click Create > Synced table from the table details view.

Create button dropdown showing Synced table option

Step 4: Configure

In the Create synced table dialog:

  1. Table name: Enter a name for your synced table (it is created in the same catalog and schema as your source table). This creates both a Unity Catalog synced table and a Postgres table you can query.
  2. Database type: Choose Lakebase Serverless (Beta).
  3. Sync mode: Choose Snapshot, Triggered, or Continuous based on your needs (see sync modes above).
  4. Configure your project, branch, and database selections.
  5. Verify the Primary key is correct (usually auto-detected).

If you chose Triggered or Continuous mode and haven't enabled Change Data Feed yet, you'll see a warning with the exact command to run. For data type compatibility questions, see Data types and compatibility.

Click Create to create the synced table.

Step 5: Monitor

After creation, monitor the synced table in Catalog. The Overview tab shows sync status, configuration, pipeline status, and last sync timestamp. Use Sync now for manual refresh.

Data types and compatibility

Unity Catalog data types are mapped to Postgres types when creating synced tables. Complex types (ARRAY, MAP, STRUCT) are stored as JSONB in Postgres.

| Source column type | Postgres column type |
| --- | --- |
| BIGINT | BIGINT |
| BINARY | BYTEA |
| BOOLEAN | BOOLEAN |
| DATE | DATE |
| DECIMAL(p,s) | NUMERIC |
| DOUBLE | DOUBLE PRECISION |
| FLOAT | REAL |
| INT | INTEGER |
| INTERVAL | INTERVAL |
| SMALLINT | SMALLINT |
| STRING | TEXT |
| TIMESTAMP | TIMESTAMP WITH TIME ZONE |
| TIMESTAMP_NTZ | TIMESTAMP WITHOUT TIME ZONE |
| TINYINT | SMALLINT |
| ARRAY<elementType> | JSONB |
| MAP<keyType,valueType> | JSONB |
| STRUCT<fieldName:fieldType[, ...]> | JSONB |

note

GEOGRAPHY, GEOMETRY, VARIANT, and OBJECT types are not supported.
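Because complex types arrive in Postgres as JSONB, applications can read nested fields with the standard JSONB operators. A minimal sketch, assuming the synced table has a STRUCT column named preferences with a theme field (illustrative names, not part of the example schema above):

SQL
SELECT "user_id", "preferences"->>'theme' AS theme
FROM "gold"."user_profiles_synced"
WHERE "preferences"->>'theme' = 'dark';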

Handle invalid characters

Certain characters like null bytes (0x00) are allowed in Unity Catalog STRING, ARRAY, MAP, or STRUCT columns but not supported in Postgres TEXT or JSONB columns. This can cause sync failures with errors like:

ERROR: invalid byte sequence for encoding "UTF8": 0x00
ERROR: unsupported Unicode escape sequence DETAIL: \u0000 cannot be converted to text

Solutions:

  • Sanitize string fields: Remove unsupported characters before syncing. For null bytes in STRING columns:

    SQL
    SELECT REPLACE(column_name, CAST(CHAR(0) AS STRING), '') AS cleaned_column FROM your_table
  • Convert to BINARY: For STRING columns where preserving raw bytes is necessary, convert to BINARY type.
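For example, a minimal Databricks SQL sketch of that conversion, assuming the affected column is named raw_payload (an illustrative name):

SQL
SELECT CAST(raw_payload AS BINARY) AS raw_payload_bytes FROM your_table

BINARY columns map to BYTEA in Postgres (see the table above), which accepts arbitrary byte values, including null bytes.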

Programmatic creation

For automation workflows, you can create synced tables programmatically using the Databricks SDK, CLI, or REST API.

Python
from databricks.sdk import WorkspaceClient
from databricks.sdk.service.database import (
    SyncedDatabaseTable,
    SyncedTableSpec,
    NewPipelineSpec,
    SyncedTableSchedulingPolicy,
)

# Initialize the Workspace client
w = WorkspaceClient()

# Create a synced table
synced_table = w.database.create_synced_database_table(
    SyncedDatabaseTable(
        name="lakebase_catalog.schema.synced_table",  # Full three-part name
        spec=SyncedTableSpec(
            source_table_full_name="analytics.gold.user_profiles",
            primary_key_columns=["user_id"],  # Primary key columns
            scheduling_policy=SyncedTableSchedulingPolicy.TRIGGERED,  # SNAPSHOT, TRIGGERED, or CONTINUOUS
            new_pipeline_spec=NewPipelineSpec(
                storage_catalog="lakebase_catalog",
                storage_schema="staging",
            ),
        ),
    )
)
print(f"Created synced table: {synced_table.name}")

# Check the status of a synced table
status = w.database.get_synced_database_table(name=synced_table.name)
print(f"Synced table status: {status.data_synchronization_status.detailed_state}")
print(f"Status message: {status.data_synchronization_status.message}")

Capacity planning

When planning your reverse ETL implementation, consider these resource requirements:

  • Connection usage: Each synced table uses up to 16 connections to your Lakebase database, which count toward the instance's connection limit (see the example after this list).
  • Size limits: Total logical data size limit across all synced tables is 2 TB. Individual tables don't have limits, but Databricks recommends not exceeding 1 TB for tables requiring refreshes.
  • Naming requirements: Database, schema, and table names may only contain alphanumeric characters and underscores ([A-Za-z0-9_]+).
  • Schema evolution: Only additive schema changes (like adding columns) are supported for Triggered and Continuous modes.
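For example, an instance backing ten synced tables could consume up to 10 × 16 = 160 connections for syncing alone, which you should budget against the instance's connection limit alongside your applications' own connection pools.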

Delete a synced table

To delete a synced table, you must remove it from both Unity Catalog and Postgres:

  1. Delete from Unity Catalog: In Catalog, find your synced table, click the kebab menu, and select Delete. This stops data refreshes but leaves the table in Postgres.

  2. Drop from Postgres: Connect to your Lakebase database and drop the table to free up space:

    SQL
    DROP TABLE your_schema.your_table;

You can use the SQL editor or external tools to connect to Postgres.

Learn more

| Task | Description |
| --- | --- |
| Create a database project | Set up a Lakebase database project |
| Connect to your database | Learn connection options for Lakebase |
| Register database in Unity Catalog | Make your Lakebase data visible in Unity Catalog for unified governance and cross-source queries |
| Unity Catalog integration | Understand governance and permissions |

Other options

For syncing data into non-Databricks systems, see Partner Connect reverse ETL solutions such as Census or Hightouch.