Lakehouse Sync
Lakebase Autoscaling is available in the following regions: us-east-1, us-east-2, us-west-2, ca-central-1, sa-east-1, eu-central-1, eu-west-1, eu-west-2, ap-south-1, ap-southeast-1, ap-southeast-2.
Lakebase Autoscaling is the latest version of Lakebase, with autoscaling compute, scale-to-zero, branching, and instant restore. If you are a Lakebase Provisioned user, see Lakebase Provisioned.
Lakehouse Sync is in Beta.
What is Lakehouse Sync?
Lakehouse Sync enables continuous, low-latency replication of your Lakebase Postgres tables into Unity Catalog managed Delta tables by capturing row-level changes and writing them as SCD Type 2 history. Each change is appended as a new row, so you keep a full history of how rows changed over time. This sync does not require any external compute, pipelines, or jobs. It is a native Lakebase feature.

Lakehouse Sync uses Change Data Capture (CDC) to stream changes from your Lakebase Postgres database into Unity Catalog. Delta tables are named following the format lb_<table_name>_history in your chosen catalog and schema, and each change (insert, update, delete) is appended as a new row.
Example use cases
The following are example use cases for Lakehouse Sync, where you stream change data from your Lakebase transactional database into the lakehouse.
| Use case | Description |
|---|---|
| Fast analytics | Run aggregates and analytics on Lakebase data. |
| Medallion source | Use Lakebase as the source of your medallion architecture. Delta tables can be processed with Databricks pipelines, Spark Declarative Pipelines (SDP), or Delta Live Tables (DLT) to build downstream tables. |
| Full history in the lakehouse | Preserve the complete history of all changes in the lakehouse while optionally keeping only a subset of data in Lakebase. |
Requirements
- Lakebase Autoscaling: A Lakebase Autoscaling project running Postgres 17.
- Source database: Tables must reside in the databricks_postgres database in Lakebase (beta limitation). Each project is created with this default database.
- Data types: Tables must use only supported column data types. Unsupported types cause the sync to fail for that table.
- Unity Catalog: The identity configuring the sync needs USE CATALOG, USE SCHEMA, and CREATE TABLE on the destination catalog and schema. See Grant permissions on an object.
- Lakebase project: Your Postgres role requires CAN MANAGE permissions on the Lakebase project that you are syncing from. If your identity owns the Lakebase project, it has CAN MANAGE permissions by default. See Manage project permissions.
To get started: set replica identity full on the tables you want to sync (Step 1), then start the sync in the Lakebase app (Step 2). Your data appears as lb_<table_name>_history tables in the Unity Catalog catalog and schema you choose. If you want to learn more about how Lakehouse Sync works before you start, see How Lakehouse Sync works.
Configuring the sync
Step 1: Set replica identity full
For a Lakebase table to sync successfully, it must have replica identity set to full. You can configure the sync on an empty schema or one that already contains tables. Partitioned tables are not supported.
By default, Postgres logs only the primary key when a row is updated or deleted. Setting REPLICA IDENTITY FULL tells Postgres to record the full before-and-after row state in the write-ahead log. This is required so the Lakehouse Sync can build a complete update history.
You can run the following commands in the Lakebase SQL Editor or from any Postgres client. This example uses the Lakebase SQL Editor. To open it: in your Databricks workspace, open Lakebase Postgres from the app switcher (top right), select your project and the branch you want to sync (for example, production or main), then select SQL Editor from the sidebar and choose the branch and database. See Query from Lakebase SQL Editor for details.
- Single table
- Multiple tables
To set replica identity to full on a single table, run:
```sql
ALTER TABLE <table_name> REPLICA IDENTITY FULL;
```
Replace <table_name> with your table name.
To set replica identity to full on all tables in a schema (for example, public), run:
```sql
DO $$
DECLARE r record;
BEGIN
  FOR r IN
    SELECT table_schema, table_name
    FROM information_schema.tables
    WHERE table_schema = 'public'
      AND table_type = 'BASE TABLE'
  LOOP
    EXECUTE format(
      'ALTER TABLE %I.%I REPLICA IDENTITY FULL;',
      r.table_schema, r.table_name
    );
  END LOOP;
END $$;
```
- Existing tables: Run this on all existing tables in the schema before starting the sync.
- New tables: For tables created after the sync is configured, run this before inserting any data. Rows inserted before this property is set are not synced.
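To make the ordering concrete for new tables, here is a minimal sketch using a hypothetical events table (the table name and columns are illustrative, not from this page). Replica identity must be set before the first insert so that every row is captured:

```sql
-- Hypothetical table created after the sync was configured.
CREATE TABLE events (id INT PRIMARY KEY, payload JSONB);

-- Set replica identity BEFORE inserting any data;
-- rows inserted earlier would not be synced.
ALTER TABLE events REPLICA IDENTITY FULL;

INSERT INTO events VALUES (1, '{"type": "signup"}');
```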
Check which tables have replica identity set
To see which tables have replica identity set (and which are full), run in Lakebase:
```sql
SELECT n.nspname AS table_schema,
       c.relname AS table_name,
       CASE c.relreplident
         WHEN 'd' THEN 'default'
         WHEN 'n' THEN 'nothing'
         WHEN 'f' THEN 'full'
         WHEN 'i' THEN 'index'
       END AS replica_identity
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE c.relkind = 'r'
  AND n.nspname = 'public'
ORDER BY n.nspname, c.relname;
```
Change n.nspname = 'public' to your schema name if different. Only rows with replica_identity = full are ready for the sync.
Step 2: Start the Lakehouse Sync
Lakehouse Sync is configured at the schema level. Once set up, all current and future tables in that schema sync to Unity Catalog.
- In your Databricks workspace, open Lakebase Postgres from the app switcher (top right).
- Select your Lakebase project and the branch you want to sync (for example, production or main).
- Open Branch overview and click the Lakehouse sync tab.
- Click Start sync.
- In the configuration dialog:
  - Database: Defaults to databricks_postgres.
  - Schema: Select the source Postgres schema to sync.
  - To Catalog: Select the destination Unity Catalog catalog.
  - Schema: Select the destination Unity Catalog schema.
- Click Start sync to begin syncing data.

Tables appear in your chosen Unity Catalog catalog and schema as lb_<table_name>_history. In the Lakehouse, open Catalog in the sidebar, go to your destination catalog and schema, then open the Tables tab on the schema's Overview to see the Delta tables. On the Lakehouse sync tab in Lakebase you can confirm status and inspect what is syncing.
What you see on the Lakehouse sync tab
When sync is enabled, the top of the tab shows Status: Syncing and that changes are being captured and synced to Delta tables.

Two sub-tabs show the mapping and per-table progress:
- Schemas: Lists each source schema and its destination catalog and schema in Unity Catalog, with a Status (for example, Syncing) for that schema.
- Tables: Lists each source table, its destination lb_<table_name>_history table in Unity Catalog, Status (Syncing or Snapshotting), Committed LSN (how far the sync has written to Delta; shown as "-" while a table is still in initial snapshot), and Last Sync (when the table was last synced).
You can also run SELECT * FROM wal2delta.tables; from the Lakebase SQL Editor (or any Postgres client) to inspect sync status. The result includes table_oid, status (for example, STREAMING or SNAPSHOTTING), committed_lsn, and last_write_time for each table.
What is wal2delta? Lakehouse Sync is powered by the wal2delta Postgres extension, which runs inside the Lakebase compute. It uses logical decoding to capture write-ahead log (WAL) changes and writes them to Delta tables in Unity Catalog.
Disabling the sync
Disabling the sync stops replication for all Lakebase schemas that were syncing.
- In your Databricks workspace, open Lakebase Postgres from the app switcher (top right).
- Select your Lakebase project and the branch where you configured the sync (for example, production or main).
- Open Branch overview and click the Lakehouse sync tab.
- Click Disable sync. In the confirmation dialog, review the warning that changes will stop syncing to Delta tables, then click Disable again to confirm.
Disabling the sync does not restart your compute.
If you re-enable the sync later, the system does not perform a full re-snapshot. Any changes that occurred while the sync was disabled are permanently missing from the destination Delta table.
How Lakehouse Sync works
Instead of overwriting the destination Unity Catalog table to mirror the current state of your data in Lakebase, the sync appends a new row for every change event. This provides a complete, immutable log of how your data has evolved over time.
- Destination naming: Delta tables are created in Unity Catalog and use the naming pattern lb_<table_name>_history. These are Unity Catalog managed Delta tables.
- Schema-level sync: When you configure the sync for a Lakebase schema, all current and future tables in that schema are synced. Empty tables are not synced: a Lakebase table must contain at least one row to appear in the sync.
- Dropped tables: If you drop a table in Lakebase, the destination Delta table in Unity Catalog is preserved (not dropped).
You can monitor sync status in the Lakehouse sync tab in the branch overview or by running SELECT * FROM wal2delta.tables; in Lakebase.
Destination Delta table schema
In addition to your data columns, the sync adds these system columns to each destination Delta table in Unity Catalog:
| Column | Type | Description |
|---|---|---|
| _change_type | TEXT | Operation type: insert, update_preimage, update_postimage, or delete. |
| | TIMESTAMP | Transaction commit time in Postgres (without timezone). |
| _lsn | BIGINT | Postgres Log Sequence Number. |
| | INTEGER | Postgres Transaction ID. |
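As a quick sanity check after the sync starts, you can inspect recent change events ordered by the _lsn system column. This sketch assumes a hypothetical destination table main.analytics.lb_orders_history; substitute your own catalog, schema, and table names:

```sql
-- Most recent change events for a synced table, newest first.
-- main.analytics.lb_orders_history is a hypothetical name.
SELECT *
FROM main.analytics.lb_orders_history
ORDER BY _lsn DESC
LIMIT 20;
```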
Common change patterns
These patterns appear in the destination Delta tables in Unity Catalog:
- Initial load: The first time the sync runs on an existing Lakebase table, each existing row is written with _change_type=insert.
- Updates: An update produces two rows: one with _change_type=update_preimage (old row) and one with _change_type=update_postimage (new row).
- Deletes: A delete produces one row with _change_type=delete.
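For example, this single-row lifecycle on a hypothetical Postgres table users (run in Lakebase, with replica identity full already set) would produce four rows in lb_users_history: one insert row, an update_preimage/update_postimage pair, and one delete row:

```sql
INSERT INTO users (id, name) VALUES (1, 'Ada');  -- 1 history row: _change_type=insert
UPDATE users SET name = 'Ada L.' WHERE id = 1;   -- 2 history rows: update_preimage + update_postimage
DELETE FROM users WHERE id = 1;                  -- 1 history row: _change_type=delete
```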
Limitations and troubleshooting
You can see the table status (which tables are snapshotting, skipped, or streaming) in the Lakehouse sync tab or by running in Lakebase:
```sql
SELECT * FROM wal2delta.tables;
```
Common reasons a table does not sync:
- REPLICA IDENTITY FULL not set: Ensure you ran ALTER TABLE <table_name> REPLICA IDENTITY FULL; for each table.
- Partitioned tables: Lakebase partitioned tables are not supported. Syncing a schema that contains partitioned tables causes those tables to fail to sync.
- Naming collisions: Delta tables are named lb_<table_name>_history without the source Postgres schema prefix. If you sync two different Postgres schemas (for example, sales.users and marketing.users) to the same Unity Catalog schema, the first table syncs and the second fails due to the name collision. Map Postgres schemas that share table names to different Unity Catalog schemas.
- Unsupported data types: If a table has an unsupported type, creation of that Delta table fails and the table is not synced. See Data type mapping.
Data type mapping
The sync supports most standard PostgreSQL primitive types. Unsupported types cause the Delta table creation to fail for that table.
| PostgreSQL type | Databricks Delta type | Notes |
|---|---|---|
| BOOLEAN | BOOLEAN | |
| INT, SMALLINT, BIGINT | INT, SMALLINT, BIGINT | |
| TEXT, VARCHAR, CHAR | STRING | |
| JSONB | STRING | Stored as a JSON string. |
| ENUM | STRING | Stored as the enum label. |
| NUMERIC / DECIMAL | DECIMAL | Uses source precision/scale. Defaults to DECIMAL(38, 18) if undefined. |
| DATE | DATE | |
| TIMESTAMP | TIMESTAMP_NTZ | |
| TIMESTAMPTZ | TIMESTAMP | |
| FLOAT, DOUBLE | FLOAT, DOUBLE | |
Unsupported types:
- Geography/Geometry (PostGIS): Types from the PostGIS extension (for example, geometry, geography).
- Vector (pgvector): The vector type from the pgvector extension.
- Composite/struct types: Custom types defined with CREATE TYPE ... AS (field_name type, ...). These are row-like types with named fields (sometimes called structs). Columns whose type is a composite type cannot be synced.
- Map: Map-like key-value types such as hstore (from the hstore extension). Postgres has no built-in map type; hstore is the usual way to store key-value pairs in a column.
Find columns that may block sync
To find columns that may block sync, run the following query in Lakebase. It lists every column whose type is not in the supported set above (including enums). Any table that has one or more such columns does not sync.
```sql
SELECT c.table_schema, c.table_name, c.column_name, c.udt_name AS data_type
FROM information_schema.columns c
JOIN pg_catalog.pg_type t ON t.typname = c.udt_name
WHERE c.table_schema = 'public'
  AND c.table_name IN (SELECT tablename FROM pg_tables WHERE schemaname = c.table_schema)
  AND NOT (
    c.udt_name IN ('bool', 'int2', 'int4', 'int8', 'text', 'varchar', 'bpchar', 'jsonb', 'numeric', 'date', 'timestamp', 'timestamptz', 'real', 'float4', 'float8')
    OR t.typcategory = 'E'
  )
ORDER BY c.table_schema, c.table_name, c.ordinal_position;
```
Managing schema changes
Supported: Renaming a table in Postgres (for example, ALTER TABLE users RENAME TO customers) lets the sync continue. The destination Delta table name does not change. It remains lb_users_history.
Breaking: Adding a column, dropping a column, or changing a column's data type is not applied automatically to the destination. Such changes cause the sync for that table to stop receiving new events.
Workaround: To apply schema changes safely without breaking the sync, use the following steps to create a new table and swap it into place. This triggers the sync to recognize a new table and start a fresh history.
- Create the new table with the updated schema.

  ```sql
  CREATE TABLE users_v2 (
      id INT PRIMARY KEY,
      name TEXT,
      new_column TEXT -- The new schema change
  );
  ```

- Set replica identity full.

  ```sql
  ALTER TABLE users_v2 REPLICA IDENTITY FULL;
  ```

- Backfill data from the old table.

  ```sql
  INSERT INTO users_v2 SELECT *, NULL FROM users;
  ```

- Swap the table names.

  ```sql
  BEGIN;
  ALTER TABLE users RENAME TO users_backup;
  ALTER TABLE users_v2 RENAME TO users;
  COMMIT;
  ```
With this approach, you lose the history of the previous table. The new table begins its sync history from scratch.
Deduplicating to create a current-state mirror
For analytics that need the current state of the Lakebase table (a mirror) rather than full history, you can deduplicate in Databricks SQL (for example, in the Lakehouse SQL editor or a notebook attached to a SQL warehouse) using a window function:
```sql
SELECT *
FROM (
  SELECT *,
         ROW_NUMBER() OVER (PARTITION BY id ORDER BY _lsn DESC) AS rn
  FROM <catalog>.<schema>.lb_<table_name>_history
  WHERE _change_type IN ('insert', 'update_postimage', 'delete')
)
WHERE rn = 1
  AND _change_type != 'delete';
```
Replace id with your table's primary key column. The query partitions by that key, orders by _lsn to get the latest event, and keeps only the latest row per key while excluding rows whose latest event was a delete.
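If downstream consumers query the current state often, one option is to wrap this pattern in a view in Databricks SQL. A sketch, assuming a hypothetical history table main.analytics.lb_users_history with primary key id:

```sql
-- Hypothetical view exposing the current state of the users table.
CREATE OR REPLACE VIEW main.analytics.users_current AS
SELECT * EXCEPT (rn)  -- drop the helper column from the result
FROM (
  SELECT *,
         ROW_NUMBER() OVER (PARTITION BY id ORDER BY _lsn DESC) AS rn
  FROM main.analytics.lb_users_history
  WHERE _change_type IN ('insert', 'update_postimage', 'delete')
)
WHERE rn = 1
  AND _change_type != 'delete';
```

Because the view is evaluated at query time, it always reflects the latest synced history without any extra pipeline.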
Next steps
Depending on your goal, use the lb_<table_name>_history Delta tables with other Databricks features:
- Fast analytics: Run aggregates and ad-hoc analytics on your synced data using Databricks SQL (query editor, dashboards, notebooks attached to a SQL warehouse). See Get started with data warehousing using Databricks SQL for a full walkthrough.
- Medallion source: Use your Delta tables as the bronze layer and build silver and gold layers with pipelines. See What is the medallion lakehouse architecture? for the pattern and Tutorial: Build an ETL pipeline using change data capture for a hands-on pipeline with CDC and medallion layers using Lakeflow Spark Declarative Pipelines.
- Full history in the lakehouse: Query your Delta tables at a specific point in time with Work with table history (time travel). To derive a current-state mirror from a Delta table, see Deduplicating to create a current-state mirror above.