Serve lakehouse data with synced tables (Lakebase Provisioned)

info

Lakebase Provisioned is the original Lakebase offering that uses provisioned compute you scale manually. For supported regions, see Region availability. For the latest version of Lakebase, with autoscaling compute, scale-to-zero, branching, and instant restore, see Lakebase Autoscaling.

Since March 12, 2026, new Lakebase instances are created as Autoscaling projects. Existing Provisioned instances are being upgraded automatically to Autoscaling, starting in June 2026. For details, see Upgrade to Lakebase Autoscaling.

Synced tables let you serve lakehouse data through Lakebase Provisioned Postgres instances. Unity Catalog tables sync into Postgres so applications can query lakehouse data directly with low latency. This process is commonly known as reverse ETL. The lakehouse is optimized for analytics and enrichment, while Lakebase is designed for operational workloads that require fast lookup-style queries and transactional consistency.

What are synced tables?

Synced tables let you serve analytics-grade data from Unity Catalog through Lakebase Provisioned Postgres, making it available to applications that need low-latency queries and full ACID transactions. They bridge the gap between analytical storage and operational systems by keeping your data ready to serve in real-time applications.

Sync pipelines use managed Lakeflow pipelines to continuously update both the Unity Catalog synced table and the Postgres table with changes from the source table. After creation, synced tables can be queried directly using Postgres tools.

The key characteristics of synced tables are as follows:

Run only read queries against the table in Postgres to maintain data integrity with the source
Automatically synchronized using managed Lakeflow pipelines
Queryable through standard PostgreSQL interfaces
Managed through Unity Catalog for governance and lifecycle management

warning

While it's possible to modify a synced table directly in Postgres, Databricks strictly recommends running only read queries to protect data integrity with the source. For supported operations on synced tables, see Supported operations.

Before you begin

You have a Unity Catalog table in any catalog.
You have CAN USE permissions on the database instance.

Create a synced table

UI
Python SDK
CLI
curl

To sync a Unity Catalog table into Postgres, do the following:

Click Catalog in the workspace sidebar.
Find and select the Unity Catalog table you want to create a synced table on.
Click Create > Synced table.
Select your catalog, schema, and enter a table name for the new synced table.
- Synced tables can also be created in Standard catalogs, with some additional configuration. Select your Standard catalog, a schema, and enter a table name for the newly created synced table.
Select a database instance and enter the name of the Postgres database in which to create the synced table. The Postgres database field defaults to the currently selected target catalog. If a Postgres database does not exist under this name, Databricks creates a new one.
Select a Primary Key. A primary key is required as it enables efficient access to rows for reads, updates, and deletes.

important
Columns in the primary key are not nullable in the synced table. Therefore, rows with nulls in primary key columns are excluded from the sync.
If two rows have the same primary key in the source table, select a Timeseries Key to configure deduplication. When a Timeseries Key is specified, the synced tables contains only the rows with the latest timeseries key value for each primary key.
Select the sync mode from Snapshot, Triggered, and Continuous. For more information about each sync mode, see Sync modes explained.
Choose if you want to create this synced table from a new or existing pipeline.
- If creating a new pipeline and using a managed catalog, choose the storage location for the staging table. If using a standard catalog, the staging table is automatically stored in the catalog.
- If using an existing pipeline, check that the new sync mode matches the pipeline mode.
(Optional) Select a Serverless usage policy. To create a serverless usage policy, see Attribute usage with serverless usage policies. This allows you to attribute billing usage to specific serverless usage policies.
- For synced tables, the billable entity is the underlying pipeline. To modify the usage policy, modify the underlying pipeline object. See Configure a serverless pipeline.
After Synced table status is Online, log in to your database instance and query the newly created table. Query your table using the SQL editor, external tools, or notebooks.

Python
from databricks.sdk import WorkspaceClient
from databricks.sdk.service.database import SyncedDatabaseTable, SyncedTableSpec, NewPipelineSpec, SyncedTableSchedulingPolicy

# Initialize the Workspace client
w = WorkspaceClient()

# Create a synced table in a database catalog
synced_table = w.database.create_synced_database_table(
    SyncedDatabaseTable(
        name="database_catalog.schema.synced_table",  # Full three-part name
        spec=SyncedTableSpec(
            source_table_full_name="source_catalog.source_schema.source_table",
            primary_key_columns=["id"],  # Primary key columns
            scheduling_policy=SyncedTableSchedulingPolicy.TRIGGERED,  # SNAPSHOT, TRIGGERED, or CONTINUOUS
            # Optional: timeseries_key="timestamp"  # For deduplication
            new_pipeline_spec=NewPipelineSpec(
                storage_catalog="storage_catalog",
                storage_schema="storage_schema"
            )
        ),
    )
)
print(f"Created synced table: {synced_table.name}")

# Create a synced table in a standard UC catalog
synced_table = w.database.create_synced_database_table(
    SyncedDatabaseTable(
        name="standard_catalog.schema.synced_table",  # Full three-part name
        database_instance_name="my-database-instance",  # Required for standard catalogs
        logical_database_name="postgres_database",  # Required for standard catalogs
        spec=SyncedTableSpec(
            source_table_full_name="source_catalog.source_schema.source_table",
            primary_key_columns=["id"],
            scheduling_policy=SyncedTableSchedulingPolicy.CONTINUOUS,
            create_database_objects_if_missing=True,  # Create database/schema if needed
            new_pipeline_spec=NewPipelineSpec(
                storage_catalog="storage_catalog",
                storage_schema="storage_schema"
            )
        ),
    )
)
print(f"Created synced table: {synced_table.name}")

# Check the status of a synced table
synced_table_name = "database_catalog.schema.synced_table"
status = w.database.get_synced_database_table(name=synced_table_name)
print(f"Synced table status: {status.data_synchronization_status.detailed_state}")
print(f"Status message: {status.data_synchronization_status.message}")

Bash
# Create a synced table in a database catalog
databricks database create-synced-database-table  \
  --json '{
    "spec": {
      "name": "database_catalog.schema.synced_table",
      "source_table_full_name": "source_catalog.source_schema.source_table",
      "primary_key_columns": ["id"],
      "scheduling_policy": "TRIGGERED"
    },
    "new_pipeline_spec": {
      "storage_catalog": "storage_catalog",
      "storage_schema": "storage_schema"
    }
  }'

# Create a synced table in a standard UC catalog
# new_pipeline_spec, storage_catalog, and storage_schema are optional
databricks database create-synced-database-table \
  --database-instance-name "my-database-instance" \
  --logical-database-name "databricks_postgres" \
  --json '{
    "name": "standard_catalog.schema.synced_table",
    "spec": {
      "source_table_full_name": "source_catalog.source_schema.source_table",
      "primary_key_columns": ["id"],
      "scheduling_policy": "CONTINUOUS",
      "create_database_objects_if_missing": true
    }
  }'

# Check the status of a synced table
databricks database get-synced-database-table "database_catalog.schema.synced_table"

Create a synced table in a database catalog.

Bash
export CATALOG_NAME=<Database catalog>
export SRC_TBL="source_catalog.source_schema.source_table"
export DEST_TBL="$CATALOG_NAME.some_schema.synced_table"
export PKS='["id"]'
export ST_CATALOG="storage_catalog"
export ST_SCHEMA="storage_schema"

curl -X POST https://$WORKSPACE/api/2.0/database/synced_tables \
-H "Content-Type: text/json" \
-H "Authorization: Bearer ${DATABRICKS_TOKEN}" \
--data-binary @- << EOF
{
  "name": "$DEST_TBL",
  "spec": {
    "source_table_full_name": "$SRC_TBL",
    "primary_key_columns": $PKS,
    "scheduling_policy": "TRIGGERED",
  },
  "new_pipeline_spec": {
    "storage_catalog": "$ST_CATALOG",
    "storage_schema": "$ST_SCHEMA",
  }
}
EOF

Create a synced table in a standard Unity Catalog catalog.

Bash
export CATALOG_NAME=<Standard catalog>
export DATABASE_INSTANCE=<database instance>
export POSTGRES_DATABASE=$CATALOG_NAME
export SRC_TBL="source_catalog.source_schema.source_table"
export DEST_TBL="$CATALOG_NAME.some_schema.sync_table"
export PKS='["id"]'
export ST_CATALOG="storage_catalog"
export ST_SCHEMA="storage_schema"

curl -X POST https://$WORKSPACE/api/2.0/database/synced_tables \
-H "Content-Type: text/json" \
-H "Authorization: Bearer ${DATABRICKS_TOKEN}" \
--data-binary @- << EOF
{
  "name": "$DEST_TBL",
  "database_instance_name": "$DATABASE_INSTANCE",
  "logical_database_name": "$POSTGRES_DATABASE",
  "spec": {
    "source_table_full_name": "$SRC_TBL",
    "primary_key_columns": $PKS,
    "scheduling_policy": "TRIGGERED",
  },
  "new_pipeline_spec": {
    "storage_catalog": "$ST_CATALOG",
    "storage_schema": "$ST_SCHEMA",
  }
}
EOF

Check the status of a synced table.

Bash
export SYNCEDTABLE='pg_db.silver.sbtest1_online'

curl --request GET \
  "https://e2-dogfood.staging.cloud.databricks.com/api/2.0/database/synced_tables/$SYNCEDTABLE" \
  --header "Authorization: Bearer dapi..."

Sync modes explained

A synced table can be created with one of the following sync modes, which determine how data is synchronized from the source to the synced table in Postgres:

Sync mode	Description	When to use	Performance
Snapshot	The pipeline runs once to take a snapshot of the source table and copy it to the synced table. Subsequent pipeline runs copy the whole source data to the destination and replace it in place atomically. The pipeline can be triggered manually, through an API or on a schedule.	Source changes >10% of rows per cycle, or the source does not support Change Data Feed (views, Iceberg tables).	10x more efficient than incremental modes for high-churn tables.
Triggered	The pipeline runs once to take a snapshot of the source table and copy it to the synced table. Unlike the Snapshot sync mode, when the synced table is refreshed, only changes since the last pipeline execution are retrieved and applied to the synced table. The incremental refresh can be triggered manually, through an API or on a schedule.	Source rows change incrementally on a known cadence. Inserts, updates, and deletes are all propagated on each refresh.	Good cost/lag tradeoff for on-demand or scheduled syncs. Running more frequently than every 5 minutes may cost more than Continuous mode.
Continuous	The pipeline runs once to take a snapshot of the source table and copy it to the synced table then the pipeline runs continuously. Subsequent changes to the source table are incrementally applied to the synced table in real time. No manual refresh is necessary.	Changes must appear in Lakebase in near real time.	Lowest lag, highest cost.

Sync mode	Description	When to use	Performance
Snapshot	The pipeline runs once to take a snapshot of the source table and copy it to the synced table. Subsequent pipeline runs copy the whole source data to the destination and replace it in place atomically. The pipeline can be triggered manually, through an API or on a schedule.	Source changes >10% of rows per cycle, or the source does not support Change Data Feed (views, Iceberg tables).	10x more efficient than incremental modes for high-churn tables.
Triggered	The pipeline runs once to take a snapshot of the source table and copy it to the synced table. Unlike the Snapshot sync mode, when the synced table is refreshed, only changes since the last pipeline execution are retrieved and applied to the synced table. The incremental refresh can be triggered manually, through an API or on a schedule.	Source rows change incrementally on a known cadence. Inserts, updates, and deletes are all propagated on each refresh.	Good cost/lag tradeoff for on-demand or scheduled syncs. Running more frequently than every 5 minutes may cost more than Continuous mode.
Continuous	The pipeline runs once to take a snapshot of the source table and copy it to the synced table then the pipeline runs continuously. Subsequent changes to the source table are incrementally applied to the synced table in real time. No manual refresh is necessary.	Changes must appear in Lakebase in near real time.	Lowest lag, highest cost.

note

To support Triggered or Continuous sync mode, the source table must have Change data feed enabled. Certain sources (like Views) do not support change data feed so they can only be synced in Snapshot mode.

Schedule or trigger subsequent syncs

The initial snapshot runs automatically on creation. For Snapshot and Triggered modes, subsequent syncs must be triggered explicitly. Continuous mode is self-managing.

Database Table Sync pipeline task

The Database Table Sync pipeline task in Lakeflow Jobs runs a synced table's pipeline as a workflow step. Configure the job with a table update trigger or a schedule.

Trigger on source table updates

Fires the job when the source Unity Catalog table is updated. With Triggered mode, only new changes are applied incrementally, giving near-real-time freshness without the always-on cost of Continuous mode.

In the sidebar, click Workflows.
Click Create job or open an existing job.
On the Tasks tab, click + Add another task type.
Under Ingestion and Transformation, select Database Table Sync pipeline.
In the Pipeline field, select the pipeline associated with your synced table.
Under Schedules & Triggers, click Add trigger.
Select Table update as the trigger type.
Under Tables, select the source Unity Catalog table to monitor.
Click Save.

Trigger on a schedule

Runs the sync at a fixed cadence. Well-suited for Snapshot mode, where a nightly or weekly full refresh is typically the most efficient pattern.

Follow steps 1–5 above to add a Database Table Sync pipeline task to a job.
Under Schedules & Triggers, click Add trigger.
Select Scheduled as the trigger type.
Set your cron schedule and timezone, then click Save.

Using the SDK

Trigger a sync run programmatically, for example at the end of an upstream notebook or pipeline:

Python
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()

# Get the pipeline ID from the synced table
table = w.database.get_synced_database_table(name="catalog.schema.synced_table")
pipeline_id = table.data_synchronization_status.pipeline_id

# Trigger a sync run
w.pipelines.start_update(pipeline_id=pipeline_id)

Supported operations

Databricks recommends performing only the following operations in Postgres for synced tables to prevent accidental overwrites or data inconsistencies:

Read-only queries
Creating indexes
Dropping the table (to free up space after removing the synced table from Unity Catalog)

While it's possible to modify synced tables in Postgres in other ways, doing so interferes with the synchronization pipeline.

Delete a synced table

To delete a synced table, you must delete it from Unity Catalog and then drop the table in the database instance. Deleting the synced table from Unity Catalog deregisters the table and stops any data refreshes. However, the table remains in the underlying Postgres database. To free up space in your database instance, connect to the instance and use the DROP TABLE command.

UI
Python SDK
CLI
curl

Click Catalog in the workspace sidebar.
Find and select the synced table you want to delete.
Click > Delete.
Connect to the instance with psql, the SQL editor, or from a notebook.

Drop the table using PostgreSQL.

PostgreSQL
DROP TABLE synced_table_database.synced_table_schema.synced_table

Python
from databricks.sdk import WorkspaceClient

# Initialize the Workspace client
w = WorkspaceClient()

# Delete a synced table from UC
synced_table_name = "catalog.schema.synced_table"
w.database.delete_synced_database_table(name=synced_table_name)
print(f"Deleted synced table from UC: {synced_table_name}")

# To free up space in your database instance, you need to connect to the
# instance and drop the table using PostgreSQL:
#
# DROP TABLE synced_table_database.synced_table_schema.synced_table;

Bash
# Delete a synced table from UC
databricks database delete-synced-database-table "catalog.schema.synced_table"

# To free up space in your database instance, you need to connect to the
# instance and drop the table using PostgreSQL:
#
# DROP TABLE synced_table_database.synced_table_schema.synced_table;

Bash
# Delete a synced table from UC
curl -X DELETE --header "Authorization: Bearer ${DATABRICKS_TOKEN}" \
  https://$WORKSPACE/api/2.0/database/synced_tables/$SYNCED_TABLE_NAME

# To free up space in your database instance, you need to connect to the
# instance and drop the table using PostgreSQL:
#
# DROP TABLE synced_table_database.synced_table_schema.synced_table;

Ownership and permissions

If you create a new Postgres database, schema, or table, Postgres ownership is set as follows:

Ownership is assigned to the user creating the database, schema, or table, if their Databricks login exists as a role in Postgres. To add a Databricks identity role in Postgres, see Postgres roles.
Otherwise, ownership is assigned to the owner of the parent object in Postgres (typically the databricks_superuser).

Manage synced table access

After a synced table is created, the databricks_superuser can READ a synced table from Postgres. The databricks_superuser has pg_read_all_data, which lets this role read from all tables. It also has the pg_write_all_data privilege, which lets this role write to all tables. This means that a databricks_superuser can also write to a synced table in Postgres. Lakebase supports this writing behavior in case you need to make urgent changes in your target table. However, Databricks recommends making fixes in your source table instead.

The databricks_superuser can also grant these privileges to other users:

PostgreSQL
GRANT USAGE ON SCHEMA synced_table_schema TO user;

PostgreSQL
GRANT SELECT ON synced_table_name TO user;

The databricks_superuser can revoke these privileges:

PostgreSQL
REVOKE USAGE ON SCHEMA synced_table_schema FROM user;

PostgreSQL
REVOKE {SELECT | INSERT | UPDATE | DELETE} ON synced_table_name FROM user;

Manage synced table operations

The databricks_superuser can manage which users are authorized to perform specific operations on a synced table. The supported operations for synced tables are:

CREATE INDEX
ALTER INDEX
DROP INDEX
DROP TABLE

All other DDL operations are denied for synced tables.

To grant these privileges to additional users, the databricks_superuser must first create an extension on databricks_auth:

PostgreSQL
CREATE EXTENSION IF NOT EXISTS databricks_auth;

Then the databricks_superuser can add a user to manage a synced table:

PostgreSQL
SELECT databricks_synced_table_add_manager('"synced_table_schema"."synced_table"'::regclass, '[user]');

The databricks_superuser can remove a user from managing a synced table:

PostgreSQL
SELECT databricks_synced_table_remove_manager('[table]', '[user]');

The databricks_superuser can view all managers:

PostgreSQL
SELECT * FROM databricks_synced_table_managers;

Data type mapping

This type mapping table defines how each data type in the source Unity Catalog table is mapped to the destination sync table in Postgres:

Source column type	Postgres column type
BIGINT	BIGINT
BINARY	BYTEA
BOOLEAN	BOOLEAN
DATE	DATE
DECIMAL(p,s)	NUMERIC
DOUBLE	DOUBLE PRECISION
FLOAT	REAL
INT	INTEGER
INTERVAL intervalQualifier	INTERVAL
SMALLINT	SMALLINT
STRING	TEXT
TIMESTAMP	TIMESTAMP WITH TIME ZONE
TIMESTAMP_NTZ	TIMESTAMP WITHOUT TIME ZONE
TINYINT	SMALLINT
GEOGRAPHY(srid)	NOT SUPPORTED
GEOMETRY(srid)	NOT SUPPORTED
ARRAY < elementType >	JSONB
MAP < keyType,valueType >	JSONB
STRUCT < [fieldName : fieldType [NOT NULL][COMMENT str][, …]] >	JSONB
VARIANT	JSONB
OBJECT	NOT SUPPORTED

Source column type	Postgres column type
BIGINT	BIGINT
BINARY	BYTEA
BOOLEAN	BOOLEAN
DATE	DATE
DECIMAL(p,s)	NUMERIC
DOUBLE	DOUBLE PRECISION
FLOAT	REAL
INT	INTEGER
INTERVAL intervalQualifier	INTERVAL
SMALLINT	SMALLINT
STRING	TEXT
TIMESTAMP	TIMESTAMP WITH TIME ZONE
TIMESTAMP_NTZ	TIMESTAMP WITHOUT TIME ZONE
TINYINT	SMALLINT
GEOGRAPHY(srid)	NOT SUPPORTED
GEOMETRY(srid)	NOT SUPPORTED
ARRAY < elementType >	JSONB
MAP < keyType,valueType >	JSONB
STRUCT < [fieldName : fieldType [NOT NULL][COMMENT str][, …]] >	JSONB
VARIANT	JSONB
OBJECT	NOT SUPPORTED

note

The mapping for ARRAY, MAP, and STRUCT types was changed in May 2025. Sync tables created prior to that continue to map those types to JSON.
The mapping for TIMESTAMP was changed in August 2025. Sync tables created prior to that continue to map it to TIMESTAMP WITHOUT TIME ZONE.

Handle invalid characters

Certain characters, such as the null byte (0x00), are allowed in STRING, ARRAY, MAP, or STRUCT columns in Delta tables but are not supported in Postgres TEXT or JSONB columns. As a result, syncing such data from Delta to Postgres can lead to insertion failures with errors:

org.postgresql.util.PSQLException: ERROR: invalid byte sequence for encoding "UTF8": 0x00

org.postgresql.util.PSQLException: ERROR: unsupported Unicode escape sequence DETAIL: \u0000 cannot be converted to text.

The first error occurs when a null byte appears in a top-level string column, which maps directly to Postgres TEXT.
The second error occurs when a null byte appears in a string nested inside a complex type (STRUCT, ARRAY, or MAP), which Databricks serializes as JSONB. During serialization, all strings are cast to Postgres TEXT, where \u0000 is disallowed.

How to resolve:

You can address this issue in one of the following ways:

Sanitize string fields

Remove or replace unsupported characters from all string fields, including those inside complex types, before syncing to Postgres.

To remove null bytes from a top-level STRING column, use the REPLACE function:
SQL
```
SELECT REPLACE(column_name, CAST(CHAR(0) AS STRING), '') AS cleaned_column FROM your_table;
```
Convert to binary (for STRING columns only)

If preserving the raw byte content is necessary, convert the affected STRING columns to BINARY.

Limitations and considerations

Supported source tables

Depending on the sync mode of a synced table, different types of source tables are supported:

For Snapshot mode, the source table must support SELECT *. Examples include Delta tables, Iceberg tables, views, materialized views, and other similar types.
For Triggered or Continuous sync modes, the source table must also have change data feed enabled.

Naming and identifier limitations

Allowed characters: Postgres database, schema, and table names for synced tables may only contain alphanumeric characters and underscores ([A-Za-z0-9_]+). Hyphens (-) and other special characters are not supported.
Column and table identifiers: Avoid using uppercase letters or special characters in column or table names in the source Unity Catalog table. If kept, you need to quote these identifiers when referencing them in Postgres.

Performance and synchronization

Sync speed: Synchronizing data into synced tables may be slower than writing data directly to the database instance with a native PostgreSQL client due to additional processing. Continuous sync mode refreshes data from the Unity Catalog managed table to the synced table at a minimum interval of 15 seconds.
Connection usage: Each table synchronization can use up to 16 connections to the database instance, which count toward the instance's connection limit.
API idempotency: Synced table APIs are idempotent and may need to be retried if errors occur to ensure timely operations.
Schema changes: For synced tables in Triggered or Continuous mode, only additive schema changes (e.g., adding a new column) from the Unity Catalog tables are reflected on the synced table.
Duplicate keys: If two rows have the same primary key in the source table, the sync pipeline fails unless you configure deduplication using a timeseries key. However, using a timeseries key has a performance penalty.
Update rate: For Lakebase Provisioned, the sync pipeline supports Continuous and Triggered writes at approximately 1200 rows per second per Capacity Unit (CU) and Snapshot writes at up to 15,000 rows per second per CU.

Capacity and limits

Table limits:
- Limit of 20 synced tables per source table.
- Each table synchronization may use up to 16 database connections.
Size limits and full refresh:
- If you full-refresh a synced table, the old version in Postgres is not deleted until the new table is synced. Both versions temporarily count toward the logical database size limit during the refresh.
- Individual synced tables do not have a limit, but the total logical data size limit across all tables in the instance is 2 TB. However, if you require refreshes instead of full table recreation, Databricks recommends not exceeding 1 TB.
- If the uncompressed, row-format size of the Unity Catalog table exceeds the database instance size limit (2 TB), synchronization fails. You must drop the synced table in Postgres before writing further to the instance.

Catalog integration

Catalog duplication: Creating a synced table in a standard catalog targeting a Postgres database that is also registered as a separate database catalog causes the synced table to appear in Unity Catalog under both the standard and the database catalogs.

What are synced tables?​

Before you begin​

Create a synced table​

Sync modes explained​

Schedule or trigger subsequent syncs​

Database Table Sync pipeline task​

Trigger on source table updates​

Trigger on a schedule​

Using the SDK​

Supported operations​

Delete a synced table​

Ownership and permissions​

Manage synced table access​

Manage synced table operations​

Data type mapping​

Handle invalid characters​

Limitations and considerations​

Supported source tables​

Naming and identifier limitations​

Performance and synchronization​

Capacity and limits​

Catalog integration​