Sync data from Unity Catalog tables to a database instance
This feature is in Public Preview in the following regions: us-east-1
, us-west-2
, eu-west-1
, ap-southeast-1
, ap-southeast-2
, eu-central-1
, us-east-2
, ap-south-1
.
This page describes how to create and manage a synced table. A synced table is a Unity Catalog read-only Postgres table that automatically synchronizes data from a Unity Catalog table to your Lakebase database instance. Syncing a Unity Catalog table into Postgres enables low-latency read queries and supports query-time joins with other Postgres tables.
The synchronization is handled by a managed Lakeflow declarative pipeline that continuously updates the Postgres table with changes from the source table. After creation, synced tables can be queried directly using Postgres tools.
The key characteristics of synced tables are as follows:
- Read-only in Postgres to maintain data integrity with the source
- Automatically synchronized using managed Lakeflow Declarative Pipelines
- Queryable through standard PostgreSQL interfaces
- Managed through Unity Catalog for governance and lifecycle management
Before you begin
- You have a Unity Catalog table in any catalog.
- You have
CAN USE
permissions on the database instance.
Create a synced table
- UI
- curl
To sync a Unity Catalog table into Postgres, do the following:
-
Click Catalog in the workspace sidebar.
-
Find and select the Unity Catalog table you want to create a synced table on.
-
Click Create > Synced table.
-
Select your catalog, schema, and enter a table name for the new synced table.
- Synced tables can also be created in Standard catalogs, with some additional configuration. Select your Standard catalog, a schema, and enter a table name for the newly created synced table.
-
Select a database instance and enter the name of the Postgres database in which to create the synced table. The Postgres database field will default to the currently selected target catalog. If a Postgres database does not exist under this name, Databricks creates a new one.
-
Select a Primary Key. A primary key is required as it enables efficient access to rows for reads, updates, and deletes.
-
Optionally, select a Timeseries Key. If two rows have the same primary key in the source table, use a Timeseries Key to configure deduplication.
-
Select the sync mode from Snapshot, Triggered, and Continuous. For all sync modes, the whole source table is read and written to Postgres.
Policy
Description
Snapshot
The pipeline runs once to take a snapshot of the source table and copy it to the synced table. Subsequent changes to the source table are automatically reflected in the synced table by taking a new snapshot of the source and creating a new copy. The content of the synced table is updated atomically.
Triggered
The pipeline runs once to create an initial snapshot copy of the source table in the synced table. Unlike the Snapshot sync mode, when the synced table is refreshed, only changes since the last pipeline execution are retrieved and applied to the synced table. The incremental refresh can be manually triggered or automatically triggered according to a schedule.
Continuous
The pipeline runs continuously. Subsequent changes to the source table are incrementally applied to the synced table in real time streaming mode. No manual refresh is necessary.
noteTo support Triggered or Continuous sync mode, the source table must have Change data feed enabled.
-
Choose if you want to create this synced table from a new or existing pipeline.
- If creating a new pipeline and using a managed catalog, you need to choose the storage location for the staging table. If using a standard catalog, the staging table is automatically stored in the catalog.
- If using an existing pipeline, check that the new sync mode matches the pipeline mode.
-
After Synced table status is Online, log in to your database instance and query the newly created table. Query your table using the SQL editor, external tools, or notebooks.
Create a synced table in a database catalog.
export CATALOG_NAME=<Database catalog>
export SRC_TBL="source_catalog.source_schema.source_table"
export DEST_TBL="$CATALOG_NAME.some_schema.synced_table"
export PKS='["id"]'
export ST_CATALOG = "storage_catalog"
export ST_SCHEMA = "storage_schema"
curl -X POST https://$WORKSPACE/api/2.0/database/synced_tables \
-H "Content-Type: text/json" \
-H "Authorization: Bearer ${DATABRICKS_TOKEN}" \
--data-binary @- << EOF
{
"name": "$DEST_TBL",
"spec": {
"source_table_full_name": "$SRC_TBL",
"primary_key_columns": $PKS,
"scheduling_policy": "TRIGGERED",
},
"new_pipeline_spec": {
"storage_catalog": "$ST_CATALOG",
"storage_schema": "$ST_SCHEMA",
}
}
EOF
Create a synced table in a standard Unity Catalog catalog.
export CATALOG_NAME=<Standard catalog>
export DATABASE_INSTANCE=<database instance>
export POSTGRES_DATABASE=$CATALOG_NAME
export SRC_TBL="source_catalog.source_schema.source_table"
export DEST_TBL="$CATALOG_NAME.some_schema.sync_table"
export PKS='["id"]'
export ST_CATALOG = "storage_catalog"
export ST_SCHEMA = "storage_schema"
curl -X POST https://$WORKSPACE/api/2.0/database/synced_tables \
-H "Content-Type: text/json" \
-H "Authorization: Bearer ${DATABRICKS_TOKEN}" \
--data-binary @- << EOF
{
"name": "$DEST_TBL",
"database_instance_name": "$DATABASE_INSTANCE",
"logical_database_name": "$POSTGRES_DATABASE",
"spec": {
"source_table_full_name": "$SRC_TBL",
"primary_key_columns": $PKS,
"scheduling_policy": "TRIGGERED",
},
"new_pipeline_spec": {
"storage_catalog": "$ST_CATALOG",
"storage_schema": "$ST_SCHEMA",
}
}
EOF
Check the status of a synced table.
export SYNCEDTABLE='pg_db.silver.sbtest1_online'
curl --request GET \
"https://e2-dogfood.staging.cloud.databricks.com/api/2.0/database/synced_tables/$SYNCEDTABLE" \
--header "Authorization: Bearer dapi..."
Supported operations
Only a limited set of operations are supported on the Postgres side for synced tables:
- Read-only queries
- Creating indexes
- Dropping the table (to free up space after removing the synced table from Unity Catalog)
Although you are able to modify this table in other ways, it interferes with the synchronization pipeline.
Delete a synced table
To delete a synced table, you must delete it from Unity Catalog and then drop the table in the database instance. Deleting the synced table from Unity Catalog deregisters the table and stops any data refreshes. However, the table remains in the underlying Postgres database. To free up space in your database instance, connect to the instance and use the DROP TABLE
command.
- Click Catalog in the workspace sidebar.
- Find and select the synced table you want to delete.
- Click
> Delete.
- Connect to the instance with
psql
, the SQL editor, or from a notebook. - Drop the table using PostgreSQL.
Psql
DROP TABLE synced_table_database.synced_table_schema.synced_table
Ownership and permissions
If you create a new Postgres database, schema, or table, Postgres ownership is set as follows:
- Ownership is assigned to the user creating the database, schema, or table, if their Databricks login exists as a role in Postgres. To add a Databricks identity role in Postgres, see Create and manage Postgres roles for Databricks identities.
- Otherwise, ownership is assigned to the owner of the parent object in Postgres (typically the
databricks_superuser
).
Manage synced table access
After a synced table is created, the databricks_superuser
can READ
a synced table from Postgres. The databricks_superuser
has pg_read_all_data
and pg_write_all_data
privileges:
-
The
databricks_superuser
can also grant these privileges to other users:PsqlGRANT USAGE ON SCHEMA synced_table_schema TO user;
PsqlGRANT SELECT ON synced_table_name TO user;
-
The
databricks_superuser
can revoke these privileges:PsqlREVOKE USAGE ON SCHEMA synced_table_schema FROM user;
PsqlREVOKE {SELECT | INSERT | UPDATE | DELETE} ON synced_table_name FROM user;
Manage synced table operations
The databricks_superuser
can manage which users are authorized to perform specific operations on a synced table. The supported operations for synced tables are:
CREATE INDEX
ALTER INDEX
DROP INDEX
DROP TABLE
All other DDL operations are denied for synced tables.
To grant these privileges to additional users, the databricks_superuser
must first create an extension on databricks_auth
:
CREATE EXTENSION IF NOT EXISTS databricks_auth;
Then the databricks_superuser
can add a user to manage a synced table:
SELECT databricks_synced_table_add_manager('"synced_table_schema"."synced_table"'::regclass, '[user]');
The databricks_superuser
can remove a user from managing a synced table:
SELECT databricks_synced_table_remove_manager('[table]', '[user]');
The databricks_superuser
can view all managers:
SELECT * FROM databricks_synced_table_managers;
Handle invalid characters
Certain characters, such as the null byte (0x00), are allowed in STRING
, ARRAY
, MAP
, or STRUCT
columns in Delta tables but are not supported in Postgres TEXT
or JSONB
columns. As a result, syncing such data from Delta to Postgres can lead to insertion failures with errors:
org.postgresql.util.PSQLException: ERROR: invalid byte sequence for encoding "UTF8": 0x00
org.postgresql.util.PSQLException: ERROR: unsupported Unicode escape sequence DETAIL: \u0000 cannot be converted to text.
- The first error occurs when a null byte appears in a top-level string column, which maps directly to Postgres
TEXT
. - The second error occurs when a null byte appears in a string nested inside a complex type (
STRUCT
,ARRAY
, orMAP
), which Databricks serializes asJSONB
. During serialization, all strings are cast to PostgresTEXT
, where\u0000
is disallowed.
How to resolve:
You can address this issue in one of the following ways:
-
Sanitize string fields
Remove or replace unsupported characters from all string fields, including those inside complex types, before syncing to Postgres.
To remove null bytes from a top-level
STRING
column, use theREPLACE
function:SQLSELECT REPLACE(column_name, CAST(CHAR(0) AS STRING), '') AS cleaned_column FROM your_table;
-
Convert to binary (for
STRING
columns only)If preserving the raw byte content is necessary, convert the affected
STRING
columns toBINARY
.
Limitations and considerations
Naming and identifier limitations
- Allowed characters:
Postgres database, schema, and table names for synced tables may only contain alphanumeric characters and underscores (
[A-Za-z0-9_]+
). Hyphens (-
) and other special characters are not supported. - Column and table identifiers: Avoid using uppercase letters or special characters in column or table names in the source Unity Catalog table. If kept, you need to quote these identifiers when referencing them in Postgres.
Performance and synchronization
- Sync speed: Synchronizing data into synced tables may be slower than writing data directly to the database instance with a native PostgreSQL client due to additional processing. Continuous sync mode refreshes data from the Unity Catalog managed table to the synced table at a minimum interval of 15 seconds.
- Connection usage: Each table synchronization can use up to 16 connections to the database instance, which count toward the instance's connection limit.
- API idempotency: Synced table APIs are idempotent and may need to be retried if errors occur to ensure timely operations.
- Schema changes:
For synced tables in
Triggered
orContinuous
mode, only additive schema changes (e.g., adding a new column) from the Unity Catalog tables are reflected on the synced table. - Duplicate keys: If two rows have the same primary key in the source table, the sync pipeline fails unless you configure deduplication using a Timeseries Key. However, using a Timeseries Key comes with a performance penalty.
- Update rate: The sync pipeline supports continuous writes at approximately 1,200 rows per second per Capacity Unit (CU) and bulk writes at up to 15,000 rows per second per CU.
Capacity and limits
- Table limits:
- Limit of 20 synced tables per source table.
- Each table synchronization may use up to 16 database connections.
- Size limits and full refresh:
- If you full-refresh a synced table, the old version in Postgres is not deleted until the new table is synced. Both versions temporarily count toward the logical database size limit during the refresh.
- Each synced table can be up to 2 TB. If you require refreshes instead of full table recreation, the limit is 1 TB.
- If the uncompressed, row-format size of the Unity Catalog table exceeds the database instance size limit (2 TB), synchronization will fail. You must drop the synced table in Postgres before writing further to the instance.
Catalog integration
- Catalog duplication: Creating a synced table in a standard catalog targeting a Postgres database that is also registered as a separate database catalog will cause the synced table to appear in Unity Catalog under both the standard and the database catalogs.