Troubleshoot PostgreSQL ingestion
The PostgreSQL connector for Lakeflow Connect is in Public Preview. Reach out to your Databricks account team to enroll in the Public Preview.
This page describes the common issues with the PostgreSQL connector in Databricks Lakeflow Connect and how to resolve them.
General pipeline troubleshooting
The troubleshooting steps in this section apply to all ingestion pipelines in Lakeflow Connect.
If a pipeline fails while running, click the failed step and check whether the error message provides enough information to diagnose the issue.

You can also check and download the cluster logs from the pipeline details page by clicking Update details in the right-hand panel, then Logs. Scan the logs for errors or exceptions.

Connector-specific troubleshooting
The troubleshooting steps in this section are specific to the PostgreSQL connector.
Permission errors
If you receive a permission error, verify that the replication user has the necessary privileges. See PostgreSQL database user requirements for the complete list of required privileges.
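To see which table privileges the user currently holds, you can query information_schema; a minimal check, assuming the replication user is named databricks_replication as in the examples below:
SELECT table_schema, table_name, privilege_type
FROM information_schema.table_privileges
WHERE grantee = 'databricks_replication';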
ERROR: permission denied for table
This error indicates that the replication user does not have SELECT privileges on the specified table. Grant the necessary privileges:
GRANT SELECT ON TABLE schema_name.table_name TO databricks_replication;
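If many tables in the same schema need to be ingested, you can grant SELECT at the schema level instead:
GRANT SELECT ON ALL TABLES IN SCHEMA schema_name TO databricks_replication;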
ERROR: must be superuser or replication role to use replication slots
This error indicates that the replication user does not have the REPLICATION privilege. Grant the replication role:
-- For standard PostgreSQL
ALTER USER databricks_replication WITH REPLICATION;
-- For AWS RDS/Aurora
GRANT rds_replication TO databricks_replication;
-- For GCP Cloud SQL
ALTER USER databricks_replication WITH REPLICATION;
Check whether logical replication is enabled
To check whether logical replication is enabled:
SHOW wal_level;
The output should be logical. If it is not, update the wal_level parameter and restart the PostgreSQL server.
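On self-managed PostgreSQL, one way to update the parameter is ALTER SYSTEM (requires superuser; the change takes effect only after the server restarts):
ALTER SYSTEM SET wal_level = logical;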
For cloud-managed PostgreSQL:
- AWS RDS/Aurora: Set rds.logical_replication to 1 in the parameter group.
- Azure Database for PostgreSQL: Enable logical replication in the server parameters.
- GCP Cloud SQL: Set the cloudsql.logical_decoding flag to on.
Check whether a publication exists
To check whether a publication exists for your tables:
SELECT * FROM pg_publication WHERE pubname = 'databricks_publication';
-- Check which tables are included in the publication
SELECT schemaname, tablename
FROM pg_publication_tables
WHERE pubname = 'databricks_publication';
If the publication does not exist, create it:
CREATE PUBLICATION databricks_publication FOR TABLE schema_name.table_name;
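If the publication exists but does not include one of your tables, you can add the table instead of recreating the publication:
ALTER PUBLICATION databricks_publication ADD TABLE schema_name.table_name;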
Check replica identity
To check the replica identity setting for a table:
SELECT n.nspname AS schemaname, c.relname AS tablename, c.relreplident
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE n.nspname = 'your_schema' AND c.relname = 'your_table';
The relreplident column shows one of the following values:
- f for FULL replica identity (required for tables without primary keys or with TOASTable columns).
- d for DEFAULT replica identity (uses the primary key).
If the replica identity is not set correctly, update it:
ALTER TABLE schema_name.table_name REPLICA IDENTITY FULL;
Replication slot errors
WAL accumulation and disk space issues
If the ingestion gateway is stopped for an extended period, the replication slot can cause Write-Ahead Log (WAL) files to accumulate, potentially filling up disk space.
To check WAL disk usage:
SELECT pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) AS retained_wal
FROM pg_replication_slots
WHERE slot_name = 'your_slot_name';
To prevent WAL accumulation:
- Ensure the ingestion gateway runs continuously.
- Monitor replication lag and disk space regularly (see the example query after this list).
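For example, to check the total disk space consumed by WAL files (PostgreSQL 10 and later; requires superuser or the pg_monitor role):
SELECT pg_size_pretty(sum(size)) AS total_wal_size
FROM pg_ls_waldir();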
If WAL files have accumulated, you can manually drop the replication slot:
SELECT pg_drop_replication_slot('your_slot_name');
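A replication slot can only be dropped while it is inactive. To confirm that no process is still consuming the slot before dropping it:
SELECT slot_name, active, active_pid
FROM pg_replication_slots
WHERE slot_name = 'your_slot_name';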
If a replication slot is dropped or becomes invalid, update the pipeline spec with a new slot for that database and run a full refresh.
Timeout while waiting for table token
The ingestion pipeline might time out while waiting for information to be provided by the gateway. This might be due to one of the following reasons:
- You are running an older version of the gateway.
- There was an error generating the necessary information. Check the gateway driver logs for errors.
- The initial snapshot is taking longer than expected. For large tables, consider increasing the pipeline timeout or running the initial load during off-peak hours.
Source table naming conflict
Ingestion pipeline error: "org.apache.spark.sql.catalyst.ExtendedAnalysisException: Cannot have multiple queries named `orders_snapshot_load` for `orders`. Additional queries on that table must be named. Note that unnamed queries default to the same name as the table."
This indicates a naming conflict: multiple source tables named orders in different source schemas are being ingested by the same ingestion pipeline into the same destination schema.
To resolve this, create separate gateway-pipeline pairs that write the conflicting tables to different destination schemas.
Incompatible schema changes
An incompatible schema change causes the ingestion pipeline to fail with an INCOMPATIBLE_SCHEMA_CHANGE error. To continue replication, trigger a full refresh of the affected tables.
Incompatible schema changes include the following:
- Changing the data type of a column
- Renaming a column
- Changing the primary key of a table
- Dropping a column that is part of the replica identity
Databricks cannot guarantee that all rows prior to the schema change have been ingested when the ingestion pipeline fails due to an incompatible schema change.
Connection timeout errors
If you receive connection timeout errors, perform the following checks:
- Verify that the firewall rules allow connections from the Databricks workspace.
- Check that the PostgreSQL server is reachable from the Databricks network.
- Ensure that the pg_hba.conf file allows connections from the Databricks IP range.
- Verify that the connection credentials are correct.
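As an illustration, a pg_hba.conf entry that allows SSL connections from the Databricks IP range might look like the following; the address block and authentication method shown here are placeholders, not values supplied by Databricks:
# TYPE   DATABASE  USER                    ADDRESS          METHOD
hostssl  all       databricks_replication  203.0.113.0/24   scram-sha-256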
SSL/TLS connection errors
If you receive SSL/TLS connection errors, perform the following checks:
- Verify that the PostgreSQL server supports SSL connections.
- Check the ssl parameter in the PostgreSQL configuration:
SHOW ssl;
- Ensure that the pg_hba.conf file requires or allows SSL connections for the replication user.
- For cloud-managed databases, verify that SSL is enforced in the server settings.
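To confirm whether your current session actually negotiated SSL, you can query the pg_stat_ssl view (PostgreSQL 9.5 and later):
SELECT ssl, version, cipher
FROM pg_stat_ssl
WHERE pid = pg_backend_pid();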
default auth: cannot configure default credentials
If you receive this error, the SDK failed to discover default credentials for the current user. Try replacing the following:
w = WorkspaceClient()
with:
w = WorkspaceClient(host=input('Databricks Workspace URL: '), token=input('Token: '))
See Authentication in the Databricks SDK for Python documentation.
PERMISSION_DENIED: You are not authorized to create clusters. Please contact your administrator.
Contact a Databricks account admin to grant you Unrestricted cluster creation permissions.
DLT ERROR CODE: INGESTION_GATEWAY_INTERNAL_ERROR
Check the stdout files in the driver logs for detailed error messages. Common causes include the following:
- Replication slot errors
- Publication configuration issues
- Network connectivity problems
- Insufficient privileges on the source database