SQL Server connector limitations
The Microsoft SQL Server connector is in Public Preview.
This page lists limitations and considerations for SQL Server ingestion using Databricks Lakeflow Connect.
General database connector limitations
The limitations in this section apply to all database connectors in Lakeflow Connect.
- When you run a scheduled pipeline, alerts don't trigger immediately. Instead, they trigger when the next update runs.
- When a source table is deleted, the destination table is not automatically deleted. You must delete the destination table manually. This behavior is not consistent with DLT behavior.
- During source maintenance periods, Databricks might not be able to access your data.
- If a source table name conflicts with an existing destination table name, the pipeline update fails.
- Multi-destination pipeline support is API-only.
- You can optionally rename a table that you ingest. If you rename a table in your pipeline, it becomes an API-only pipeline, and you can no longer edit the pipeline in the UI.
- If you select a column after a pipeline has already started, the connector does not automatically backfill data for the new column. To ingest historical data, manually run a full refresh on the table.
- Managed ingestion pipelines aren't supported for the following:
- Workspaces in AWS GovCloud regions
- Workspaces in Azure GovCloud regions
- FedRAMP-compliant workspaces
- Databricks can't ingest two or more tables with the same name in the same pipeline, even if they come from different source schemas.
Connector-specific limitations
The limitations in this section are specific to the SQL Server connector.
Authentication
- The connector only supports username and password authentication.
- Multi-subnet failover cluster configurations must provide a single forward-facing IP address.
Database variations
- The connector supports Azure SQL and Amazon RDS SQL databases. This includes SQL Server running on Azure virtual machines (VMs) and Amazon EC2. The connector also supports SQL Server on-premises using Azure ExpressRoute and AWS Direct Connect networking.
- To use Microsoft change data capture (CDC), you must meet the following requirements (see the sketch after this list):
  - You must have SQL Server 2012 service pack 1 (SP1) cumulative update package 3 (CU3) or above. This update introduced the `__$command_id` column. Without this column, the ingestion gateway can't reliably distinguish between data modification operation types (for example, `UPDATE` operations that manifest as `DELETE`-`INSERT` pairs with identical `__$seqval` values). This can cause data inconsistencies.
  - For versions earlier than SQL Server 2016, Enterprise Edition is also required.
- To use Microsoft change tracking, you must have SQL Server 2012 or above.
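The version requirements above apply to features that must be enabled on the source database itself. As a minimal sketch, assuming a placeholder database `mydb`, a placeholder table `dbo.orders`, and a pyodbc connection with sufficient privileges, the standard Microsoft commands look like this (enable either CDC or change tracking, depending on which you use):

```python
# Minimal sketch: enable Microsoft CDC (or change tracking) on the source
# database before ingesting it. The connection string, database, and table
# names are placeholders; these commands require appropriate privileges.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 18 for SQL Server};"
    "SERVER=sqlserver.example.com;DATABASE=mydb;UID=admin_user;PWD=<password>",
    autocommit=True,
)
cur = conn.cursor()

# Option 1: change data capture (SQL Server 2012 SP1 CU3 or above;
# Enterprise Edition is also required before SQL Server 2016).
cur.execute("EXEC sys.sp_cdc_enable_db")
cur.execute(
    "EXEC sys.sp_cdc_enable_table "
    "@source_schema = N'dbo', @source_name = N'orders', @role_name = NULL"
)

# Option 2: change tracking (SQL Server 2012 or above; the table must have
# a primary key).
cur.execute(
    "ALTER DATABASE mydb SET CHANGE_TRACKING = ON "
    "(CHANGE_RETENTION = 3 DAYS, AUTO_CLEANUP = ON)"
)
cur.execute("ALTER TABLE dbo.orders ENABLE CHANGE_TRACKING")
```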
Pipelines
- Each ingestion pipeline must be associated with exactly one ingestion gateway (see the sketch below).
- Although the ingestion pipeline runs on serverless compute, the ingestion gateway must run on classic compute.
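To illustrate the pairing, the sketch below outlines the two pipeline definitions as plain payload dictionaries. Only `ingestion_definition` and `objects` appear elsewhere on this page; the `gateway_definition`, storage, and `ingestion_gateway_id` field names are assumptions to check against the current Pipelines API reference, and all IDs and names are placeholders.

```python
# Minimal sketch of the gateway / ingestion pipeline pairing as payload
# dictionaries. Field names other than ingestion_definition and objects are
# assumptions; all IDs, catalogs, and schemas are placeholders.

# 1. The ingestion gateway is its own pipeline and runs on classic compute.
gateway_pipeline = {
    "name": "sqlserver-gateway",
    "gateway_definition": {                           # assumed field name
        "connection_id": "<uc-connection-id>",        # assumed field name
        "gateway_storage_catalog": "main",            # assumed field name
        "gateway_storage_schema": "staging",          # assumed field name
        "gateway_storage_name": "sqlserver-gateway",  # assumed field name
    },
}

# 2. The ingestion pipeline runs on serverless compute and references
#    exactly one gateway.
ingestion_pipeline = {
    "name": "sqlserver-ingestion",
    "ingestion_definition": {
        "ingestion_gateway_id": "<gateway-pipeline-id>",  # assumed field name
        "objects": [
            # One entry per table or schema to ingest; see the Tables section.
        ],
    },
}
```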
Schema evolution
The connector automatically handles new and deleted columns unless you opt out.
- When a new column appears in the source, Databricks automatically ingests it on the next run of the pipeline. However, you can opt out of this.
- When a column is deleted from the source, Databricks doesn't delete it automatically. Instead, the connector uses a table property to set the deleted column to `inactive` in the destination. If another column later appears that has a conflicting name with the `inactive` column, the pipeline fails. In this case, you can run a full refresh of the table or manually drop the inactive column.
The same behavior applies to new and deleted tables within a schema if you ingest the entire schema.
Finally, the connector can handle column renames, although this requires a full refresh of the table.
Additional schema changes (for example, data type changes) also require a full refresh of the target tables.
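Each of these cases comes down to a per-table full refresh (or, for an inactive-column conflict, manually dropping the inactive column). As a minimal sketch, assuming the Databricks SDK for Python and placeholder names, a full refresh of a single table can be requested like this; confirm how your workspace expects tables to be named in `full_refresh_selection`:

```python
# Minimal sketch using the Databricks SDK for Python: request a full refresh
# of a single table to resolve an inactive-column conflict or to backfill
# history. The pipeline ID and table name are placeholders.
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()

update = w.pipelines.start_update(
    pipeline_id="<ingestion-pipeline-id>",
    full_refresh_selection=["orders"],  # only this table is rebuilt
)
print(update.update_id)
```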
Staging
The staging catalog cannot be a foreign catalog.
Tables
- Databricks recommends ingesting 250 or fewer tables per pipeline. However, there is no limit on the number of rows or columns supported within those tables.
- Databricks can't ingest two tables whose names differ only in case (for example, `MyTable` and `MYTABLE`) using one pipeline. To support such cases, create two gateway-ingestion pipeline pairs that publish to different target schemas.
- The `source_catalog`, `source_schema`, and `source_table` names are case-sensitive. For example, if the source database catalog is specified as `Marketing` in `sys.databases`, you can't specify it as `marketing` in the `ingestion_definition`.
- Although you can ingest from multiple source catalogs or schemas in one pipeline, you can't ingest two tables of the same name. For example, you can't ingest both `schema1.Marketing` and `schema2.Marketing` in the same pipeline.
- Multiple table or schema specifications can be included in the `objects` field of the `ingestion_definition`. However, the source table names in different source schemas can't overlap; overlapping names cause the ingestion pipeline to fail (see the sketch after this list).
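To make these naming rules concrete, here is a hedged sketch of an `objects` list. Apart from `source_catalog`, `source_schema`, and `source_table`, the payload shape and field names (including the destination fields) are assumptions to verify against the Pipelines API reference, and all catalog, schema, and table names are placeholders.

```python
# Minimal sketch of an objects list for an ingestion_definition. The "table"
# wrapper key and destination field names are assumptions; catalog, schema,
# and table names are placeholders.
ingestion_definition = {
    "objects": [
        {
            "table": {
                "source_catalog": "Marketing",  # case-sensitive: must match sys.databases
                "source_schema": "schema1",
                "source_table": "Campaigns",
                "destination_catalog": "main",          # assumed field name
                "destination_schema": "marketing_raw",  # assumed field name
            }
        },
        # Not allowed in the same pipeline: a second table also named
        # "Campaigns" from another source schema (for example,
        # schema2.Campaigns), or a table whose name differs only in case
        # (for example, CAMPAIGNS).
    ]
}
```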