SQL Server connector limitations
The Microsoft SQL Server connector is in Public Preview.
This page lists limitations and considerations for SQL Server ingestion using Databricks Lakeflow Connect.
General database connector limitations
The limitations in this section apply to all database connectors in Lakeflow Connect.
- When you run a scheduled pipeline, alerts don't trigger immediately. Instead, they trigger when the next update runs.
- When a source table is deleted, the destination table is not automatically deleted. You must delete the destination table manually. This behavior is not consistent with DLT behavior.
- During source maintenance periods, Databricks might not be able to access your data.
- If a source table name conflicts with an existing destination table name, the pipeline update fails.
- Multi-destination pipeline support is API-only.
- You can optionally rename a table that you ingest. If you rename a table in your pipeline, it becomes an API-only pipeline, and you can no longer edit the pipeline in the UI.
- If you select a column after a pipeline has already started, the connector does not automatically backfill data for the new column. To ingest historical data, manually run a full refresh on the table.
- Managed ingestion pipelines aren't supported for the following:
- Workspaces in AWS GovCloud regions
- Workspaces in Azure GovCloud regions
- FedRAMP-compliant workspaces
- Databricks can't ingest two or more tables with the same name in the same pipeline, even if they come from different source schemas.
Connector-specific limitations
The limitations in this section are specific to the SQL Server connector.
Authentication
- The connector only supports username and password authentication.
- Multi-subnet failover cluster configurations must provide a single forward-facing IP address.
Database variations
- The connector supports Azure SQL and Amazon RDS SQL databases. This includes SQL Server running on Azure virtual machines (VMs) and Amazon EC2. The connector also supports SQL Server on-premises using Azure ExpressRoute and AWS Direct Connect networking.
- To use Microsoft change data capture (CDC), you must meet the following requirements (see the sketch after this list):
  - You must have SQL Server 2012 service pack 1 (SP1) cumulative update package 3 (CU3) or above. This update introduced the `__$command_id` column. Without this column, the ingestion gateway can't reliably distinguish between data modification operation types (for example, `UPDATE` operations that manifest as `DELETE`-`INSERT` pairs with identical `__$seqval` values). This can cause data inconsistencies.
  - For versions earlier than SQL Server 2016, Enterprise Edition is also required.
- To use Microsoft change tracking, you must have SQL Server 2012 or above.
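The version requirements above apply to features that must be enabled on the source database itself. As a minimal sketch, assuming a placeholder database `mydb`, a placeholder table `dbo.orders`, and a pyodbc connection with sufficient privileges, the standard Microsoft commands look like this (enable either CDC or change tracking, depending on which you use):

```python
# Minimal sketch: enable Microsoft CDC (or change tracking) on the source
# database before ingesting it. The connection string, database, and table
# names are placeholders; these commands require appropriate privileges.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 18 for SQL Server};"
    "SERVER=sqlserver.example.com;DATABASE=mydb;UID=admin_user;PWD=<password>",
    autocommit=True,
)
cur = conn.cursor()

# Option 1: change data capture (SQL Server 2012 SP1 CU3 or above;
# Enterprise Edition is also required before SQL Server 2016).
cur.execute("EXEC sys.sp_cdc_enable_db")
cur.execute(
    "EXEC sys.sp_cdc_enable_table "
    "@source_schema = N'dbo', @source_name = N'orders', @role_name = NULL"
)

# Option 2: change tracking (SQL Server 2012 or above; the table must have
# a primary key).
cur.execute(
    "ALTER DATABASE mydb SET CHANGE_TRACKING = ON "
    "(CHANGE_RETENTION = 3 DAYS, AUTO_CLEANUP = ON)"
)
cur.execute("ALTER TABLE dbo.orders ENABLE CHANGE_TRACKING")
```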
Pipelines
- Each ingestion pipeline must be associated with exactly one ingestion gateway (see the sketch below).
- Although the ingestion pipeline runs on serverless compute, the ingestion gateway must run on classic compute.
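To illustrate the pairing, the sketch below outlines the two pipeline definitions as plain payload dictionaries. Only `ingestion_definition` and `objects` appear elsewhere on this page; the `gateway_definition`, storage, and `ingestion_gateway_id` field names are assumptions to check against the current Pipelines API reference, and all IDs and names are placeholders.

```python
# Minimal sketch of the gateway / ingestion pipeline pairing as payload
# dictionaries. Field names other than ingestion_definition and objects are
# assumptions; all IDs, catalogs, and schemas are placeholders.

# 1. The ingestion gateway is its own pipeline and runs on classic compute.
gateway_pipeline = {
    "name": "sqlserver-gateway",
    "gateway_definition": {                           # assumed field name
        "connection_id": "<uc-connection-id>",        # assumed field name
        "gateway_storage_catalog": "main",            # assumed field name
        "gateway_storage_schema": "staging",          # assumed field name
        "gateway_storage_name": "sqlserver-gateway",  # assumed field name
    },
}

# 2. The ingestion pipeline runs on serverless compute and references
#    exactly one gateway.
ingestion_pipeline = {
    "name": "sqlserver-ingestion",
    "ingestion_definition": {
        "ingestion_gateway_id": "<gateway-pipeline-id>",  # assumed field name
        "objects": [
            # One entry per table or schema to ingest; see the Tables section.
        ],
    },
}
```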
Schema evolution
The connector automatically handles new and deleted columns unless you opt out.
- When a new column appears in the source, Databricks automatically ingests it on the next run of the pipeline. However, you can opt out of this.
- When a column is deleted from the source, Databricks doesn't delete it automatically. Instead, the connector uses a table property to set the deleted column to `inactive` in the destination. If another column later appears that has a conflicting name with the `inactive` column, the pipeline fails. In this case, you can run a full refresh of the table or manually drop the inactive column.
The same behavior applies to new and deleted tables within a schema if you ingest the entire schema.
Finally, the connector can handle column renames, although this requires a full refresh of the table.
Additional schema changes (for example, data type changes) also require a full refresh of the target tables.
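Each of these cases comes down to a per-table full refresh (or, for an inactive-column conflict, manually dropping the inactive column). As a minimal sketch, assuming the Databricks SDK for Python and placeholder names, a full refresh of a single table can be requested like this; confirm how your workspace expects tables to be named in `full_refresh_selection`:

```python
# Minimal sketch using the Databricks SDK for Python: request a full refresh
# of a single table to resolve an inactive-column conflict or to backfill
# history. The pipeline ID and table name are placeholders.
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()

update = w.pipelines.start_update(
    pipeline_id="<ingestion-pipeline-id>",
    full_refresh_selection=["orders"],  # only this table is rebuilt
)
print(update.update_id)
```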
Staging
The staging catalog cannot be a foreign catalog.
Tables
- Databricks recommends ingesting 250 or fewer tables per pipeline. However, there is no limit on the number of rows or columns supported within those tables.
- Databricks can't ingest two tables whose names differ only in case (for example, `MyTable` and `MYTABLE`) using one pipeline. To support such cases, create two gateway-ingestion pipeline pairs that publish to different target schemas.
- The `source_catalog`, `source_schema`, and `source_table` names are case-sensitive. For example, if the source database catalog is specified as `Marketing` in `sys.databases`, you can't specify it as `marketing` in the `ingestion_definition`.
- Although you can ingest from multiple source catalogs or schemas in one pipeline, you can't ingest two tables of the same name. For example, you can't ingest both `schema1.Marketing` and `schema2.Marketing` in the same pipeline.
- Multiple table or schema specifications can be included in the `objects` field of the `ingestion_definition`. However, the source table names in different source schemas can't overlap; overlapping names cause the ingestion pipeline to fail (see the sketch after this list).
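To make these naming rules concrete, here is a hedged sketch of an `objects` list. Apart from `source_catalog`, `source_schema`, and `source_table`, the payload shape and field names (including the destination fields) are assumptions to verify against the Pipelines API reference, and all catalog, schema, and table names are placeholders.

```python
# Minimal sketch of an objects list for an ingestion_definition. The "table"
# wrapper key and destination field names are assumptions; catalog, schema,
# and table names are placeholders.
ingestion_definition = {
    "objects": [
        {
            "table": {
                "source_catalog": "Marketing",  # case-sensitive: must match sys.databases
                "source_schema": "schema1",
                "source_table": "Campaigns",
                "destination_catalog": "main",          # assumed field name
                "destination_schema": "marketing_raw",  # assumed field name
            }
        },
        # Not allowed in the same pipeline: a second table also named
        # "Campaigns" from another source schema (for example,
        # schema2.Campaigns), or a table whose name differs only in case
        # (for example, CAMPAIGNS).
    ]
}
```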