Skip to main content

Configure Microsoft SQL Server for ingestion into Databricks

Preview

The Microsoft SQL Server connector is in gated Public Preview. To participate in the preview, contact your Databricks account team.

This article provides an overview of the prerequisite source setup decisions and tasks required for ingestion from Microsoft SQL Server (SQL Server) into Databricks using Lakeflow Connect.

Change tracking vs. change data capture

Databricks requires Microsoft change tracking or Microsoft change data capture (CDC) to extract data from SQL Server. If change tracking and CDC are both enabled, the connector uses change tracking.

  • Change tracking captures the fact that rows in a table have changed, but doesn’t capture the actual operations.
  • Change data capture captures every operation on a table.

The following table provides considerations to help you choose between change tracking and CDC:

Change tracking

Change data capture

Captures the fact that rows in a table have changed, but doesn’t capture the actual changes or the number of changes that were made.

Captures the fact that rows in a table were changed and the actual changes.

Low CPU and storage overhead.

High CPU and storage overhead.

If a table has a primary key, Databricks recommends using CT for optimal performance.

When no primary key is present, CDC must be used.

TRUNCATE DDLs are not blocked.

Blocks TRUNCATE DDLs.

Definitive operations are not captured. For example, a delete > insert > update sequence might be recorded as an insert.

Captures each insert, update, and delete operation with the full image.

No native support for DDL. Our DDL support occasionally requires reinitialization of a table.

Limited native support for DDL, but will still require reinitialization of tables in some cases.

For more information about these options, see Track data changes (SQL Server) in the SQL Server documentation.

Overview of source setup tasks

You must complete the following tasks in SQL Server before ingesting data into Databricks: