Skip to main content

Enable built-in CDC in Microsoft SQL Server

Preview

The Microsoft SQL Server connector is in Public Preview.

This page provides instructions to enable built-in change data capture (CDC) in Microsoft SQL Server. You must enable either change tracking or CDC to use the SQL Server connector. Databricks recommends using change tracking for any table that has a primary key to minimize the load on the source database. If change tracking and CDC are both enabled, the connector uses change tracking. For guidance on which option to choose, see Change tracking vs. change data capture.

Enable built-in CDC for the source database

  1. Log into the database you want to enable for CDC.

  2. Run the following T-SQL command in the database context:

    Azure SQL Database and on-prem SQL Server

    SQL
    EXEC sys.sp_cdc_enable_db

    Amazon RDS for SQL Server

    SQL
    EXEC msdb.dbo.rds_cdc_enable_db '<database-name>'

For more information, see Enable change data capture for a database in the SQL Server documentation.

Enable built-in CDC on the source table

To enable CDC on the source table, run the following stored procedure in Azure SQL. Replace the values for source_schema, source_name, and role_name. @support_net_changes only supports a value of 1 if the table has a primary key.

  • Replace the values for source_schema, source_name, and role_name.
  • If the table has a primary key, @support_net_changes only supports a value of 1.
SQL
EXEC sys.sp_cdc_enable_table
@source_schema = N'MySchema',
@source_name = N'MyTable',
@role_name = NULL,
@supports_net_changes = 1

For more information, see Enable change data capture for a table in the SQL Server documentation.

Grant SELECT on the CDC schema

In addition to the privileges described in the source setup, the database user needs the SELECT privilege on the schema cdc. This schema contains the change tables that are created when CDC is enabled. Run the following T-SQL command:

SQL
GRANT SELECT ON SCHEMA::cdc to <database-user>;

Grant VIEW SERVER STATE (on-prem and RDS)

For on-premises and RDS instances only, the VIEW SERVER STATE privilege is required to query sys.dm_server_services, which is used to check the status of SQL Server Agent. Run the following T-SQL command:

SQL
GRANT VIEW SERVER STATE to <database-user>

Set up DDL capture and schema evolution

The SQL Server connector can track the data definition language (DDL) on replicated database objects and apply relevant table schema changes to the destination tables or add new tables in case of full schema replication.

DDL capture requires additional setup of database objects, such as internal tables, stored procedures, and triggers. The Transact-SQL (T-SQL) script provided in this article (ddl_support_objects.sql) drops any pre-existing DDL support objects and creates the DDL support objects required to capture DDL changes happening on the database.

  1. Download the ddl_support_objects.sql script.

  2. Modify the script to set the mode value:

    • BOTH: Initializes both CT and CDC objects (default)
    • CT: Initializes CT objects
    • CDC: Initializes CDC objects
    • NONE: Deletes all pre-existing CT and CDC objects
  3. (Recommended) Optionally, modify the script to set the replicationUser variable to your SQL Server database user.

    If replicationUser is defined, the script grants all privileges that are required to work with the DDL support objects to the user. Otherwise, you must grant each privilege manually.

  4. Run the script on each database that you want to ingest.

    important

    Don't run the script on the master database.

    If you run the script in a third-party tool, select the entire script before you run it.

Change data capture (CDC) privilege requirements

If you set the replicationUser variable in the script, the script grants the required privileges on the DDL support objects to the database user. The required privileges are:

  • VIEW DEFINITION on the lakeflowDisableOldCaptureInstance_1_1 object
  • VIEW DEFINITION on the lakeflowRefreshCaptureInstance_1_1 object
  • VIEW DEFINITION on the lakeflowMergeCaptureInstances_1_1 object
  • VIEW DEFINITION on the database that you want to ingest
  • VIEW DATABASE PERFORMANCE STATE on the database that you want to ingest
  • UPDATE on the lakeflowCaptureInstanceInfo_1_1 object
  • EXECUTE on the dbo schema
  • EXECUTE on the lakeflowMergeCaptureInstances_1_1 object
  • EXECUTE on the lakeflowDisableOldCaptureInstance_1_1 object
  • EXECUTE on the lakeflowRefreshCaptureInstance_1_1 object

If replicationUser is unset in the script, you must manually grant the required privileges for CDC. To do this, run the following T-SQL commands, replacing <database-user>:

SQL
GRANT VIEW DEFINITION ON object::dbo.lakeflowDisableOldCaptureInstance_1_1 TO <database-user>;
GRANT VIEW DEFINITION ON object::dbo.lakeflowRefreshCaptureInstance_1_1 TO <database-user>;
GRANT VIEW DEFINITION ON object::dbo.lakeflowMergeCaptureInstances_1_1 TO <database-user>;
GRANT VIEW DEFINITION TO <database-user>;
GRANT VIEW DATABASE PERFORMANCE STATE TO <database-user>;
GRANT UPDATE ON object::dbo.lakeflowCaptureInstanceInfo_1_1 TO <database-user>;
GRANT EXECUTE ON schema::dbo TO <database-user>;
GRANT EXECUTE ON object::dbo.lakeflowMergeCaptureInstances_1_1 TO <database-user>;
GRANT EXECUTE ON object::dbo.lakeflowDisableOldCaptureInstance_1_1 TO <database-user>;
GRANT EXECUTE ON object::dbo.lakeflowRefreshCaptureInstance_1_1 TO <database-user>;

Next step

Create an ingestion pipeline