Enable built-in CDC in Microsoft SQL Server
The Microsoft SQL Server connector is in Public Preview.
This page provides instructions to enable built-in change data capture (CDC) in Microsoft SQL Server. You must enable either change tracking or CDC to use the SQL Server connector. Databricks recommends using change tracking for any table that has a primary key to minimize the load on the source database. If change tracking and CDC are both enabled, the connector uses change tracking. For guidance on which option to choose, see Change tracking vs. change data capture.
Enable built-in CDC for the source database
-
Log into the database you want to enable for CDC.
-
Run the following T-SQL command in the database context:
Azure SQL Database and on-prem SQL Server
SQLEXEC sys.sp_cdc_enable_db
Amazon RDS for SQL Server
SQLEXEC msdb.dbo.rds_cdc_enable_db '<database-name>'
For more information, see Enable change data capture for a database in the SQL Server documentation.
Enable built-in CDC on the source table
To enable CDC on the source table, run the following stored procedure in Azure SQL. Replace the values for source_schema
, source_name
, and role_name
. @support_net_changes
only supports a value of 1
if the table has a primary key.
- Replace the values for
source_schema
,source_name
, androle_name
. - If the table has a primary key,
@support_net_changes
only supports a value of1
.
EXEC sys.sp_cdc_enable_table
@source_schema = N'MySchema',
@source_name = N'MyTable',
@role_name = NULL,
@supports_net_changes = 1
For more information, see Enable change data capture for a table in the SQL Server documentation.
Grant SELECT
on the CDC schema
In addition to the privileges described in the source setup, the database user needs the SELECT
privilege on the schema cdc
. This schema contains the change tables that are created when CDC is enabled. Run the following T-SQL command:
GRANT SELECT ON SCHEMA::cdc to <database-user>;
Grant VIEW SERVER STATE
(on-prem and RDS)
For on-premises and RDS instances only, the VIEW SERVER STATE
privilege is required to query sys.dm_server_services
, which is used to check the status of SQL Server Agent. Run the following T-SQL command:
GRANT VIEW SERVER STATE to <database-user>
Set up DDL capture and schema evolution
The SQL Server connector can track the data definition language (DDL) on replicated database objects and apply relevant table schema changes to the destination tables or add new tables in case of full schema replication.
DDL capture requires additional setup of database objects, such as internal tables, stored procedures, and triggers. The Transact-SQL (T-SQL) script provided in this article (ddl_support_objects.sql
) drops any pre-existing DDL support objects and creates the DDL support objects required to capture DDL changes happening on the database.
-
Download the ddl_support_objects.sql script.
-
Modify the script to set the
mode
value:BOTH
: Initializes both CT and CDC objects (default)CT
: Initializes CT objectsCDC
: Initializes CDC objectsNONE
: Deletes all pre-existing CT and CDC objects
-
(Recommended) Optionally, modify the script to set the
replicationUser
variable to your SQL Server database user.If
replicationUser
is defined, the script grants all privileges that are required to work with the DDL support objects to the user. Otherwise, you must grant each privilege manually. -
Run the script on each database that you want to ingest.
importantDon't run the script on the master database.
If you run the script in a third-party tool, select the entire script before you run it.
Change data capture (CDC) privilege requirements
If you set the replicationUser
variable in the script, the script grants the required privileges on the DDL support objects to the database user. The required privileges are:
VIEW DEFINITION
on thelakeflowDisableOldCaptureInstance_1_1
objectVIEW DEFINITION
on thelakeflowRefreshCaptureInstance_1_1
objectVIEW DEFINITION
on thelakeflowMergeCaptureInstances_1_1
objectVIEW DEFINITION
on the database that you want to ingestVIEW DATABASE PERFORMANCE STATE
on the database that you want to ingestUPDATE
on thelakeflowCaptureInstanceInfo_1_1
objectEXECUTE
on thedbo
schemaEXECUTE
on thelakeflowMergeCaptureInstances_1_1
objectEXECUTE
on thelakeflowDisableOldCaptureInstance_1_1
objectEXECUTE
on thelakeflowRefreshCaptureInstance_1_1
object
If replicationUser
is unset in the script, you must manually grant the required privileges for CDC. To do this, run the following T-SQL commands, replacing <database-user>
:
GRANT VIEW DEFINITION ON object::dbo.lakeflowDisableOldCaptureInstance_1_1 TO <database-user>;
GRANT VIEW DEFINITION ON object::dbo.lakeflowRefreshCaptureInstance_1_1 TO <database-user>;
GRANT VIEW DEFINITION ON object::dbo.lakeflowMergeCaptureInstances_1_1 TO <database-user>;
GRANT VIEW DEFINITION TO <database-user>;
GRANT VIEW DATABASE PERFORMANCE STATE TO <database-user>;
GRANT UPDATE ON object::dbo.lakeflowCaptureInstanceInfo_1_1 TO <database-user>;
GRANT EXECUTE ON schema::dbo TO <database-user>;
GRANT EXECUTE ON object::dbo.lakeflowMergeCaptureInstances_1_1 TO <database-user>;
GRANT EXECUTE ON object::dbo.lakeflowDisableOldCaptureInstance_1_1 TO <database-user>;
GRANT EXECUTE ON object::dbo.lakeflowRefreshCaptureInstance_1_1 TO <database-user>;