Prepare MySQL for ingestion using the utility objects script
The MySQL connector is in Public Preview. Contact your Databricks account team to request access.
This page contains references to the term slave, a term that Databricks doesn't use. When the term is removed from the third-party software, we'll remove it from this page.
Complete the MySQL database setup tasks to ingest into Databricks using Lakeflow Connect.
Requirements
- MySQL version: 5.7.44 or 8.0+
- Executing user: Must be root or have full administrative privileges (SUPER, ALL PRIVILEGES)
- CDC user: Must already exist with appropriate authentication plugin before running setup procedures
- Target schemas/tables: Must already exist before granting permissions
Step 1: Install utility objects
This step installs the utility stored procedures needed for MySQL CDC setup.
- Download the script: mysql_setup.sql
- Open the script in MySQL Workbench, DBeaver, or your preferred SQL client.
- Connect to your MySQL instance as root or an admin user.
- Make sure you are connected to your desired schema.
- Run the script.
- Verify installation:
SELECT ROUTINE_NAME
FROM information_schema.ROUTINES
WHERE ROUTINE_SCHEMA = '<schema-name>'
AND ROUTINE_TYPE = 'PROCEDURE'
AND ROUTINE_NAME IN ('lakeflow_cdc_setup', 'lakeflow_setup_cdc_user');
Step 2: Create the CDC user
Before running the setup procedures, create the CDC user with the appropriate authentication plugin.
For MySQL 8.x:
CREATE USER 'cdc_user'@'%' IDENTIFIED WITH caching_sha2_password BY 'your_secure_password';
For MySQL 5.x:
CREATE USER 'cdc_user'@'%' IDENTIFIED WITH sha256_password BY 'your_secure_password';
The authentication plugin must match your MySQL version. Using the wrong plugin will cause the setup procedure to fail.
Step 3: Enable binary logging (for standalone MySQL only)
This step configures binary logging settings required for CDC operations. For details, see lakeflow_setup_cdc_user in the procedure reference below.
CALL lakeflow_cdc_setup();
This procedure only works on standalone MySQL instances. For managed services (Amazon RDS, Azure Database for MySQL, Google Cloud SQL), use the service-specific configuration tools:
- Amazon RDS: Use Parameter Groups to set
binlog_format = ROWandbinlog_row_image = FULL - Azure MySQL: Use Server Parameters
- Google Cloud SQL: Use Database Flags
Step 4: Grant CDC permissions
This step validates the CDC user and grants the necessary permissions for ingestion. For details, see lakeflow_setup_cdc_user in the procedure reference below.
-- Grant permissions on specific tables
CALL lakeflow_setup_cdc_user('cdc_user', '`mydb`.`orders`, `mydb`.`customers`');
Alternative options:
- For all tables in specific schemas:
'mydb1.*,mydb2.*' - For mixed (schemas and specific tables):
'mydb1.*,mydb2.orders' - For replication permissions only (no table access):
'NULL'
Step 5: Verify setup
Run the following queries to confirm that the CDC user has the correct permissions:
-- Check user privileges
SHOW GRANTS FOR 'cdc_user'@'%';
-- Verify binary logging configuration
SHOW VARIABLES LIKE 'binlog_format';
SHOW VARIABLES LIKE 'binlog_row_image';
-- Verify user authentication plugin
SELECT user, host, plugin FROM mysql.user WHERE user = 'cdc_user';
Procedure reference
lakeflow_setup_cdc_user
Validates that a CDC user exists with the appropriate authentication plugin and grants the necessary privileges for CDC operations.
Syntax:
CALL lakeflow_setup_cdc_user(
@User = 'username',
@Tables = 'table_list'
);
Parameters:
Parameter | Type | Description |
|---|---|---|
|
| The CDC username (must already exist) |
|
| Comma-separated list of tables/schemas |
Table list format:
Format | Example | Description |
|---|---|---|
Specific tables |
| Grants SELECT on specific tables |
All tables in schema |
| Grants SELECT on all tables in the schema |
Mixed |
| Combination of schemas and specific tables |
Replication only |
| Grants only replication privileges, no table access |
Privileges granted:
Privilege | Scope | Purpose |
|---|---|---|
| Global ( | Required to read binary logs |
| Global ( | Required for |
|
| Required to read authentication plugin info |
| Specified tables | Required for reading CDC data |
Validation checks:
The procedure performs the following validations before granting privileges:
- MySQL version is 5.x or 8.x
- User exists with
host = '%' - User has correct authentication plugin for the MySQL version
- All specified schemas exist
- All specified tables exist (for non-wildcard entries)
lakeflow_cdc_setup
Configures MySQL binary logging settings required for CDC operations.
Syntax:
CALL lakeflow_cdc_setup();
Parameters: None
Configuration applied:
Setting | Value | Purpose |
|---|---|---|
|
| Captures all column changes in binary log |
|
| Logs complete before/after images |
Behavior by MySQL version:
Version | Behavior |
|---|---|
MySQL 8.x | Uses |
MySQL 5.x | Uses |
Common scenarios
Scenario 1: Specific tables only
-- Create user
CREATE USER 'lakeflow_user'@'%' IDENTIFIED WITH caching_sha2_password BY 'secure_password';
-- Setup binary logging
CALL lakeflow_cdc_setup();
-- Grant permissions on specific tables
CALL lakeflow_setup_cdc_user('lakeflow_user', '`sales`.`orders`, `sales`.`customers`, `inventory`.`products`');
Scenario 2: All tables in specific schemas
-- Create user
CREATE USER 'lakeflow_user'@'%' IDENTIFIED WITH caching_sha2_password BY 'secure_password';
-- Setup binary logging
CALL lakeflow_cdc_setup();
-- Grant permissions on all tables in schemas
CALL lakeflow_setup_cdc_user('lakeflow_user', '`sales`.*, `inventory`.*, `hr`.*');
Scenario 3: Hybrid approach (schemas + specific tables)
-- Create user
CREATE USER 'lakeflow_user'@'%' IDENTIFIED WITH caching_sha2_password BY 'secure_password';
-- Setup binary logging
CALL lakeflow_cdc_setup();
-- Grant all tables in sales schema, but only specific tables in other schemas
CALL lakeflow_setup_cdc_user('lakeflow_user', '`sales`.*, `inventory`.`products`, `hr`.`employees`');
Scenario 4: Amazon RDS MySQL
-- Create user (binary logging configured via Parameter Groups)
CREATE USER 'lakeflow_user'@'%' IDENTIFIED WITH caching_sha2_password BY 'secure_password';
-- Grant permissions (skip lakeflow_cdc_setup for managed services)
CALL lakeflow_setup_cdc_user('lakeflow_user', '`mydb`.*');
For Amazon RDS, configure binary logging using a custom Parameter Group with binlog_format = ROW and binlog_row_image = FULL.
MySQL 5.x persistence
When using lakeflow_cdc_setup() on MySQL 5.x, the configuration is not persistent across restarts. To make it permanent:
- Open
/etc/my.cnfwith root privileges:
sudo vi /etc/my.cnf
- Add under
[mysqld]section:
[mysqld]
binlog_format=ROW
binlog_row_image=FULL
- Restart MySQL:
sudo systemctl restart mysqld
Troubleshooting
User does not exist
Error: User 'xxx'@'%' does NOT exist
Solution: Create the user before running the setup procedure. See Step 2.
Wrong authentication plugin
Error: User exists with wrong plugin type. Expected: caching_sha2_password, Found: mysql_native_password
Solution: Recreate the user with the correct authentication plugin:
DROP USER 'cdc_user'@'%';
CREATE USER 'cdc_user'@'%' IDENTIFIED WITH caching_sha2_password BY 'your_password';
Schema does not exist
Error: Catalog 'xxx' does NOT exist
Solution: Create the schema or correct the schema name in your tables list.
Table does not exist
Error: Table 'xxx.yyy' does NOT exist
Solution: Create the table or correct the table name in your tables list.
Managed service detected
Error: Cloud/managed service detected. Use parameter/server group tools for binlog setup.
Solution: Skip lakeflow_cdc_setup() and configure binary logging through your cloud provider's management console.