Skip to main content

Prepare MySQL for ingestion using the utility objects script

Preview

The MySQL connector is in Public Preview. Contact your Databricks account team to request access.

info

This page contains references to the term slave, a term that Databricks doesn't use. When the term is removed from the third-party software, we'll remove it from this page.

Complete the MySQL database setup tasks to ingest into Databricks using Lakeflow Connect.

Requirements

  • MySQL version: 5.7.44 or 8.0+
  • Executing user: Must be root or have full administrative privileges (SUPER, ALL PRIVILEGES)
  • CDC user: Must already exist with appropriate authentication plugin before running setup procedures
  • Target schemas/tables: Must already exist before granting permissions

Step 1: Install utility objects

This step installs the utility stored procedures needed for MySQL CDC setup.

  1. Download the script: mysql_setup.sql
  2. Open the script in MySQL Workbench, DBeaver, or your preferred SQL client.
  3. Connect to your MySQL instance as root or an admin user.
  4. Make sure you are connected to your desired schema.
  5. Run the script.
  6. Verify installation:
SQL
SELECT ROUTINE_NAME
FROM information_schema.ROUTINES
WHERE ROUTINE_SCHEMA = '<schema-name>'
AND ROUTINE_TYPE = 'PROCEDURE'
AND ROUTINE_NAME IN ('lakeflow_cdc_setup', 'lakeflow_setup_cdc_user');

Step 2: Create the CDC user

Before running the setup procedures, create the CDC user with the appropriate authentication plugin.

For MySQL 8.x:

SQL
CREATE USER 'cdc_user'@'%' IDENTIFIED WITH caching_sha2_password BY 'your_secure_password';

For MySQL 5.x:

SQL
CREATE USER 'cdc_user'@'%' IDENTIFIED WITH sha256_password BY 'your_secure_password';
note

The authentication plugin must match your MySQL version. Using the wrong plugin will cause the setup procedure to fail.

Step 3: Enable binary logging (for standalone MySQL only)

This step configures binary logging settings required for CDC operations. For details, see lakeflow_setup_cdc_user in the procedure reference below.

SQL
CALL lakeflow_cdc_setup();
note

This procedure only works on standalone MySQL instances. For managed services (Amazon RDS, Azure Database for MySQL, Google Cloud SQL), use the service-specific configuration tools:

  • Amazon RDS: Use Parameter Groups to set binlog_format = ROW and binlog_row_image = FULL
  • Azure MySQL: Use Server Parameters
  • Google Cloud SQL: Use Database Flags

Step 4: Grant CDC permissions

This step validates the CDC user and grants the necessary permissions for ingestion. For details, see lakeflow_setup_cdc_user in the procedure reference below.

SQL
-- Grant permissions on specific tables
CALL lakeflow_setup_cdc_user('cdc_user', '`mydb`.`orders`, `mydb`.`customers`');

Alternative options:

  • For all tables in specific schemas: 'mydb1.*, mydb2.*'
  • For mixed (schemas and specific tables): 'mydb1.*, mydb2.orders'
  • For replication permissions only (no table access): 'NULL'

Step 5: Verify setup

Run the following queries to confirm that the CDC user has the correct permissions:

SQL
-- Check user privileges
SHOW GRANTS FOR 'cdc_user'@'%';

-- Verify binary logging configuration
SHOW VARIABLES LIKE 'binlog_format';
SHOW VARIABLES LIKE 'binlog_row_image';

-- Verify user authentication plugin
SELECT user, host, plugin FROM mysql.user WHERE user = 'cdc_user';

Procedure reference

lakeflow_setup_cdc_user

Validates that a CDC user exists with the appropriate authentication plugin and grants the necessary privileges for CDC operations.

Syntax:

SQL
CALL lakeflow_setup_cdc_user(
@User = 'username',
@Tables = 'table_list'
);

Parameters:

Parameter

Type

Description

@User

VARCHAR(255)

The CDC username (must already exist)

@Tables

MEDIUMTEXT

Comma-separated list of tables/schemas

Table list format:

Format

Example

Description

Specific tables

`schema`.`table1`, `schema`.`table2`

Grants SELECT on specific tables

All tables in schema

`schema`.*

Grants SELECT on all tables in the schema

Mixed

`schema1`.*, `schema2`.`table1`

Combination of schemas and specific tables

Replication only

'NULL'

Grants only replication privileges, no table access

Privileges granted:

Privilege

Scope

Purpose

REPLICATION SLAVE

Global (*.*)

Required to read binary logs

REPLICATION CLIENT

Global (*.*)

Required for SHOW MASTER STATUS

SELECT (plugin, user)

mysql.user

Required to read authentication plugin info

SELECT

Specified tables

Required for reading CDC data

Validation checks:

The procedure performs the following validations before granting privileges:

  1. MySQL version is 5.x or 8.x
  2. User exists with host = '%'
  3. User has correct authentication plugin for the MySQL version
  4. All specified schemas exist
  5. All specified tables exist (for non-wildcard entries)

lakeflow_cdc_setup

Configures MySQL binary logging settings required for CDC operations.

Syntax:

SQL
CALL lakeflow_cdc_setup();

Parameters: None

Configuration applied:

Setting

Value

Purpose

binlog_format

ROW

Captures all column changes in binary log

binlog_row_image

FULL

Logs complete before/after images

Behavior by MySQL version:

Version

Behavior

MySQL 8.x

Uses SET PERSIST (survives restarts)

MySQL 5.x

Uses SET GLOBAL (requires manual persistence)

Common scenarios

Scenario 1: Specific tables only

SQL
-- Create user
CREATE USER 'lakeflow_user'@'%' IDENTIFIED WITH caching_sha2_password BY 'secure_password';

-- Setup binary logging
CALL lakeflow_cdc_setup();

-- Grant permissions on specific tables
CALL lakeflow_setup_cdc_user('lakeflow_user', '`sales`.`orders`, `sales`.`customers`, `inventory`.`products`');

Scenario 2: All tables in specific schemas

SQL
-- Create user
CREATE USER 'lakeflow_user'@'%' IDENTIFIED WITH caching_sha2_password BY 'secure_password';

-- Setup binary logging
CALL lakeflow_cdc_setup();

-- Grant permissions on all tables in schemas
CALL lakeflow_setup_cdc_user('lakeflow_user', '`sales`.*, `inventory`.*, `hr`.*');

Scenario 3: Hybrid approach (schemas + specific tables)

SQL
-- Create user
CREATE USER 'lakeflow_user'@'%' IDENTIFIED WITH caching_sha2_password BY 'secure_password';

-- Setup binary logging
CALL lakeflow_cdc_setup();

-- Grant all tables in sales schema, but only specific tables in other schemas
CALL lakeflow_setup_cdc_user('lakeflow_user', '`sales`.*, `inventory`.`products`, `hr`.`employees`');

Scenario 4: Amazon RDS MySQL

SQL
-- Create user (binary logging configured via Parameter Groups)
CREATE USER 'lakeflow_user'@'%' IDENTIFIED WITH caching_sha2_password BY 'secure_password';

-- Grant permissions (skip lakeflow_cdc_setup for managed services)
CALL lakeflow_setup_cdc_user('lakeflow_user', '`mydb`.*');
tip

For Amazon RDS, configure binary logging using a custom Parameter Group with binlog_format = ROW and binlog_row_image = FULL.

MySQL 5.x persistence

When using lakeflow_cdc_setup() on MySQL 5.x, the configuration is not persistent across restarts. To make it permanent:

  1. Open /etc/my.cnf with root privileges:
Bash
sudo vi /etc/my.cnf
  1. Add under [mysqld] section:
ini
[mysqld]
binlog_format=ROW
binlog_row_image=FULL
  1. Restart MySQL:
Bash
sudo systemctl restart mysqld

Troubleshooting

User does not exist

Error: User 'xxx'@'%' does NOT exist

Solution: Create the user before running the setup procedure. See Step 2.

Wrong authentication plugin

Error: User exists with wrong plugin type. Expected: caching_sha2_password, Found: mysql_native_password

Solution: Recreate the user with the correct authentication plugin:

SQL
DROP USER 'cdc_user'@'%';
CREATE USER 'cdc_user'@'%' IDENTIFIED WITH caching_sha2_password BY 'your_password';

Schema does not exist

Error: Catalog 'xxx' does NOT exist

Solution: Create the schema or correct the schema name in your tables list.

Table does not exist

Error: Table 'xxx.yyy' does NOT exist

Solution: Create the table or correct the table name in your tables list.

Managed service detected

Error: Cloud/managed service detected. Use parameter/server group tools for binlog setup.

Solution: Skip lakeflow_cdc_setup() and configure binary logging through your cloud provider's management console.

Additional resources