Configure Google Cloud SQL for MySQL for ingestion
The MySQL connector is in Public Preview. Contact your Databricks account team to request access.
This article describes how to configure Google Cloud SQL for MySQL for ingestion. You must enable binary logging and configure binlog retention to support change data capture.
Requirements
- Cloud SQL for MySQL version 5.7.44 and later, or 8.0 and later.
- A Cloud SQL instance.
Configure database flags
Binary logging is controlled through database flags in Cloud SQL for MySQL. You must configure these flags through the Google Cloud Console.
Required database flags
Configure the following database flags:
Flag | Value | Description |
|---|---|---|
|
| Enables binary logging. |
|
| Sets the binary log format to row-based replication. Required for change data capture. |
|
| Logs all columns in the binary log, not just changed columns. |
|
| Specifies how long binary logs are kept before automatic purging. For MySQL 5.7, use |
Google Cloud Console
To configure database flags using the Google Cloud Console:
- Open the Google Cloud Console and navigate to the Cloud SQL Instances page.
- Select your Cloud SQL instance to open the instance details page.
- Click Edit at the top of the page.
- Scroll down to the Flags section.
- Click Add flag and select
binlog_format. Set the value toROW. - Click Add flag and select
binlog_row_image. Set the value toFULL. - For MySQL 8.0, click Add flag and select
binlog_expire_logs_seconds. Set the value to604800or higher. - For MySQL 5.7.44 and later, click Add flag and select
expire_logs_days. Set the value to7or higher. - Click Save at the bottom of the page.
- When prompted, confirm that you want to restart the instance for the changes to take effect.
Binlog retention considerations
Databricks recommends a binlog retention period of 7 days (168 hours or 604800 seconds). Setting a lower value might cause binlogs to be cleaned up before the ingestion gateway replays them, which would require a full refresh of all tables.
For MySQL 5.7.44 and later, use the expire_logs_days flag (integer value in days).
For MySQL 8.0 and later, use the binlog_expire_logs_seconds flag (integer value in seconds).
Verify the binlog configuration
After configuring the database flags and restarting the instance, verify that binary logging is properly configured. Connect to your Cloud SQL instance using a MySQL client and run:
-- Check if binary logging is enabled
SHOW VARIABLES LIKE 'log_bin';
-- Should return:
-- +---------------+-------+
-- | Variable_name | Value |
-- +---------------+-------+
-- | log_bin | ON |
-- +---------------+-------+
-- Check binlog format
SHOW VARIABLES LIKE 'binlog_format';
-- Should return:
-- +---------------+-------+
-- | Variable_name | Value |
-- +---------------+-------+
-- | binlog_format | ROW |
-- +---------------+-------+
-- Check binlog row image
SHOW VARIABLES LIKE 'binlog_row_image';
-- Should return:
-- +------------------+-------+
-- | Variable_name | Value |
-- +------------------+-------+
-- | binlog_row_image | FULL |
-- +------------------+-------+
-- For MySQL 8.0: Check retention in seconds
SHOW VARIABLES LIKE 'binlog_expire_logs_seconds';
-- For MySQL 5.7: Check retention in days
SHOW VARIABLES LIKE 'expire_logs_days';
Network configuration
Ensure that your Cloud SQL instance is accessible from Databricks:
- Public IP: Add authorized networks to allow connections from Databricks IP ranges.
- Private IP: Configure VPC peering between your Google Cloud VPC and Databricks.
- Cloud SQL Proxy: Consider using Cloud SQL Auth Proxy for secure connections.
Add authorized networks (public IP)
To add authorized networks for public IP connectivity:
- In the Cloud Console, navigate to your Cloud SQL instance.
- Select Connections from the left menu.
- Under Authorized networks, click Add network.
- Enter the Databricks IP ranges.
- Click Done and then Save.
Automated backups
Cloud SQL automatically enables binary logging when automated backups are enabled. Verify that automated backups are enabled:
- In the Cloud Console, navigate to your Cloud SQL instance.
- Select Backups from the left menu.
- Verify that Automated backups is enabled.
- If disabled, click Edit configuration and enable automated backups.
Next steps
An admin can now either:
- Use Catalog Explorer to create a connection so non-admins can create pipelines
- Use the data ingestion UI to create a connection and a pipeline