Configure Amazon RDS and Amazon Aurora MySQL for ingestion
The MySQL connector is in Public Preview. Contact your Databricks account team to request access.
Learn how to configure Amazon RDS for MySQL and Amazon Aurora MySQL for ingestion using Lakeflow Connect. You must enable binary logging and configure binlog retention to support change data capture.
Requirements
- RDS for MySQL: Version 5.7.44 and later (both standalone and HA deployments)
- Aurora MySQL: Version 5.7.mysql_aurora.2.12.2 and later (for HA setups, support is only from primary instance)
- Aurora MySQL Serverless: Supported
- A DB parameter group (for RDS) or a DB cluster parameter group (for Aurora) that you can modify.
Modify the parameter group
Binary logging is controlled through DB parameter groups in RDS and DB cluster parameter groups in Aurora. If you're using the default parameter group, you must create a custom parameter group because default parameter groups can't be modified.
Create a custom parameter group (if needed)
If you're using a default parameter group, create a custom one:
- Sign in to the AWS Management Console and open the RDS console.
- In the navigation pane, choose Parameter groups.
- Click Create parameter group.
- For Parameter group family, select the MySQL version family (for example,
mysql8.0). - For Type, select DB Parameter Group for RDS or DB Cluster Parameter Group for Aurora. For Aurora clusters, you must create a DB Cluster Parameter Group. Associate it at the cluster level, not at the instance level.
- Enter a name and description for the parameter group.
- Click Create.
- Associate the custom parameter group with your RDS instance or Aurora cluster. This requires a database restart.
Configure binlog parameters
Modify the following parameters in your custom parameter group:
Parameter | Value | Description |
|---|---|---|
| This parameter is read-only in RDS. Binary logging is automatically enabled when you enable automated backups. | Enables binary logging. |
|
| Sets the binary log format to row-based replication. Required for change data capture. |
|
| Logs all columns in the binary log, not just changed columns. |
To modify these parameters:
- In the RDS console, select your custom parameter group.
- Click Edit parameters.
- Search for each parameter and set its value as shown in the table.
- Click Save changes.
- Reboot your RDS instance or Aurora cluster for the changes to take effect.
Configure binlog retention
By default, RDS and Aurora purge binary logs as soon as possible to save storage space. You must configure binlog retention to ensure logs are available long enough for the ingestion gateway to process them.
Set the retention period
To set the binlog retention period, connect to your RDS instance or Aurora cluster using a MySQL client with master user credentials, then run the following:
-- Set retention to 7 days (168 hours)
-- For RDS MySQL:
CALL mysql.rds_set_configuration('binlog retention hours', 168);
-- For Aurora MySQL:
CALL mysql.rds_set_configuration('binlog retention hours', 168);
Databricks recommends a binlog retention period of 7 days (168 hours). Setting a lower value might cause binlogs to be cleaned up before the ingestion gateway replays them, which would require a full refresh of all tables.
Verify retention settings
To verify the retention settings:
-- For RDS and Aurora:
CALL mysql.rds_show_configuration;
Example output:
+------------------------+-------+--------------------------------------+
| name | value | description |
+------------------------+-------+--------------------------------------+
| binlog retention hours | 168 | binlog retention hours specifies... |
+------------------------+-------+--------------------------------------+
Enable automated backups
Binary logging in RDS and Aurora is automatically enabled when automated backups are enabled. Verify that automated backups are enabled for your instance:
- In the RDS console, select your DB instance or cluster.
- View the Configuration tab.
- Under Backup, verify that Automated backups is set to Enabled.
- If backups are disabled, click Modify and enable automated backups with a retention period of at least 1 day.
Verify binlog configuration
After modifying the parameter group and rebooting your database, verify that binary logging is properly configured:
-- Check if binary logging is enabled
SHOW VARIABLES LIKE 'log_bin';
-- Should return:
-- +---------------+-------+
-- | Variable_name | Value |
-- +---------------+-------+
-- | log_bin | ON |
-- +---------------+-------+
-- Check binlog format
SHOW VARIABLES LIKE 'binlog_format';
-- Should return:
-- +---------------+-------+
-- | Variable_name | Value |
-- +---------------+-------+
-- | binlog_format | ROW |
-- +---------------+-------+
-- Check binlog row image
SHOW VARIABLES LIKE 'binlog_row_image';
-- Should return:
-- +------------------+-------+
-- | Variable_name | Value |
-- +------------------+-------+
-- | binlog_row_image | FULL |
-- +------------------+-------+
Read replica limitations
- The MySQL connector doesn't support ingesting from Aurora MySQL read replicas. You must connect to the primary instance (writer endpoint) for Aurora deployments.
- Read replicas are supported for RDS if they have backups enabled (so that binary logging is enabled on them).
Network configuration
Ensure that your RDS instance or Aurora cluster is accessible from Databricks:
- Configure security groups to allow inbound traffic on port 3306 (or your custom MySQL port) from Databricks IP ranges.
- If using a private subnet, ensure proper VPC peering or network connectivity.
- Consider using AWS PrivateLink for secure connectivity.
Next steps
An admin can now either:
- Use Catalog Explorer to create a connection so non-admins can create pipelines
- Use the data ingestion UI to create a connection and a pipeline