Configure Azure Database for MySQL for ingestion
The MySQL connector is in Public Preview. Contact your Databricks account team to request access.
Learn how to configure Azure Database for MySQL for ingestion into Databricks. You must enable binary logging and configure binlog retention to support change data capture.
Requirements
- Azure Database for MySQL version 5.7.44 and later, or 8.0 and later (Flexible Server).
- Permission to modify server parameters.
Configure server parameters
Binary logging is controlled through server parameters in Azure Database for MySQL. You must configure these parameters through the Azure portal.
Required server parameters
Configure the following server parameters:
Parameter | Value | Description |
|---|---|---|
|
| Enables binary logging. |
|
| Sets the binary log format to row-based replication. Required for change data capture. |
|
| Logs all columns in the binary log, not just changed columns. |
|
| Specifies how long binary logs are kept before automatic purging. |
Azure portal
To configure server parameters using the Azure portal:
- Sign in to the Azure portal and navigate to your Azure Database for MySQL server.
- In the left menu, under Settings, select Server parameters.
- Search for
binlog_formatand set the value toROW. - Search for
binlog_row_imageand set the value toFULL. - Search for
binlog_expire_logs_secondsand set the value to604800(7 days) or higher. - Click Save at the top of the page.
- When prompted, restart the server for the changes to take effect.
Binlog retention considerations
Databricks recommends a binlog retention period of 7 days (168 hours or 604800 seconds). Setting a lower value might cause binlogs to be cleaned up before the ingestion gateway replays them, which would require a full refresh of all tables.
Verify binlog configuration
After configuring the server parameters and restarting the server, verify that binary logging is properly configured:
-- Check if binary logging is enabled
SHOW VARIABLES LIKE 'log_bin';
-- Should return:
-- +---------------+-------+
-- | Variable_name | Value |
-- +---------------+-------+
-- | log_bin | ON |
-- +---------------+-------+
-- Check binlog format
SHOW VARIABLES LIKE 'binlog_format';
-- Should return:
-- +---------------+-------+
-- | Variable_name | Value |
-- +---------------+-------+
-- | binlog_format | ROW |
-- +---------------+-------+
-- Check binlog row image
SHOW VARIABLES LIKE 'binlog_row_image';
-- Should return:
-- +------------------+-------+
-- | Variable_name | Value |
-- +------------------+-------+
-- | binlog_row_image | FULL |
-- +------------------+-------+
-- Check retention in seconds
SHOW VARIABLES LIKE 'binlog_expire_logs_seconds';
Read replica support
The MySQL connector supports ingesting from read replicas for Azure Database for MySQL. Using a read replica can reduce load on your primary database.
To use a read replica:
- Create a read replica of your primary server in the Azure portal.
- Verify that the read replica has binary logging enabled (inherited from the primary server).
- Use the read replica's connection string when creating the Databricks connection.
There might be a replication lag between the primary server and the replica, which can affect data freshness. Monitor replication lag using Azure Monitor metrics.
Network configuration
Ensure that your Azure Database for MySQL server is accessible from Databricks:
- Configure firewall rules to allow inbound traffic from Databricks IP ranges.
- If using a private endpoint, ensure proper virtual network peering or connectivity.
- Consider using Azure Private Link for secure connectivity.
To add a firewall rule:
- In the Azure portal, navigate to your MySQL server.
- Under Settings, select Networking.
- Add a new firewall rule with the Databricks IP ranges.
- Click Save.
Next steps
An admin can now either:
- Use Catalog Explorer to create a connection so non-admins can create pipelines
- Use the data ingestion UI to create a connection and a pipeline