Skip to main content

Configure Azure Database for MySQL for ingestion

Preview

The MySQL connector is in Public Preview. Contact your Databricks account team to request access.

Learn how to configure Azure Database for MySQL for ingestion into Databricks. You must enable binary logging and configure binlog retention to support change data capture.

Requirements

  • Azure Database for MySQL version 5.7.44 and later, or 8.0 and later (Flexible Server).
  • Permission to modify server parameters.

Configure server parameters

Binary logging is controlled through server parameters in Azure Database for MySQL. You must configure these parameters through the Azure portal.

Required server parameters

Configure the following server parameters:

Parameter

Value

Description

log_bin

ON

Enables binary logging. log_bin is enabled by default in Azure MySQL Flexible Server and is not configurable. If this option is available, set it to ON.

binlog_format

ROW

Sets the binary log format to row-based replication. Required for change data capture.

binlog_row_image

FULL

Logs all columns in the binary log, not just changed columns.

binlog_expire_logs_seconds

604800 (7 days)

Specifies how long binary logs are kept before automatic purging.

Azure portal

To configure server parameters using the Azure portal:

  1. Sign in to the Azure portal and navigate to your Azure Database for MySQL server.
  2. In the left menu, under Settings, select Server parameters.
  3. Search for binlog_format and set the value to ROW.
  4. Search for binlog_row_image and set the value to FULL.
  5. Search for binlog_expire_logs_seconds and set the value to 604800 (7 days) or higher.
  6. Click Save at the top of the page.
  7. When prompted, restart the server for the changes to take effect.

Binlog retention considerations

Databricks recommends a binlog retention period of 7 days (168 hours or 604800 seconds). Setting a lower value might cause binlogs to be cleaned up before the ingestion gateway replays them, which would require a full refresh of all tables.

Verify binlog configuration

After configuring the server parameters and restarting the server, verify that binary logging is properly configured:

SQL
-- Check if binary logging is enabled
SHOW VARIABLES LIKE 'log_bin';

-- Should return:
-- +---------------+-------+
-- | Variable_name | Value |
-- +---------------+-------+
-- | log_bin | ON |
-- +---------------+-------+

-- Check binlog format
SHOW VARIABLES LIKE 'binlog_format';

-- Should return:
-- +---------------+-------+
-- | Variable_name | Value |
-- +---------------+-------+
-- | binlog_format | ROW |
-- +---------------+-------+

-- Check binlog row image
SHOW VARIABLES LIKE 'binlog_row_image';

-- Should return:
-- +------------------+-------+
-- | Variable_name | Value |
-- +------------------+-------+
-- | binlog_row_image | FULL |
-- +------------------+-------+

-- Check retention in seconds
SHOW VARIABLES LIKE 'binlog_expire_logs_seconds';

Read replica support

The MySQL connector supports ingesting from read replicas for Azure Database for MySQL. Using a read replica can reduce load on your primary database.

To use a read replica:

  1. Create a read replica of your primary server in the Azure portal.
  2. Verify that the read replica has binary logging enabled (inherited from the primary server).
  3. Use the read replica's connection string when creating the Databricks connection.
note

There might be a replication lag between the primary server and the replica, which can affect data freshness. Monitor replication lag using Azure Monitor metrics.

Network configuration

Ensure that your Azure Database for MySQL server is accessible from Databricks:

  • Configure firewall rules to allow inbound traffic from Databricks IP ranges.
  • If using a private endpoint, ensure proper virtual network peering or connectivity.
  • Consider using Azure Private Link for secure connectivity.

To add a firewall rule:

  1. In the Azure portal, navigate to your MySQL server.
  2. Under Settings, select Networking.
  3. Add a new firewall rule with the Databricks IP ranges.
  4. Click Save.

Next steps

An admin can now either: