MySQL connector limitations
The MySQL connector is in Public Preview. Contact your Databricks account team to request access.
Learn about limitations and considerations for the MySQL connector.
General database connector limitations
The limitations in this section apply to all database connectors in Lakeflow Connect. Keep reading for connector-specific limitations.
- When you run a scheduled pipeline, alerts don't trigger immediately. Instead, they trigger when the next update runs.
- When a source table is deleted, the destination table is not automatically deleted. You must delete the destination table manually. This behavior is not consistent with Lakeflow Spark Declarative Pipelines behavior.
- During source maintenance periods, Databricks might not be able to access your data.
- If a source table name conflicts with an existing destination table name, the pipeline update fails.
- Multi-destination pipeline support is API-only.
- You can optionally rename a table that you ingest. If you rename a table in your pipeline, it becomes an API-only pipeline, and you can no longer edit the pipeline in the UI.
- The connector assumes that cursor columns in the source system are monotonically increasing.
- With SCD type 1 enabled, deletes don't produce an explicit `delete` event in the change data feed. For auditable deletions, use SCD type 2 if the connector supports it. For details, see Example: SCD type 1 and SCD type 2 processing with CDF source data.
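For example, a minimal sketch of reading the change data feed on a destination table to audit how changes were recorded (the table name and starting version are placeholders; this assumes the change data feed is available on the table):

```sql
-- Read recent change events on the destination table to audit how source
-- changes and deletes were recorded. Table name and starting version are placeholders.
SELECT _change_type, _commit_version, _commit_timestamp
FROM table_changes('main.sales.customers', 1)
ORDER BY _commit_version DESC;
```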
MySQL versions and variations
The connector supports the following MySQL versions and platforms:
- Amazon RDS for MySQL: Version 5.7.44 and later (both standalone and HA deployments)
- Amazon Aurora MySQL: Version 5.7.mysql_aurora.2.12.2 and later (primary instance only for HA setups)
- Amazon Aurora MySQL Serverless: Supported
- Azure Database for MySQL Flexible Servers: Version 5.7.44 and later (both standalone and HA deployments)
- Google Cloud SQL for MySQL: Version 5.7.44 and later, Version 8.0 and later
- MySQL on EC2: Version 5.7.44 and later
The connector doesn't support MariaDB or other MySQL-compatible databases.
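To confirm that a source meets these version requirements (and is MySQL rather than MariaDB or another fork), a quick check on the source instance:

```sql
-- Report the server version and distribution; MariaDB identifies itself here.
SELECT VERSION();
SHOW VARIABLES LIKE 'version_comment';
```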
Authentication
- The connector supports username and password authentication with the following authentication plugins:
  - MySQL 5.7.44: The replication user must be created using the `sha256_password` authentication plugin.
  - MySQL 8.0.x and later: Both `sha256_password` and `caching_sha2_password` plugins are supported.
- The Test Connection button in the UI might fail for users using `sha256_password` or `caching_sha2_password`, even when credentials are correct. This is a known issue. You can still create the connection and proceed with pipeline setup.
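As an illustrative sketch only (the user name, host pattern, password, and exact privilege set are placeholders; consult the connector setup documentation for the grants it actually requires), creating a replication user with the `sha256_password` plugin looks like this:

```sql
-- Create a replication user with the sha256_password plugin (MySQL 5.7.44+).
-- User name, host pattern, and password are placeholders.
CREATE USER 'lakeflow_repl'@'%' IDENTIFIED WITH sha256_password BY 'replace-with-strong-password';

-- Grant privileges commonly needed for binlog-based change data capture.
GRANT SELECT, REPLICATION CLIENT, REPLICATION SLAVE ON *.* TO 'lakeflow_repl'@'%';
```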
Binary log requirements
- Binary logging must be enabled with `binlog_format=ROW` and `binlog_row_image=FULL`.
- If the binlog is purged before the gateway processes changes, you must perform a full refresh of all tables in the pipeline.
- Databricks recommends 7 days of binlog retention. Setting a lower value might cause binlogs to be cleaned up before the ingestion gateway replays them.
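A minimal sketch of checking and setting these values on a self-managed server (on managed services such as RDS, Aurora, or Azure Database for MySQL, set them through the parameter group or server parameters instead; the retention call applies only to RDS and Aurora):

```sql
-- Verify that binary logging is enabled with the required settings.
SHOW VARIABLES WHERE Variable_name IN ('log_bin', 'binlog_format', 'binlog_row_image');

-- Apply the required settings at runtime on a self-managed server; also
-- persist them in the server configuration so they survive a restart.
SET GLOBAL binlog_format = 'ROW';
SET GLOBAL binlog_row_image = 'FULL';

-- On Amazon RDS or Aurora MySQL, set binlog retention to 7 days (168 hours).
CALL mysql.rds_set_configuration('binlog retention hours', 168);
```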
Read replica support
- Read replica support is only available for Amazon RDS for MySQL, Azure Database for MySQL, and MySQL on EC2.
- The connector doesn't support ingesting from Aurora MySQL read replicas. You must connect to the Aurora primary instance (writer endpoint).
- When you use read replicas, replication lag can affect data freshness.
Pipelines
- Each ingestion pipeline must be associated with exactly one ingestion gateway.
- Although the ingestion pipeline runs on serverless compute, the ingestion gateway must run on classic compute.
- The ingestion gateway runs continuously to capture changes before binlogs can be truncated or purged.
Schema evolution and DDL operations
The connector automatically handles new and deleted columns.
- When a new non-spatial column appears in the source, Databricks automatically ingests it on the next run of the pipeline.
- Spatial data type columns are not supported and can't be ingested.
- When a column is deleted from the source, Databricks doesn't delete it automatically. Instead, the connector uses a table property to set the deleted column to `inactive` in the destination. If another column later appears with a name that conflicts with the `inactive` column, the pipeline fails. In this case, you can run a full refresh of the table or manually drop the inactive column (as sketched below).
The same behavior applies to new and deleted tables within a schema if you ingest the entire schema.
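If you choose to drop an inactive column manually rather than run a full refresh, a minimal sketch in Databricks SQL (the catalog, schema, table, and column names are placeholders; dropping a column on a Delta table requires column mapping to be enabled):

```sql
-- Drop a destination column that the connector marked inactive.
-- All names are placeholders.
ALTER TABLE main.sales.customers DROP COLUMN legacy_phone;
```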
DDL operation handling
The connector can handle some DDL operations, but many require a full refresh of the table:
- TRUNCATE TABLE: You must refresh the table.
- RENAME TABLE: The renamed table is skipped in the ingestion gateway, with an error in the event log. The flow for this table is marked as failed in the ingestion pipeline.
- DROP TABLE: The table is removed from the ingestion gateway, with an error in the event log. The flow for this table is marked as failed in the ingestion pipeline.
- ADD PRIMARY KEY or UNIQUE constraint: You must refresh the table.
- DROP PRIMARY KEY constraint: You must refresh the table.
- DROP UNIQUE constraint: If this is the only unique constraint on the source table, you must refresh the table. If not, no action is needed.
- DROP COLUMN / MODIFY COLUMN / CHANGE COLUMN: You must refresh the table.
- ADD COLUMN:
- If the column has a non-spatial type: No action is needed. The column is automatically ingested on the next pipeline update.
- If the column has a spatial type: Adding the column pauses further ingestion of that table. The table is skipped in the ingestion gateway, with an error in the event log. The flow for this table is marked as failed in the ingestion pipeline.
- ADD TABLE (when ingesting entire schema):
- If the table has no spatial types: No action is needed. The table is automatically ingested on the next pipeline update.
- If the table has spatial types: The table is skipped in the ingestion gateway, with an error in the event log. The flow for this table is marked as failed in the ingestion pipeline.
When to perform a full refresh
Perform a full refresh of a table in the following scenarios:
- After a `TRUNCATE TABLE` operation on the source.
- After a column data type change (`MODIFY COLUMN`).
- After a column rename (`CHANGE COLUMN`).
- After a `DROP COLUMN` operation (to remove inactive columns).
- When spatial columns are added (before removing them from selection).
To perform a full refresh, see Fully refresh target tables.
Staging
The staging catalog can't be a foreign catalog.
Tables and data types
- Databricks recommends ingesting 250 or fewer tables per pipeline. However, there is no limit on the number of rows or columns that are supported within these objects.
- MySQL uses a two-level namespace. The `source_schema` and `source_table` names are case-sensitive.
- Although you can ingest from multiple source schemas in one pipeline, you can't ingest two tables of the same name to the same destination schema. For example, you can't ingest both `schema1.customers` and `schema2.customers` to the same destination schema. However, you can ingest them to different destination schemas using a multi-destination pipeline.
- Spatial data types (`GEOMETRY`, `POINT`, `LINESTRING`, `POLYGON`, `MULTIPOINT`, `MULTILINESTRING`, `MULTIPOLYGON`, `GEOMETRYCOLLECTION`) are not supported. If spatial columns are present in selected tables, the table is skipped in the ingestion gateway, with an error in the event log.
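To find spatial columns before you select tables for ingestion, a query like the following against the source's `information_schema` can help (the schema name filter is a placeholder):

```sql
-- List columns with spatial data types in a source schema (placeholder name)
-- so the affected tables or columns can be excluded before ingestion.
SELECT table_schema, table_name, column_name, data_type
FROM information_schema.columns
WHERE table_schema = 'sales'
  AND data_type IN ('geometry', 'point', 'linestring', 'polygon',
                    'multipoint', 'multilinestring', 'multipolygon',
                    'geometrycollection')
ORDER BY table_name, column_name;
```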
Character sets and collation
The connector currently replicates only UTF-8 compatible byte sequences. This means replication works correctly when the byte representation stored in MySQL matches UTF-8 encoding.
Supported character sets
The following MySQL character sets are fully or partially supported:
- Fully supported (bytes always match UTF-8): `utf8`, `utf8mb4`, `ascii`
- Partially supported (ASCII range only): `latin1`. Most real-world data stored as `latin1` is ASCII-compatible and works correctly. MySQL 5.7.44 uses `latin1` as the default character set.
- Default character sets by version:
  - MySQL 5.7.44: `latin1` (default)
  - MySQL 8.0 and later: `utf8mb4` (default)
Character set limitations
- The connector doesn't convert between character sets. It transfers raw bytes and expects them to represent valid UTF-8.
- Non-UTF-8 compatible characters in unsupported character sets might appear garbled or incorrect in the destination.
- For non-ASCII data in character sets other than those listed above, consider converting your tables to `utf8mb4` before ingestion.
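A minimal sketch of checking a table's character set and converting it to `utf8mb4` on the source (schema and table names are placeholders; `CONVERT TO CHARACTER SET` rewrites the table, so plan for the load and test on non-production data first):

```sql
-- Check the current character set and collation of a source table.
SELECT table_name, table_collation
FROM information_schema.tables
WHERE table_schema = 'sales' AND table_name = 'customers';

-- Convert the table and its string columns to utf8mb4 before ingestion.
ALTER TABLE sales.customers CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
```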
Known issues
- The Test Connection button fails for MySQL users with `sha256_password` or `caching_sha2_password` authentication, even when credentials are valid. This is a known limitation. You can still create connections and pipelines.