MySQL connector limitations
The MySQL connector is in Public Preview. Contact your Databricks account team to request access.
Learn about limitations and considerations for the MySQL connector.
General database connector limitations
The limitations in this section apply to all database connectors in Lakeflow Connect. Keep reading for connector-specific limitations.
- When you run a scheduled pipeline, alerts don't trigger immediately. Instead, they trigger when the next update runs.
- When a source table is deleted, the destination table is not automatically deleted. You must delete the destination table manually. This behavior is not consistent with Lakeflow Spark Declarative Pipelines behavior.
- During source maintenance periods, Databricks might not be able to access your data.
- If a source table name conflicts with an existing destination table name, the pipeline update fails.
- Multi-destination pipeline support is API-only.
- You can optionally rename a table that you ingest. If you rename a table in your pipeline, it becomes an API-only pipeline, and you can no longer edit the pipeline in the UI.
- The connector assumes that cursor columns in the source system are monotonically increasing.
- With SCD type 1 enabled, deletes don't produce an explicit `delete` event in the change data feed. For auditable deletions, use SCD type 2 if the connector supports it. For details, see Example: SCD type 1 and SCD type 2 processing with CDF source data.
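For example, a minimal sketch of reading the change data feed on a destination table to audit how changes were recorded (the table name and starting version are placeholders; this assumes the change data feed is available on the table):

```sql
-- Read recent change events on the destination table to audit how source
-- changes and deletes were recorded. Table name and starting version are placeholders.
SELECT _change_type, _commit_version, _commit_timestamp
FROM table_changes('main.sales.customers', 1)
ORDER BY _commit_version DESC;
```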
MySQL versions and variations
The connector supports the following MySQL versions and platforms:
- Amazon RDS for MySQL: Version 5.7.44 and later (both standalone and HA deployments)
- Amazon Aurora MySQL: Version 5.7.mysql_aurora.2.12.2 and later (primary instance only for HA setups)
- Amazon Aurora MySQL Serverless: Supported
- Azure Database for MySQL Flexible Servers: Version 5.7.44 and later (both standalone and HA deployments)
- Google Cloud SQL for MySQL: Version 5.7.44 and later, Version 8.0 and later
- MySQL on EC2: Version 5.7.44 and later
The connector doesn't support MariaDB or other MySQL-compatible databases.
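To confirm that a source meets these version requirements (and is MySQL rather than MariaDB or another fork), a quick check on the source instance:

```sql
-- Report the server version and distribution; MariaDB identifies itself here.
SELECT VERSION();
SHOW VARIABLES LIKE 'version_comment';
```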
Authentication
- The connector supports username and password authentication with the following authentication plugins:
  - MySQL 5.7.44: The replication user must be created using the `sha256_password` authentication plugin.
  - MySQL 8.0.x and later: Both `sha256_password` and `caching_sha2_password` plugins are supported.
- The Test Connection button in the UI might fail for users using `sha256_password` or `caching_sha2_password`, even when credentials are correct. This is a known issue. You can still create the connection and proceed with pipeline setup.
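As an illustrative sketch only (the user name, host pattern, password, and exact privilege set are placeholders; consult the connector setup documentation for the grants it actually requires), creating a replication user with the `sha256_password` plugin looks like this:

```sql
-- Create a replication user with the sha256_password plugin (MySQL 5.7.44+).
-- User name, host pattern, and password are placeholders.
CREATE USER 'lakeflow_repl'@'%' IDENTIFIED WITH sha256_password BY 'replace-with-strong-password';

-- Grant privileges commonly needed for binlog-based change data capture.
GRANT SELECT, REPLICATION CLIENT, REPLICATION SLAVE ON *.* TO 'lakeflow_repl'@'%';
```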
Binary log requirements
- Binary logging must be enabled with `binlog_format=ROW` and `binlog_row_image=FULL`.
- If the binlog is purged before the gateway processes changes, you must perform a full refresh of all tables in the pipeline.
- Databricks recommends 7 days of binlog retention. Setting a lower value might cause binlogs to be cleaned up before the ingestion gateway replays them.
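A minimal sketch of checking and setting these values on a self-managed server (on managed services such as RDS, Aurora, or Azure Database for MySQL, set them through the parameter group or server parameters instead; the retention call applies only to RDS and Aurora):

```sql
-- Verify that binary logging is enabled with the required settings.
SHOW VARIABLES WHERE Variable_name IN ('log_bin', 'binlog_format', 'binlog_row_image');

-- Apply the required settings at runtime on a self-managed server; also
-- persist them in the server configuration so they survive a restart.
SET GLOBAL binlog_format = 'ROW';
SET GLOBAL binlog_row_image = 'FULL';

-- On Amazon RDS or Aurora MySQL, set binlog retention to 7 days (168 hours).
CALL mysql.rds_set_configuration('binlog retention hours', 168);
```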
Read replica support
- Read replica support is only available for Amazon RDS for MySQL, Azure Database for MySQL, and MySQL on EC2.
- The connector doesn't support ingesting from Aurora MySQL read replicas. You must connect to the Aurora primary instance (writer endpoint).
- When you use read replicas, replication lag can affect data freshness.
Pipelines
- Each ingestion pipeline must be associated with exactly one ingestion gateway.
- Although the ingestion pipeline runs on serverless compute, the ingestion gateway must run on classic compute.
- The ingestion gateway runs continuously to capture changes before binlogs can be truncated or purged.
Schema evolution and DDL operations
The connector automatically handles new and deleted columns.
- When a new non-spatial column appears in the source, Databricks automatically ingests it on the next run of the pipeline.
- Spatial data type columns are not supported and can't be ingested.
- When a column is deleted from the source, Databricks doesn't delete it automatically. Instead, the connector uses a table property to set the deleted column to `inactive` in the destination. If another column later appears with a name that conflicts with the `inactive` column, the pipeline fails. In this case, you can run a full refresh of the table or manually drop the inactive column (as sketched below).
The same behavior applies to new and deleted tables within a schema if you ingest the entire schema.
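If you choose to drop an inactive column manually rather than run a full refresh, a minimal sketch in Databricks SQL (the catalog, schema, table, and column names are placeholders; dropping a column on a Delta table requires column mapping to be enabled):

```sql
-- Drop a destination column that the connector marked inactive.
-- All names are placeholders.
ALTER TABLE main.sales.customers DROP COLUMN legacy_phone;
```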
DDL operation handling
The connector can handle some DDL operations, but many require a full refresh of the table:
- TRUNCATE TABLE: You must refresh the table.
- RENAME TABLE: The renamed table is skipped in the ingestion gateway, with an error in the event log. The flow for this table is marked as failed in the ingestion pipeline.
- DROP TABLE: The table is removed from the ingestion gateway, with an error in the event log. The flow for this table is marked as failed in the ingestion pipeline.
- ADD PRIMARY KEY or UNIQUE constraint: You must refresh the table.
- DROP PRIMARY KEY constraint: You must refresh the table.
- DROP UNIQUE constraint: If this is the only unique constraint on the source table, you must refresh the table. If not, no action is needed.
- DROP COLUMN / MODIFY COLUMN / CHANGE COLUMN: You must refresh the table.
- ADD COLUMN:
- If the column has a non-spatial type: No action is needed. The column is automatically ingested on the next pipeline update.
- If the column has a spatial type: Adding the column pauses further ingestion of that table. The table is skipped in the ingestion gateway, with an error in the event log. The flow for this table is marked as failed in the ingestion pipeline.
- ADD TABLE (when ingesting entire schema):
- If the table has no spatial types: No action is needed. The table is automatically ingested on the next pipeline update.
- If the table has spatial types: The table is skipped in the ingestion gateway, with an error in the event log. The flow for this table is marked as failed in the ingestion pipeline.
When to perform a full refresh
Perform a full refresh of a table in the following scenarios:
- After a `TRUNCATE TABLE` operation on the source.
- After a column data type change (`MODIFY COLUMN`).
- After a column rename (`CHANGE COLUMN`).
- After a `DROP COLUMN` operation (to remove inactive columns).
- When spatial columns are added (before removing them from selection).
To perform a full refresh, see Fully refresh target tables.
Staging
The staging catalog can't be a foreign catalog.
Tables and data types
- Databricks recommends ingesting 250 or fewer tables per pipeline. However, there is no limit on the number of rows or columns that are supported within these objects.
- MySQL uses a two-level namespace. The `source_schema` and `source_table` names are case-sensitive.
- Although you can ingest from multiple source schemas in one pipeline, you can't ingest two tables of the same name to the same destination schema. For example, you can't ingest both `schema1.customers` and `schema2.customers` to the same destination schema. However, you can ingest them to different destination schemas using a multi-destination pipeline.
- Spatial data types (`GEOMETRY`, `POINT`, `LINESTRING`, `POLYGON`, `MULTIPOINT`, `MULTILINESTRING`, `MULTIPOLYGON`, `GEOMETRYCOLLECTION`) are not supported. If spatial columns are present in selected tables, the table is skipped in the ingestion gateway, with an error in the event log.
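To find spatial columns before you select tables for ingestion, a query like the following against the source's `information_schema` can help (the schema name filter is a placeholder):

```sql
-- List columns with spatial data types in a source schema (placeholder name)
-- so the affected tables or columns can be excluded before ingestion.
SELECT table_schema, table_name, column_name, data_type
FROM information_schema.columns
WHERE table_schema = 'sales'
  AND data_type IN ('geometry', 'point', 'linestring', 'polygon',
                    'multipoint', 'multilinestring', 'multipolygon',
                    'geometrycollection')
ORDER BY table_name, column_name;
```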
Character sets and collation
The connector currently replicates only UTF-8 compatible byte sequences. This means replication works correctly when the byte representation stored in MySQL matches UTF-8 encoding.
Supported character sets
The following MySQL character sets are fully or partially supported:
- Fully supported (bytes always match UTF-8): `utf8`, `utf8mb4`, `ascii`
- Partially supported (ASCII range only): `latin1`. Most real-world data stored as `latin1` is ASCII-compatible and works correctly. MySQL 5.7.44 uses `latin1` as the default character set.
- Default character sets by version:
  - MySQL 5.7.44: `latin1` (default)
  - MySQL 8.0 and later: `utf8mb4` (default)
Character set limitations
- The connector doesn't convert between character sets. It transfers raw bytes and expects them to represent valid UTF-8.
- Non-UTF-8 compatible characters in unsupported character sets might appear garbled or incorrect in the destination.
- For non-ASCII data in character sets other than those listed above, consider converting your tables to `utf8mb4` before ingestion.
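A minimal sketch of checking a table's character set and converting it to `utf8mb4` on the source (schema and table names are placeholders; `CONVERT TO CHARACTER SET` rewrites the table, so plan for the load and test on non-production data first):

```sql
-- Check the current character set and collation of a source table.
SELECT table_name, table_collation
FROM information_schema.tables
WHERE table_schema = 'sales' AND table_name = 'customers';

-- Convert the table and its string columns to utf8mb4 before ingestion.
ALTER TABLE sales.customers CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
```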
Known issues
- The Test Connection button fails for MySQL users with `sha256_password` or `caching_sha2_password` authentication, even when credentials are valid. This is a known limitation. You can still create connections and pipelines.