MySQL connector limitations

Preview

The MySQL connector is in Public Preview. Contact your Databricks account team to request access.

Learn about limitations and considerations for the MySQL connector.

General database connector limitations

The limitations in this section apply to all database connectors in Lakeflow Connect. Keep reading for connector-specific limitations.

  • When you run a scheduled pipeline, alerts don't trigger immediately. Instead, they trigger when the next update runs.
  • When a source table is deleted, the destination table is not automatically deleted; you must delete the destination table manually. This differs from the behavior of Lakeflow Spark Declarative Pipelines.
  • During source maintenance periods, Databricks might not be able to access your data.
  • If a source table name conflicts with an existing destination table name, the pipeline update fails.
  • Multi-destination pipeline support is API-only.
  • You can optionally rename a table that you ingest. If you rename a table in your pipeline, it becomes an API-only pipeline, and you can no longer edit the pipeline in the UI.
  • The connector assumes that cursor columns in the source system are monotonically increasing.

MySQL versions and variations

The connector supports the following MySQL versions and platforms:

  • Amazon RDS for MySQL: Version 5.7.44 and later (both standalone and HA deployments)
  • Amazon Aurora MySQL: Version 5.7.mysql_aurora.2.12.2 and later (primary instance only for HA setups)
  • Amazon Aurora MySQL Serverless: Supported
  • Azure Database for MySQL Flexible Servers: Version 5.7.44 and later (both standalone and HA deployments)
  • Google Cloud SQL for MySQL: Version 5.7.44 and later, Version 8.0 and later
  • MySQL on EC2: Version 5.7.44 and later

The connector doesn't support MariaDB or other MySQL-compatible databases.
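
To confirm that a server meets the version requirements above, you can query the version from any SQL client. For example:

    -- Returns the server version string, for example 8.0.36 or 5.7.44-log
    SELECT VERSION();

    -- MariaDB and other forks typically identify themselves in this value
    SELECT @@version_comment;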

Authentication

  • The connector supports username and password authentication with the following authentication plugins:

    • MySQL 5.7.44: The replication user must be created using the sha256_password authentication plugin.
    • MySQL 8.0.x and later: Both sha256_password and caching_sha2_password plugins are supported.
  • The Test Connection button in the UI might fail for users who authenticate with sha256_password or caching_sha2_password, even when credentials are correct. This is a known issue. You can still create the connection and proceed with pipeline setup.
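
For illustration, a replication user that satisfies the plugin requirements above might be created as follows. The user name, host, and password are placeholders, and the GRANT statement shows privileges that binlog-based replication typically needs rather than an official requirement list for this connector:

    -- MySQL 8.0.x and later: caching_sha2_password (the server default) or sha256_password
    CREATE USER 'repl_user'@'%' IDENTIFIED WITH caching_sha2_password BY 'StrongPassword!1';

    -- MySQL 5.7.44: the replication user must use sha256_password
    -- CREATE USER 'repl_user'@'%' IDENTIFIED WITH sha256_password BY 'StrongPassword!1';

    -- Typical privileges for binlog-based replication (assumption, not connector documentation)
    GRANT REPLICATION SLAVE, REPLICATION CLIENT, SELECT ON *.* TO 'repl_user'@'%';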

Binary log requirements

  • Binary logging must be enabled with binlog_format=ROW and binlog_row_image=FULL.
  • If the binlog is purged before the gateway processes changes, you must perform a full refresh of all tables in the pipeline.
  • Databricks recommends 7 days of binlog retention. Setting a lower value might cause binlogs to be cleaned up before the ingestion gateway replays them.
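
You can verify these settings by inspecting the server variables. The retention variable differs by MySQL version, and managed platforms (for example, Amazon RDS and Aurora) often control binlog retention through their own configuration instead:

    SHOW VARIABLES LIKE 'binlog_format';     -- expected: ROW
    SHOW VARIABLES LIKE 'binlog_row_image';  -- expected: FULL

    -- Retention: MySQL 8.0 uses binlog_expire_logs_seconds (604800 seconds = 7 days);
    -- MySQL 5.7 uses expire_logs_days
    SHOW VARIABLES LIKE 'binlog_expire_logs_seconds';
    SHOW VARIABLES LIKE 'expire_logs_days';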

Read replica support

  • Read replica support is only available for Amazon RDS for MySQL, Azure Database for MySQL, and MySQL on EC2.
  • The connector doesn't support ingesting from Aurora MySQL read replicas. You must connect to the Aurora primary instance (writer endpoint).
  • When you use read replicas, replication lag can affect data freshness.
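
One way to gauge replication lag is to check the replica's status on the replica itself. The statement and column names depend on the MySQL version (\G is the vertical-output terminator of the mysql command-line client):

    -- MySQL 8.0.22 and later: check the Seconds_Behind_Source value
    SHOW REPLICA STATUS\G

    -- Earlier versions: check Seconds_Behind_Master
    -- SHOW SLAVE STATUS\G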

Pipelines

  • Each ingestion pipeline must be associated with exactly one ingestion gateway.
  • Although the ingestion pipeline runs on serverless compute, the ingestion gateway must run on classic compute.
  • The ingestion gateway runs continuously to capture changes before binlogs can be truncated or purged.

Schema evolution and DDL operations

The connector automatically handles new and deleted columns.

  • When a new non-spatial column appears in the source, Databricks automatically ingests it on the next run of the pipeline.
  • Spatial data type columns are not supported and can't be ingested.
  • When a column is deleted from the source, Databricks doesn't delete it automatically. Instead, the connector uses a table property to set the deleted column to inactive in the destination. If another column later appears that has a conflicting name with the inactive column, the pipeline fails. In this case, you can run a full refresh of the table or manually drop the inactive column.

The same behavior applies to new and deleted tables within a schema if you ingest the entire schema.
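
If you choose to drop an inactive column manually, the drop happens on the destination table in Databricks. A minimal sketch, assuming a destination table main.ingest.customers with an inactive column old_status (both names hypothetical); dropping a column from a Delta table generally requires column mapping to be enabled:

    -- Hypothetical destination table and column names
    ALTER TABLE main.ingest.customers DROP COLUMN old_status;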

DDL operation handling

The connector can handle some DDL operations, but many require a full refresh of the table:

  • TRUNCATE TABLE: You must refresh the table.
  • RENAME TABLE: The renamed table is skipped in the ingestion gateway, with an error in the event log. The flow for this table is marked as failed in the ingestion pipeline.
  • DROP TABLE: The table is removed from the ingestion gateway, with an error in the event log. The flow for this table is marked as failed in the ingestion pipeline.
  • ADD PRIMARY KEY or UNIQUE constraint: You must refresh the table.
  • DROP PRIMARY KEY constraint: You must refresh the table.
  • DROP UNIQUE constraint: If this is the only unique constraint on the source table, you must refresh the table. If not, no action is needed.
  • DROP COLUMN / MODIFY COLUMN / CHANGE COLUMN: You must refresh the table.
  • ADD COLUMN:
    • If the column has a non-spatial type: No action is needed. The column is automatically ingested on the next pipeline update.
    • If the column has a spatial type: Adding the column pauses further ingestion of that table. The table is skipped in the ingestion gateway, with an error in the event log. The flow for this table is marked as failed in the ingestion pipeline.
  • ADD TABLE (when ingesting entire schema):
    • If the table has no spatial types: No action is needed. The table is automatically ingested on the next pipeline update.
    • If the table has spatial types: The table is skipped in the ingestion gateway, with an error in the event log. The flow for this table is marked as failed in the ingestion pipeline.
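
For example, the two ADD COLUMN cases above differ only in the column type (table and column names are hypothetical):

    -- Non-spatial type: ingested automatically on the next pipeline update
    ALTER TABLE sales.orders ADD COLUMN delivery_notes VARCHAR(255);

    -- Spatial type: the table is skipped in the ingestion gateway and the flow fails
    ALTER TABLE sales.orders ADD COLUMN delivery_location POINT;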

When to perform a full refresh

Perform a full refresh of a table in the following scenarios:

  • After a TRUNCATE TABLE operation on the source.
  • After a column data type change (MODIFY COLUMN).
  • After a column rename (CHANGE COLUMN).
  • After a DROP COLUMN operation (to remove inactive columns).
  • When spatial columns are added (before removing them from selection).

To perform a full refresh, see Fully refresh target tables.

Staging

The staging catalog can't be a foreign catalog.

Tables and data types

  • Databricks recommends ingesting 250 or fewer tables per pipeline. There is no limit on the number of rows or columns within those tables.
  • MySQL uses a two-level namespace. The source_schema and source_table names are case-sensitive.
  • Although you can ingest from multiple source schemas in one pipeline, you can't ingest two tables of the same name to the same destination schema. For example, you can't ingest both schema1.customers and schema2.customers to the same destination schema. However, you can ingest them to different destination schemas using a multi-destination pipeline.
  • Spatial data types (GEOMETRY, POINT, LINESTRING, POLYGON, MULTIPOINT, MULTILINESTRING, MULTIPOLYGON, GEOMETRYCOLLECTION) are not supported. If spatial columns are present in selected tables, the table is skipped in the ingestion gateway, with an error in the event log.
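
To find spatial columns in a source schema before selecting tables for ingestion, you can query information_schema (the schema name is a placeholder):

    SELECT table_name, column_name, data_type
    FROM information_schema.columns
    WHERE table_schema = 'your_schema'
      AND data_type IN ('geometry', 'point', 'linestring', 'polygon',
                        'multipoint', 'multilinestring', 'multipolygon',
                        'geometrycollection');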

Character sets and collation

The connector currently replicates only UTF-8 compatible byte sequences. This means replication works correctly when the byte representation stored in MySQL matches UTF-8 encoding.

Supported character sets

The following MySQL character sets are fully or partially supported:

  • Fully supported (bytes always match UTF-8):

    • utf8
    • utf8mb4
    • ascii
  • Partially supported (ASCII range only):

    • latin1: Most real-world data stored as latin1 is ASCII-compatible and works correctly. MySQL 5.7.44 uses latin1 as the default character set.
  • Default character sets by version:

    • MySQL 5.7.44: latin1 (default)
    • MySQL 8.0 and later: utf8mb4 (default)
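
To find columns that use a character set outside this list, you can query information_schema. Note that MySQL 8.0 may report the legacy utf8 character set as utf8mb3; the schema name is a placeholder:

    SELECT table_name, column_name, character_set_name
    FROM information_schema.columns
    WHERE table_schema = 'your_schema'
      AND character_set_name IS NOT NULL
      AND character_set_name NOT IN ('utf8', 'utf8mb3', 'utf8mb4', 'ascii', 'latin1');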

Character set limitations

  • The connector doesn't convert between character sets. It transfers raw bytes and expects them to represent valid UTF-8.
  • Non-UTF-8 compatible characters in unsupported character sets might appear garbled or incorrect in the destination.
  • For non-ASCII data in character sets other than those listed above, consider converting your tables to utf8mb4 before ingestion.
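
A minimal sketch of that conversion, assuming a table named app.customers (hypothetical). The collation shown is the MySQL 8.0 default; on MySQL 5.7, utf8mb4_unicode_ci is a common choice. The conversion rewrites the table, so test it on a copy first:

    ALTER TABLE app.customers CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci;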

Known issues

  • The Test Connection button fails for MySQL users with sha256_password or caching_sha2_password authentication, even when credentials are valid. This is a known limitation. You can still create connections and pipelines.