MySQL ingestion connector

Preview

The MySQL connector is in Public Preview. Contact your Databricks account team to request access.

This page helps you understand the MySQL ingestion workflow, including the factors that determine your setup approach and the steps involved for different user personas.

What to know before you start

| Topic | Why it matters |
| --- | --- |
| Databricks user persona | The workflow depends on your Databricks user persona:<br>• Single-user: An admin user configures the source database and creates a Unity Catalog connection, an ingestion gateway, and an ingestion pipeline.<br>• Multi-user: An admin user configures the source database and creates a connection that non-admin users can then use to create gateway-pipeline pairs. |
| Deployment environment | The source database configuration depends on the MySQL deployment environment (Amazon RDS, Aurora MySQL, Azure Database for MySQL, Google Cloud SQL for MySQL, or MySQL on EC2). |
| Authentication method | The steps to create a connection depend on the authentication method you choose. See Authentication methods. |
| Interface | The steps to create a connection, a gateway, and a pipeline depend on the interface you use. |
| Ingestion frequency | The pipeline schedule depends on your latency and cost requirements. |
| Common patterns | Depending on your ingestion needs, the pipeline might use configurations like history tracking, column selection, and row filtering. Supported configurations vary by connector; see Feature availability and the sketch after this table. |
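To make these patterns concrete, the sketch below uses the Databricks Python SDK's pipeline types to request history tracking (SCD type 2) and column selection for a single table. This is a minimal sketch, assuming the MySQL connector accepts the same table-level configuration fields (`scd_type`, `include_columns`) as other Databricks database ingestion connectors; all source and destination names are hypothetical.

```python
# Minimal sketch: table-level ingestion configuration. Assumes the MySQL
# connector accepts the same fields as other Databricks database ingestion
# connectors; every name below is hypothetical.
from databricks.sdk.service.pipelines import (
    IngestionConfig,
    TableSpec,
    TableSpecificConfig,
    TableSpecificConfigScdType,
)

orders = IngestionConfig(
    table=TableSpec(
        source_schema="sales",              # hypothetical source schema
        source_table="orders",              # hypothetical source table
        destination_catalog="main",
        destination_schema="ingested",
        table_configuration=TableSpecificConfig(
            # History tracking: keep full SCD type 2 history for this table.
            scd_type=TableSpecificConfigScdType.SCD_TYPE_2,
            # Column selection: ingest only the listed columns.
            include_columns=["order_id", "customer_id", "amount"],
        ),
    )
)
```

No row filter appears in the sketch because API-based row filtering is not supported for this connector (see Feature availability).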

Start ingesting from MySQL

The following table provides an overview of the end-to-end MySQL ingestion workflow, based on your user persona:

| User | Steps |
| --- | --- |
| Admin | 1. Configure MySQL for ingestion into Databricks.<br>2. Either:<br>• Use Catalog Explorer to create a connection so that non-admins can create gateway-pipeline pairs. See MySQL.<br>• Use the data ingestion UI to create a connection, a gateway, and a pipeline. See Create a MySQL ingestion pipeline. |
| Non-admin | Use any supported interface to create a gateway and a pipeline. See Create a MySQL ingestion pipeline, and the SDK sketch after this table. |
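As an illustration of the non-admin step, the sketch below creates a gateway-pipeline pair with the Databricks Python SDK. It is a minimal sketch, assuming an admin has already created a Unity Catalog connection and that the MySQL connector uses the same gateway and ingestion pipeline definitions as other Databricks database ingestion connectors; the connection, catalog, and schema names are hypothetical.

```python
# Minimal sketch: create a gateway-pipeline pair with the Databricks Python SDK.
# Assumes an existing Unity Catalog connection; all names are hypothetical.
from databricks.sdk import WorkspaceClient
from databricks.sdk.service import pipelines

w = WorkspaceClient()

# The ingestion gateway captures change data from the source database
# into a Unity Catalog staging location.
gateway = w.pipelines.create(
    name="mysql-gateway",
    gateway_definition=pipelines.IngestionGatewayPipelineDefinition(
        connection_name="mysql-connection",     # created by an admin
        gateway_storage_catalog="main",
        gateway_storage_schema="staging",
        gateway_storage_name="mysql-gateway",
    ),
)

# The ingestion pipeline applies the staged changes to destination tables.
pipeline = w.pipelines.create(
    name="mysql-ingestion",
    ingestion_definition=pipelines.IngestionPipelineDefinition(
        ingestion_gateway_id=gateway.pipeline_id,
        objects=[
            pipelines.IngestionConfig(
                schema=pipelines.SchemaSpec(
                    source_schema="sales",          # ingest the whole schema
                    destination_catalog="main",
                    destination_schema="ingested",
                )
            )
        ],
    ),
)
```

Ingesting an entire schema, as shown here, is also what enables automated pickup of new source tables (see Feature availability).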

Feature availability

| Feature | Availability |
| --- | --- |
| UI-based pipeline authoring | Supported |
| API-based pipeline authoring | Supported |
| Databricks Asset Bundles | Supported |
| Incremental ingestion | Supported |
| Unity Catalog governance | Supported |
| Orchestration using Databricks Workflows | Supported. See the scheduling sketch after this table. |
| SCD type 2 | Supported |
| API-based column selection and deselection | Supported |
| API-based row filtering | Not supported |
| Automated schema evolution: New and deleted columns | Supported |
| Automated schema evolution: Data type changes | Not supported |
| Automated schema evolution: Column renames | Supported. A rename is treated as a new column (new name) and a deleted column (old name). |
| Automated schema evolution: New tables | Supported if you ingest the entire schema. See the limit on the number of tables per pipeline. |
| Maximum number of tables per pipeline | 250 |
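Because orchestration through Databricks Workflows is supported, one way to control ingestion frequency is to trigger the ingestion pipeline from a scheduled job. The sketch below does this with the Databricks Python SDK; the pipeline ID placeholder and the every-8-hours cron interval are illustrative only, so pick an interval that matches your latency and cost requirements.

```python
# Minimal sketch: run the ingestion pipeline on a schedule via a Databricks job.
# The pipeline ID placeholder and the cron interval are illustrative only.
from databricks.sdk import WorkspaceClient
from databricks.sdk.service import jobs

w = WorkspaceClient()

job = w.jobs.create(
    name="mysql-ingestion-schedule",
    tasks=[
        jobs.Task(
            task_key="run-ingestion",
            pipeline_task=jobs.PipelineTask(pipeline_id="<ingestion-pipeline-id>"),
        )
    ],
    # Quartz cron: run every 8 hours. A shorter interval lowers latency
    # but raises cost; a longer interval does the opposite.
    schedule=jobs.CronSchedule(
        quartz_cron_expression="0 0 */8 * * ?",
        timezone_id="UTC",
    ),
)
```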

Authentication methods

| Authentication method | Availability |
| --- | --- |
| OAuth U2M | Not supported |
| OAuth M2M | Not supported |
| OAuth (manual refresh token) | Not supported |
| Basic authentication (username/password) | Supported. See the sketch after this table. |
| Basic authentication (API key) | Not supported |
| Basic authentication (service account JSON key) | Not supported |