Skip to main content

Query MySQL with Databricks

This example queries MySQL using its JDBC driver. For more details on reading, writing, configuring parallelism, and query pushdown, see Query databases using JDBC.

Experimental

The legacy query federation documentation has been retired and might not be updated. The configurations mentioned in this content are not officially endorsed or tested by Databricks. If Lakehouse Federation supports your source database, Databricks recommends using that instead.

Using JDBC

Python
driver = "com.mysql.cj.jdbc.Driver"

database_host = "<database-host-url>"
database_port = "3306" # update if you use a non-default port
database_name = "<database-name>"
table = "<table-name>"
user = "<username>"
password = "<password>"

url = f"jdbc:mysql://{database_host}:{database_port}/{database_name}"

remote_table = (spark.read
.format("jdbc")
.option("driver", driver)
.option("url", url)
.option("dbtable", table)
.option("user", user)
.option("password", password)
.load()
)

Using the MySQL connector in Databricks Runtime

Using Databricks Runtime 11.3 LTS and above, you can use the named connector to query MySQL. See the following examples:

Python
remote_table = (spark.read
.format("mysql")
.option("dbtable", "table_name")
.option("host", "database_hostname")
.option("port", "3306") # Optional - will use default port 3306 if not specified.
.option("database", "database_name")
.option("user", "username")
.option("password", "password")
.load()
)