JDBC Unity Catalog connection

note

This feature is in Beta on Databricks Runtime 17.3 and above.

Databricks supports connecting to external databases using JDBC. You can use a JDBC Unity Catalog connection to read from and write to a data source with the Spark Data Source API or the Databricks Remote Query SQL API. The JDBC connection is a securable object in Unity Catalog that specifies the JDBC driver, the URL path, and credentials for accessing an external database. JDBC connections are supported across Unity Catalog compute types, including serverless, standard clusters, dedicated clusters, and Databricks SQL.

Benefits of using a JDBC connection

  • Read and write to data sources using JDBC with the Spark Data Source API.
  • Read from data sources with JDBC using the Remote Query SQL API.
  • Governed access to the data source using a Unity Catalog connection.
  • Create the connection one time and reuse it across any Unity Catalog compute.
  • Stable across Spark and compute upgrades.
  • Connection credentials are hidden from the querying user.

JDBC vs. query federation

JDBC is complementary to query federation. Databricks recommends choosing query federation for the following reasons:

  • Query federation provides fine-grained access controls and governance at the table level using a foreign catalog. A JDBC Unity Catalog connection provides governance only at the connection level.
  • Query federation pushes down Spark queries to the data source for optimal query performance.

However, choose to use a JDBC Unity Catalog connection in the following scenarios:

  • Your database is not supported by query federation.
  • You want to use a specific JDBC driver.
  • You need to write to the data source using Spark (query federation does not support writes).
  • You need more flexibility, performance, or parallelization control through Spark Data Source API options (see the sketch after this list).
  • You want to push down queries with the Spark query option.
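
For illustration, the following minimal sketch shows how a user granted access to the connection could control read parallelism through standard Spark JDBC partitioning options, assuming those options are allowed by the connection (see Step 2). The connection, table, and column names are placeholders.

Python
# Minimal sketch: parallel read through a JDBC Unity Catalog connection.
# Assumes partitionColumn, lowerBound, upperBound, and numPartitions are allowlisted.
df = (
    spark.read.format('jdbc')
    .option('databricks.connection', '<JDBC-connection-name>')
    .option('dbtable', '<table_name>')
    .option('partitionColumn', '<numeric_column>')  # column used to split the read
    .option('lowerBound', '1')                      # lower bound of the partition column
    .option('upperBound', '1000000')                # upper bound of the partition column
    .option('numPartitions', '8')                   # number of parallel read tasks
    .load()
)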

Why use JDBC vs. PySpark data sources?

PySpark data sources are an alternative to the JDBC Spark data source.

Use a JDBC connection:

  • If you want to use the built-in Spark JDBC support.
  • If you want to use an existing, off-the-shelf JDBC driver.
  • If you need Unity Catalog governance at the connection level.
  • If you want to connect from any Unity Catalog compute type: serverless, standard, dedicated, or the SQL API.
  • If you want to use your connection with Python, Scala, and SQL APIs.

Use a PySpark Data source:

  • If you want the flexibility to develop and design your own Spark data source or data sink in Python.
  • If you will use it only in notebooks or PySpark workloads.
  • If you want to implement custom partitioning logic.

Neither the JDBC data source nor PySpark data sources support predicate pushdown, and neither exposes statistics to the query optimizer to help it choose the order of operations.

How it works

To connect to a data source using a JDBC connection, the JDBC driver must be installed on the Spark compute. The connection lets you specify the JDBC driver, which is installed in an isolated sandbox accessible to Spark compute, preserving Spark security and Unity Catalog governance. For more information about sandboxing, see How does Databricks enforce user isolation?.

Before you begin

To use a JDBC connection with the Spark Data Source API on serverless and standard clusters, you must first meet the following requirements:

Workspace requirements:

  • A Databricks workspace enabled for Unity Catalog

Compute requirements:

  • Network connectivity from your compute resource to the target database system. See Network connectivity.
  • Databricks compute must be serverless, or must run Databricks Runtime 17.3 LTS or above in standard or dedicated access mode.
  • SQL warehouses must be pro or serverless and must use 2025.35 or above.

Permissions required:

  • To create a connection, you must be a metastore admin or a user with the CREATE CONNECTION privilege on the metastore attached to the workspace.
  • The connection creator must have CREATE or MANAGE access on a Unity Catalog volume.
  • The user querying the connection must have read access to the volume.
  • Additional permission requirements are specified in each task-based section that follows.

Step 1: Create a volume and install the JDBC JAR

The JDBC connection reads and installs the JDBC driver JAR from a Unity Catalog volume.

  1. If you don't have write and read access to an existing volume, create a new volume:

    SQL
    CREATE VOLUME IF NOT EXISTS my_catalog.my_schema.my_volume_JARs
  2. Upload the JDBC driver JAR to the volume, for example through the Catalog Explorer UI or programmatically (see the sketch after this list).

  3. Grant read access on the volume to the users who will query the connection:

    SQL
    GRANT READ VOLUME ON VOLUME my_catalog.my_schema.my_volume_JARs TO `account users`
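
If the driver JAR is already staged at a cloud storage or workspace files path, you can also copy it into the volume programmatically. The following is a minimal sketch using dbutils; the source path and JAR file name are placeholders.

Python
# Minimal sketch: copy an already-staged JDBC driver JAR into the Unity Catalog volume.
# <source-path> and JDBC_DRIVER_JAR_NAME.jar are placeholders; adjust for your environment.
dbutils.fs.cp(
    '<source-path>/JDBC_DRIVER_JAR_NAME.jar',
    '/Volumes/my_catalog/my_schema/my_volume_JARs/JDBC_DRIVER_JAR_NAME.jar'
)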

Step 2: Create a JDBC connection

A JDBC connection is a securable object in Unity Catalog that specifies the JDBC driver, the URL path, credentials for accessing an external database system, and the allowlisted options available to the querying user. To create a connection, you can use Catalog Explorer or the CREATE CONNECTION SQL command in a Databricks notebook or the Databricks SQL query editor.

note

You can also use the Databricks REST API or the Databricks CLI to create a connection. See POST /api/2.1/unity-catalog/connections and Unity Catalog commands.

Permissions required: Metastore admin or user with the CREATE CONNECTION privilege.

Run the following command in a notebook or the SQL query editor, adjusting the volume path, URL, credentials, and externalOptionsAllowList as needed:

SQL
DROP CONNECTION IF EXISTS <JDBC-connection-name>;

CREATE CONNECTION <JDBC-connection-name> TYPE JDBC
ENVIRONMENT (
  java_dependencies '["/Volumes/<catalog>/<schema>/<volume_name>/JDBC_DRIVER_JAR_NAME.jar"]'
)
OPTIONS (
  url 'jdbc:<database_URL_host_port>',
  user '<user>',
  password '<password>',
  externalOptionsAllowList 'dbtable,query,partitionColumn,lowerBound,upperBound,numPartitions'
);

DESCRIBE CONNECTION <JDBC-connection-name>;

The connection owner or manager can add any extra options supported by the JDBC driver to the connection.

For security reasons, options defined in the connection cannot be overridden at query time. Users can only specify Spark data source options that are not already defined in the connection.

The externalOptionsAllowList enables the connection creator to specify which Spark data source options users can provide at query time. In this example, users can only set 'dbtable', 'query', 'partitionColumn', 'lowerBound', 'upperBound', and 'numPartitions'. The externalOptionsAllowList can be an empty string to ensure that only the options specified in the Unity Catalog connection are used. The URL and host can never be specified by users.

URL is the only mandatory option when creating the connection. If no allowlist is specified, a default allowlist is used that contains: 'dbtable,query,partitionColumn,lowerBound,upperBound,numPartitions'.

Databricks recommends specifying the credentials in the connection.
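
For illustration, the following sketch shows a more locked-down connection that pins additional Spark JDBC options (for example, dbtable and fetchsize) and sets an empty allowlist so that querying users cannot supply any data source options of their own. The option names and values here are examples, not requirements.

SQL
-- Illustrative only: pin static options in the connection and lock down query-time options.
CREATE CONNECTION <locked-down-connection-name> TYPE JDBC
ENVIRONMENT (
  java_dependencies '["/Volumes/<catalog>/<schema>/<volume_name>/JDBC_DRIVER_JAR_NAME.jar"]'
)
OPTIONS (
  url 'jdbc:<database_URL_host_port>',
  user '<user>',
  password '<password>',
  dbtable '<table_name>',     -- static option; users cannot override options set here
  fetchsize '1000',           -- example Spark JDBC option pinned by the connection owner
  externalOptionsAllowList '' -- empty allowlist: only connection-defined options apply
);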

Step 3: Grant the USE privilege

Grant the USE privilege on the connection to the users:

SQL
GRANT USE CONNECTION ON CONNECTION <connection-name> TO <user-name>;

For information about managing existing connections, see Manage connections for Lakehouse Federation.

Step 4: Query the data source

Users with the USE CONNECTION privilege can query the data source through the JDBC connection using Spark or the Remote Query SQL API. Users can add any Spark data source options that are supported by the JDBC driver and listed in the connection's externalOptionsAllowList (in this example, 'dbtable', 'query', 'partitionColumn', 'lowerBound', 'upperBound', and 'numPartitions'). To view the allowed options, run the following query:

SQL
DESCRIBE CONNECTION <JDBC-connection-name>;
Python
df = (
    spark.read.format('jdbc')
    .option('databricks.connection', '<JDBC-connection-name>')
    # Query in the database's native SQL dialect; supplied by the querying user.
    .option('query', 'select * from <table_name>')
    .load()
)

df.display()
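
Writes go through the same connection. The following minimal sketch assumes dbtable is on the connection's externalOptionsAllowList; the connection and table names are placeholders.

Python
# Minimal sketch: write a DataFrame back to the external database through the connection.
(
    df.write.format('jdbc')
    .option('databricks.connection', '<JDBC-connection-name>')
    .option('dbtable', '<target_table_name>')
    .mode('append')  # append rows; use 'overwrite' to replace the target table
    .save()
)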

Migration

To migrate existing Spark Data Source API workloads, Databricks recommends the following (see the sketch after this list):

  • Remove the URL and credentials from the options in the Spark Data Source API code.
  • Add the databricks.connection option to the Spark Data Source API code.
  • Create a JDBC connection with the corresponding URL and credentials.
  • In the connection, specify the options that should be static and not be set by querying users.
  • In the connection's externalOptionsAllowList, specify the data source options that users can adjust at query time in their Spark Data Source API code (for example, 'dbtable,query,partitionColumn,lowerBound,upperBound,numPartitions').
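
As an illustration of this migration, the following sketch shows a read before and after the change; the URL, credentials, table, and connection names are placeholders.

Python
# Before: URL and credentials are hard-coded in the Spark Data Source API options.
df_before = (
    spark.read.format('jdbc')
    .option('url', 'jdbc:<database_URL_host_port>')
    .option('user', '<user>')
    .option('password', '<password>')
    .option('dbtable', '<table_name>')
    .load()
)

# After: the URL and credentials live in the JDBC Unity Catalog connection, and the
# code references only the connection and the allowlisted options it needs.
df_after = (
    spark.read.format('jdbc')
    .option('databricks.connection', '<JDBC-connection-name>')
    .option('dbtable', '<table_name>')
    .load()
)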

Limitations

Spark Data Source API

  • URL and host cannot be included in the Spark Data Source API.
  • .option("databricks.connection", "<Connection_name>") is required.
  • Options already defined in the connection cannot be set again through the Spark Data Source API at query time.
  • Only the options specified in the externalOptionsAllowList can be used by querying users.

Support

  • Spark data sources are not supported.
  • Spark Declarative Pipelines is not supported.
  • Connection dependency at creation: java_dependencies supports only volume locations for JDBC driver JARs.
  • Connection dependency at query time: the user querying the connection needs READ access to the volume where the JDBC driver JAR is located.

Authentication

  • Only basic authentication is supported (username and password). There's no support for Unity Catalog credentials, OAuth, or service credentials.

Networking

  • The target database system and the Databricks workspace cannot be in the same VPC.

Network connectivity

Network connectivity from your compute resource to the target database system is required. See Networking recommendations for Lakehouse Federation for general networking guidance.

Classic compute: standard and dedicated clusters

Databricks VPCs are configured to allow only Spark clusters. To connect to other infrastructure, place the target database system in a different VPC and use VPC peering. After VPC peering is established, verify connectivity with the connectionTest UDF (see Connectivity test) on the cluster or warehouse.

If your Databricks workspace and target database systems are in the same VPC, Databricks recommends one of the following:

  • Use serverless compute
  • Configure your target database to allow TCP and UDP traffic over ports 80 and 443, and specify these ports in the connection

Connectivity test

To test the connectivity between the Databricks compute and your database system, use the following UDF:

SQL
CREATE OR REPLACE TEMPORARY FUNCTION connectionTest(host string, port string) RETURNS string LANGUAGE PYTHON AS $$
import subprocess
try:
    command = ['nc', '-zv', host, str(port)]
    result = subprocess.run(command, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    return str(result.returncode) + "|" + result.stdout.decode() + result.stderr.decode()
except Exception as e:
    return str(e)
$$;

SELECT connectionTest('<database-host>', '<database-port>');

A result that starts with 0 indicates that the port is reachable; any other return code, or an exception message, indicates a connectivity problem.