Skip to main content

JDBC connection

note

This feature is in Public Preview for Databricks Runtime 18.1, and DBSQL 2025.40 and above. For SQL warehouses, you must also opt in to the Enable networking for isolated workloads in Serverless SQL Warehouses preview.

Databricks supports connecting to external databases using JDBC. You can use a JDBC Unity Catalog connection to read and write to a data source with the Spark Data Source API or Databricks Remote Query SQL API. The JDBC connection is a securable object in Unity Catalog that specifies the JDBC driver, the URL path, and credentials for accessing an external database. The JDBC connection is supported across Unity Catalog compute types, including serverless, standard clusters, dedicated clusters, and Databricks SQL.

Benefits of using a JDBC connection

  • Read and write to data sources using JDBC with the Spark Data Source API.
  • Read from data sources with JDBC using the Remote Query SQL API.
  • Governed access to the data source using a Unity Catalog connection.
  • Create the connection one time and reuse it across any Unity Catalog compute.
  • Stable for Spark and compute upgrades.
  • Connection credentials are hidden from the querying user.

JDBC versus query federation

JDBC is complementary to query federation. Databricks recommends choosing query federation for the following reasons:

  • Query federation provides fine-grained access controls and governance at table-level using a foreign catalog. JDBC Unity Catalog connection provides governance only at the connection level.
  • Query federation pushes down Spark queries for optimal query performance.
note

Query federation supports many popular databases, including Oracle, MySQL, PostgreSQL, SQL Server, and Snowflake. If your database is supported, Databricks recommends using query federation instead of a JDBC connection. See Lakehouse Federation for the full list of supported databases.

However, choose to use a JDBC Unity Catalog connection in the following scenarios:

  • Your database is not supported by query federation.
  • You want to use a specific JDBC driver.
  • You need to write to the data source using Spark (query federation doesn't support writes).
  • You need more flexibility, performance, and parallelization control through Spark Data Source API options.
  • You want to push down source SQL queries with the Spark query option.

Why use JDBC versus PySpark data sources?

PySpark data sources are an alternative to the JDBC Spark data source.

Use a JDBC connection:

  • If you want to use the built-in Spark JDBC support.
  • If you want to use an out-of-the-box JDBC driver that already exists.
  • If you need Unity Catalog governance at the connection level.
  • If you want to connect from any Unity Catalog compute type: serverless, standard, dedicated, SQL API.
  • If you want to use your connection with Python, Scala, and SQL APIs.

Use a PySpark data source:

  • If you want to have the flexibility to develop and design your Spark data source or data sink using Python.
  • If you only use it in notebooks or PySpark workloads.
  • If you want to implement custom partitioning logic.

Neither JDBC nor PySpark data sources expose statistics to the query optimizer to help select the order of operations.

How it works

To connect to a data source using a JDBC connection, install the JDBC driver on Spark compute. The connection enables you to specify and install the JDBC driver in an isolated sandbox accessible by Spark compute to ensure Spark security and Unity Catalog governance. For more information about sandboxing, see How does Databricks enforce user isolation?.

Before you begin

To use a JDBC connection with the Spark Data Source API on serverless and standard clusters, you must first meet the following requirements:

Workspace requirements:

  • A Databricks workspace enabled for Unity Catalog

Compute requirements:

  • Network connectivity from your compute resource to the target database system. See Network connectivity.
  • Databricks compute must use serverless, or Databricks Runtime 17.3 LTS or above on standard mode or dedicated access mode.
  • SQL warehouses must be pro or serverless and must use 2025.35 or above.

Permissions required:

  • To create a connection, you must have the CREATE CONNECTION privilege on the metastore attached to the workspace.
  • CREATE or MANAGE access to a Unity Catalog volume by the connection creator.
  • Volume access by the user querying the connection.
  • Additional permissions are specified in each task-based section that follows.

Authentication methods

Static Credential

Static credential authentication stores credentials directly on the connection — for example, a username and password, an API key, or any other credential field accepted by the target JDBC driver. The credentials are passed through to the JDBC driver as-is when the connection is used.

OAuth Machine-to-Machine

Beta

This feature is in Beta. Workspace admins can control access to this feature from the Previews page. See Manage Databricks previews.

OAuth Machine-to-Machine (M2M) authentication is used when two systems or applications communicate without direct user involvement. Tokens are issued to a registered machine client, that uses its own credentials to authenticate. This authentication method is ideal for service-to-service communication, microservices, and automation tasks in which no user context is needed.

When the JDBC connection uses OAuth M2M, Unity Catalog exchanges the client credentials at the configured token endpoint and passes only the resulting short-lived access token to the JDBC driver using the driver's token parameter.

Step 1: Create a volume and install the JDBC JAR

The JDBC connection reads and installs the JDBC driver JAR from a Unity Catalog volume.

  1. If you don't have write and read access to an existing volume, create a new volume:

    SQL
    CREATE VOLUME IF NOT EXISTS my_catalog.my_schema.my_volume_JARs
  2. Upload the JDBC driver JAR to the volume.

  3. Grant read access on the volume to the users who query the connection:

    SQL
    GRANT READ VOLUME ON VOLUME my_catalog.my_schema.my_volume_JARs TO `account users`

Step 2: Create a JDBC connection

A JDBC connection is a securable object in Unity Catalog. It specifies the JDBC driver, the URL path, credentials for accessing an external database system, and allowlisted options that the querying user can specify. To create a connection, use Catalog Explorer or the CREATE CONNECTION SQL command in a Databricks notebook or the Databricks SQL query editor. See Authentication methods for the supported authentication methods.

note

You can also use the Databricks REST API or the Databricks CLI to create a connection. See POST /api/2.1/unity-catalog/connections and Unity Catalog commands.

Permissions required: Metastore admin or user with the CREATE CONNECTION privilege.

Before you create a connection, note the following:

  • The URL and credentials are the only required options. Don't embed credentials in the URL, because logs or errors can expose them. Use the dedicated credential options for your chosen authentication method.
  • Use externalOptionsAllowList to control which Spark data source options users can specify at query time. If not specified, the default is 'dbtable,query,partitionColumn,lowerBound,upperBound,numPartitions'. Set it to an empty string to restrict users to only the options defined in the connection. Users can never specify url or host.
  1. In your Databricks workspace, click Data icon. Catalog.

  2. Click Plug icon. Connect, then click Connections.

  3. Click Create connection.

  4. On the Connection basics page of the Set up connection wizard, enter a user-friendly Connection name.

  5. For Connection type, select JDBC.

  6. (Optional) Add a comment.

  7. Click Next.

  8. On the Connection details page, enter the following connection properties:

    Property

    Description

    Url

    The JDBC URL for your database, in the form jdbc:subprotocol:subname (for example, jdbc:oracle:thin:@<host>:<port>:<SID>).

    Java dependencies

    The JDBC driver JAR files from Unity Catalog volumes. Click Add JAR Dependency to add each JAR (for example, /Volumes/<catalog>/<schema>/<volume_name>/ojdbc11.jar).

    External options allow list

    Comma-separated list of Spark data source options that querying users can specify at query time. Defaults to dbtable,query,partitionColumn,lowerBound,upperBound,numPartitions. Set to an empty value to restrict users to only the options defined on the connection.

    Additional Options

    Arbitrary JDBC driver options passed through to the driver as key-value pairs. Use this section to set database credentials (for example, key user and key password) and any other driver-specific properties. Switch between UI and JSON input modes as needed.

  9. Click Create connection.

OAuth Machine-to-Machine (Beta)

Beta

This feature is in Beta. Workspace admins can control access to this feature from the Previews page. See Manage Databricks previews.

When the jdbc_oauth_m2m_connector preview is enabled in your workspace, the Auth type field appears on the Connection basics page with Static Credential and OAuth Machine to Machine options. To create an OAuth M2M JDBC connection:

  1. On the Connection basics page, set Auth type to OAuth Machine to Machine.

  2. Click Next.

  3. On the Connection details page, enter the following properties in addition to Url and Java dependencies:

    Property

    Description

    Client ID

    The OAuth client ID issued for the application.

    Client secret

    The OAuth client secret issued for the application.

    OAuth scope

    Scope to request during the token exchange. Expressed as a space-delimited list of case-sensitive strings.

    Token endpoint

    The OAuth 2.0 token endpoint used to exchange the client credentials for an access token. Usually in the format https://authorization-server.com/oauth/token.

    OAuth credential exchange method

    How the client credentials are passed to the token endpoint:

    • header_and_body — credentials are sent in both the Authorization header and the request body (default).
    • body_only — credentials are sent in the request body only.
    • header_only — credentials are sent in the Authorization header only.

    JDBC token parameter name

    The property KEY required by the target JDBC driver to accept the OAuth access token. Databricks dynamically populates this parameter VALUE with a generated valid OAuth access token. Typical KEYs: access_token, oauthToken, or password. Refer to your JDBC driver's documentation for the correct parameter KEY name.

  4. Click Create connection.

The connection owner or manager can add to the connection any extra options supported by the JDBC driver. For security reasons, options defined in the connection can't be overridden at query time.

Step 3: Grant the USE privilege

Grant the USE privilege on the connection to the users:

SQL
GRANT USE CONNECTION ON CONNECTION <connection-name> TO <user-name>;

For information about managing existing connections, see Manage connections for Lakehouse Federation.

Step 4: Query the data source

Users with the USE CONNECTION privilege can query the data source using the JDBC connection through Spark or the remote queries SQL API. Users can add any Spark data source options that are supported by the JDBC driver and specified in the externalOptionsAllowList in the JDBC connection (for example, in this case: 'dbtable,query,partitionColumn,lowerBound,upperBound,numPartitions'). To view the allowed options, run the following query:

SQL
DESCRIBE CONNECTION <JDBC-connection-name>;
Python
df = (
spark.read.format('jdbc')
.option('databricks.connection', '<JDBC-connection-name>')
.option('query', 'select * from <table_name>') # query in source SQL language - Option specified by querying user
.load()
)

df.display()

Migration

To migrate from existing Spark Data Source API workloads, Databricks recommends doing the following:

  • Remove the URL and credentials from the options in the Spark Data Source API.
  • Add the databricks.connection in the options in the Spark Data Source API.
  • Create a JDBC connection with the corresponding URL and credentials.
  • In the connection, specify the options that should be static and should not be specified by querying users.
  • In the connection's externalOptionsAllowList, specify the data source options that should be adjusted or modified by the users at query time in the Spark Data Source API code (for example, 'dbtable,query,partitionColumn,lowerBound,upperBound,numPartitions').

Limitations

Spark Data Source API

  • URL and host cannot be included in the Spark Data Source API.
  • .option("databricks.connection", "<Connection_name>") is required.
  • Options defined in the connection cannot be used on the Data Source API in your code at query time.
  • Only the options specified in the externalOptionsAllowList can be used by querying users.
  • The memory limit for the JDBC driver is 400 MiB. Consider using a smaller fetchSize if the limit is reached.

Support

  • Spark data sources are not supported.
  • Lakeflow Spark Declarative Pipelines is not supported.
  • Connection dependency at creation: java_dependencies only support volume locations for JDBC driver JARs.
  • Connection dependency at query: The connection user needs READ access to the volume where the JDBC driver JAR is located.
  • On dedicated access mode (formerly single-user access mode), you must be an owner or a manager of the connection to use it.
  • SSL certificates are not supported.
  • Foreign catalogs are not supported with JDBC connections.

Authentication

  • This connector supports Static Credential and OAuth Machine-to-Machine. It doesn't support Unity Catalog credentials or service credentials.

Networking

  • The target database system and the Databricks workspace cannot be in the same VPC.

Network connectivity

Network connectivity from your compute resource to the target database system is required. See Networking recommendations for Lakehouse Federation for general networking guidance.

Classic compute: standard and dedicated clusters

Databricks VPCs are configured to allow only Spark clusters. To connect to another infrastructure, place the target database system in a different VPC and use VPC peering. After VPC peering is established, check your connectivity with the connectionTest UDF on the cluster or warehouse.

If your Databricks workspace and target database systems are in the same VPC, Databricks recommends one of the following:

  • Use serverless compute.
  • Configure your target database to allow TCP and UDP traffic over ports 80 and 443, and specify these ports in the connection.

Serverless

When using your JDBC connection on serverless compute, configure a firewall for serverless compute access for stable IPs, or configure private connectivity.

Connectivity test

To test the connectivity between the Databricks compute and your database system, use the following UDF:

SQL
CREATE OR REPLACE TEMPORARY FUNCTION connectionTest(host string, port string) RETURNS string LANGUAGE PYTHON AS $$
import subprocess
try:
command = ['nc', '-zv', host, str(port)]
result = subprocess.run(command, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
return str(result.returncode) + "|" + result.stdout.decode() + result.stderr.decode()
except Exception as e:
return str(e)
$$;

SELECT connectionTest('<database-host>', '<database-port>');

FAQ

The following frequently asked questions cover predicate pushdown behavior for JDBC connections.

Does JDBC support predicate pushdown?

Yes. Filters are pushed down to the remote database by default for both the Spark Data Source API (format('jdbc')) and the remote_query SQL function. Which predicates can be pushed depends on the JDBC driver and dialect, so run EXPLAIN on your query and inspect the physical plan to confirm which filters are pushed to the source. For the remote_query SQL function, you can control specific pushdowns (filters, limits, offsets, and aggregates) with options such as pushdown.filters.enabled; all are enabled by default.

Predicate pushdown is distinct from exposing table statistics to the query optimizer. JDBC and PySpark data sources don't expose statistics to the query optimizer to help select the order of operations, regardless of whether predicates are pushed down.