Connect to data sources and external services

This page provides recommendations for administrators and power users who are configuring connections between Databricks and external data sources and services.

You can connect your Databricks account to data sources such as cloud object storage, relational database management systems, streaming data services, and enterprise platforms like CRMs. You can also connect your Databricks account to external services, such as AWS Glue or AWS Secrets Manager.

Configure connections to cloud object storage

Most data used by Databricks workloads is stored in cloud object storage, such as Amazon S3 or Cloudflare R2. You can manage access to cloud object storage using either of the following:

- Unity Catalog (recommended), which provides data governance for both structured and unstructured data in cloud object storage. See Connect to cloud object storage using Unity Catalog.
- Legacy connectors and connection patterns. See Configure access to cloud object storage for Databricks using legacy patterns.

Unity Catalog connections

A Unity Catalog connection is a securable object that stores the endpoint and credentials needed to access an external system. Connections provide a governed way to manage authentication and configuration for external data systems, including federation, managed ingestion, JDBC, and HTTP. For an overview of all connection types and how to choose between them, see Unity Catalog connections.
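
To make "stores the endpoint and credentials" concrete, the following Python sketch only builds SQL text. It produces a `CREATE CONNECTION` statement of type `postgresql` whose user name and password are `secret(...)` references rather than literal values. It also produces a `CREATE FOREIGN CATALOG` statement that uses the connection for query federation. The helper functions and all names (connection, catalog, host, secret scope, keys) are hypothetical placeholders. In a notebook you might pass each string to `spark.sql(...)`. Check which options your connection type actually requires before relying on this shape.

```python
"""Sketch: render Unity Catalog DDL for a PostgreSQL connection.

All names (connection, catalog, host, secret scope, keys) are hypothetical
placeholders. This module only builds SQL strings; it does not contact
Databricks.
"""
from typing import Dict


def quote_literal(value: str) -> str:
    # SQL string literal: wrap in single quotes and double embedded quotes.
    return "'" + value.replace("'", "''") + "'"


def secret_ref(scope: str, key: str) -> str:
    # Reference a secret so the credential never appears in the DDL text.
    return f"secret({quote_literal(scope)}, {quote_literal(key)})"


def create_connection_sql(name: str, host: str, port: int, scope: str,
                          user_key: str, password_key: str) -> str:
    # The connection holds the endpoint (host, port) and the credentials.
    # Assumes `name` is a simple identifier that needs no quoting.
    options: Dict[str, str] = {
        "host": quote_literal(host),
        "port": quote_literal(str(port)),
        "user": secret_ref(scope, user_key),
        "password": secret_ref(scope, password_key),
    }
    body = ",\n  ".join(f"{opt} {val}" for opt, val in options.items())
    return (f"CREATE CONNECTION IF NOT EXISTS {name} TYPE postgresql\n"
            f"OPTIONS (\n  {body}\n)")


def create_foreign_catalog_sql(catalog: str, connection: str,
                               database: str) -> str:
    # Query federation: expose one remote database as a foreign catalog.
    return (f"CREATE FOREIGN CATALOG IF NOT EXISTS {catalog} "
            f"USING CONNECTION {connection} "
            f"OPTIONS (database {quote_literal(database)})")


if __name__ == "__main__":
    print(create_connection_sql("pg_conn", "pg.example.internal", 5432,
                                "pg_scope", "pg_user", "pg_password"))
    print(create_foreign_catalog_sql("pg_catalog", "pg_conn", "sales"))
```

Referencing secrets instead of inlining credentials keeps them out of query text and notebook history. This matches the credential isolation that connections are meant to provide.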

Configure connections to external data systems

Databricks offers several options for configuring connections to external data systems. The following table provides a high-level overview of these options:

Option	Description
Query federation connectors	Query federation provides read-only access to external relational databases by pushing down Unity Catalog queries over JDBC. Supported sources include PostgreSQL, MySQL, SQL Server, Snowflake, and more.
Catalog federation connectors	Catalog federation connects external catalog platforms, such as a Hive Metastore, AWS Glue, or Snowflake Horizon Catalog, so you can query their data directly in file storage without data movement.
Managed ingestion connectors	Lakeflow Connect allows admin users to create a connection and a managed ingestion pipeline at the same time in the data ingestion UI. See Managed connectors in Lakeflow Connect. If the users who will create pipelines are non-admin users or plan to use Databricks APIs, Databricks SDKs, the Databricks CLI, or Declarative Automation Bundles, an admin must first create the connection in Catalog Explorer. These interfaces require that users specify an existing connection when they create a pipeline. See Connect to managed ingestion sources.
Streaming connectors	Databricks provides optimized connectors for many streaming data systems. For every streaming data source, you must generate credentials that grant access to the source and then load those credentials into Databricks. Databricks recommends storing credentials as secrets, because you can use secrets for all configuration options and in all access modes. All data connectors for streaming sources support passing credentials as options when you define streaming queries. See Standard connectors in Lakeflow Connect.
Third-party integrations	Use third-party tools to connect to external data sources and automate ingesting data into the lakehouse. Some solutions also support reverse ETL and direct access to lakehouse data from external systems. See What is Databricks Partner Connect?
Spark Data Source API	Use the Spark Data Source API to read from and write to external databases. Databricks Runtime includes bundled connectors for common sources. You can also use a Unity Catalog connection with your own JDBC driver JAR, install third-party connectors on dedicated clusters, or build custom connectors with the PySpark DataSource API. See Spark data sources. For read-only access, Databricks recommends Lakehouse Federation.
JDBC	Connect to external databases using JDBC with a Unity Catalog connection for governed access, credential isolation, and cross-compute support. See JDBC connection. For legacy JDBC configurations without Unity Catalog governance, see Query databases using JDBC. For read-only query federation, Lakehouse Federation is always preferred.
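
The streaming recommendation in the table (store credentials as secrets, pass them as options when you define the query) can be sketched in Python for Kafka. The sketch makes several assumptions:

- The broker accepts `SASL_SSL` with the `PLAIN` mechanism.
- `get_secret(scope, key)` stands in for `dbutils.secrets.get`.
- The broker address, topic, scope, and key names are hypothetical.

The `kafkashaded.` prefix on the login module reflects the shaded Kafka client bundled with Databricks Runtime.

```python
"""Sketch: Kafka streaming options with credentials pulled from secrets.

Assumes SASL_SSL + PLAIN authentication. `get_secret(scope, key)` is a
stand-in for dbutils.secrets.get; all names here are hypothetical.
"""
from typing import Callable, Dict

# Databricks Runtime ships a shaded Kafka client, hence "kafkashaded.".
PLAIN_LOGIN_MODULE = (
    "kafkashaded.org.apache.kafka.common.security.plain.PlainLoginModule"
)


def kafka_sasl_options(bootstrap_servers: str, topic: str, scope: str,
                       get_secret: Callable[[str, str], str],
                       user_key: str = "kafka-user",
                       password_key: str = "kafka-password") -> Dict[str, str]:
    # Resolve credentials when the query is defined, not in source code.
    user = get_secret(scope, user_key)
    password = get_secret(scope, password_key)
    # JAAS line handed to the Kafka client; assumes the values contain
    # no double quotes.
    jaas = (f'{PLAIN_LOGIN_MODULE} required '
            f'username="{user}" password="{password}";')
    return {
        "kafka.bootstrap.servers": bootstrap_servers,
        "subscribe": topic,
        "kafka.security.protocol": "SASL_SSL",
        "kafka.sasl.mechanism": "PLAIN",
        "kafka.sasl.jaas.config": jaas,
    }
```

In a notebook, you might build `opts = kafka_sasl_options("broker.example.com:9093", "events", "kafka-scope", lambda s, k: dbutils.secrets.get(scope=s, key=k))` and then call `spark.readStream.format("kafka").options(**opts).load()`. Injecting the lookup function keeps the sketch testable outside Databricks.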

Configure connections to non-storage cloud services

Unity Catalog governs access to non-storage cloud services using a securable object called a service credential. A service credential encapsulates a long-term cloud credential that lets users connect from Databricks to a non-storage cloud service. See Connect to external cloud services using Unity Catalog.

Manage and request access to data sources and external services

Most connection methods require elevated privileges on both the external data source or service and the Databricks workspace. In typical organizations, few users have sufficient privileges in Databricks or in external data and storage providers to configure data connections themselves.

Your organization might have already configured access to a data source or service using one of the patterns described in the articles linked from this page. If your organization has a well-defined process for requesting access to data and third-party services, Databricks recommends following that process. If you’re uncertain how to gain access to a data source, this procedure might help:

- Use Catalog Explorer to view the tables and volumes that you can access. See What is Catalog Explorer?
- Ask your teammates or managers about the data sources that they can access.
  - Most organizations use groups synced from their identity provider (for example, Okta or Microsoft Entra ID) to manage permissions for workspace users. If other members of your team can access data sources that you need, have a workspace admin add you to the correct group.
  - If a co-worker configured a particular table, volume, or data source, that person should be able to grant you access to the data.
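
When a co-worker or admin grants you access, they typically run Unity Catalog `GRANT` statements. The Python sketch below only renders the statements for read access to one table. Reading a table in Unity Catalog also requires `USE CATALOG` and `USE SCHEMA` on its parent catalog and schema, so the sketch emits those grants as well. The function names, table name, and group name are hypothetical. The three-level name is assumed to contain no quoted or dotted identifiers.

```python
"""Sketch: render Unity Catalog GRANT statements for reading one table.

Table and principal names are hypothetical. Only SQL text is produced.
"""
from typing import List


def quote_principal(principal: str) -> str:
    # Users and groups are backtick-quoted; embedded backticks are doubled.
    return "`" + principal.replace("`", "``") + "`"


def table_read_grants(table: str, principal: str) -> List[str]:
    # Expects catalog.schema.table with plain (unquoted) identifiers.
    catalog, schema, _ = table.split(".")
    who = quote_principal(principal)
    return [
        # Reading a table also needs USE on its parent catalog and schema.
        f"GRANT USE CATALOG ON CATALOG {catalog} TO {who}",
        f"GRANT USE SCHEMA ON SCHEMA {catalog}.{schema} TO {who}",
        f"GRANT SELECT ON TABLE {table} TO {who}",
    ]
```

This example grants to a group, such as one synced from your identity provider, rather than to individual users. That matches the group-based permission model described in the list.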

Some organizations attach data access permissions to specific compute clusters and SQL warehouses. This is a legacy governance model, but if your organization uses it and you want to learn which data sources are available on a specific compute resource, reach out to the compute creator listed on the Compute tab.
