Connect to data sources and external services
This page provides recommendations for administrators and power users who are configuring connections between Databricks and external data sources and services.
You can connect your Databricks account to data sources such as cloud object storage, relational database management systems, streaming data services, and enterprise platforms such as CRMs. You can also connect your Databricks account to external services, such as AWS Glue or AWS Secrets Manager.
Configure connections to object storage
Most data used by Databricks workloads is stored in cloud object storage, such as AWS S3 or Cloudflare R2. You can manage access to cloud object storage using either of the following:
- Unity Catalog (recommended), which provides data governance for both structured and unstructured data in cloud object storage. See Connect to cloud object storage using Unity Catalog. A minimal read sketch follows this list.
- Legacy connectors and connection patterns. See Configure access to cloud object storage for Databricks using legacy patterns.
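Where Unity Catalog is in place, reads usually reference a volume path or an external-location URI instead of embedding storage credentials in code. Below is a minimal sketch; the catalog, schema, volume, and bucket names are placeholders, not real objects.

```python
# Read a CSV file from a Unity Catalog volume. The /Volumes path is a
# placeholder; substitute your own catalog, schema, and volume names.
df = (
    spark.read.format("csv")
    .option("header", "true")
    .load("/Volumes/main/raw/landing/orders.csv")
)

# With an external location configured in Unity Catalog, you can also
# address cloud storage URIs directly; access is checked against your
# Unity Catalog grants rather than instance profiles or access keys.
df2 = spark.read.parquet("s3://example-bucket/events/")  # placeholder bucket
```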
Configure connections to external data systems
Databricks offers several options for configuring connections to external data systems. The following table provides a high-level overview of these options:
| Option | Description |
| --- | --- |
| Query federation connectors | Lakehouse Federation provides read-only access to data in enterprise data systems. Connections are configured through Unity Catalog at the catalog or schema level, syncing multiple tables with a single configuration. See What is Lakehouse Federation?. A hedged setup sketch follows this table. |
| Managed ingestion connectors | Lakeflow Connect allows admin users to create a connection and a managed ingestion pipeline at the same time in the data ingestion UI. See Managed connectors in Lakeflow Connect. If the users who will create pipelines are non-admin users, or if they plan to use Databricks APIs, Databricks SDKs, the Databricks CLI, or Databricks Asset Bundles, an admin must first create the connection in Catalog Explorer. These interfaces require users to specify an existing connection when they create a pipeline. See Connect to managed ingestion sources. |
| Streaming connectors | Databricks provides optimized connectors for many streaming data systems. For all streaming data sources, you must generate credentials that provide access and load those credentials into Databricks. Databricks recommends storing credentials as secrets, because secrets work with all configuration options and in all access modes. All data connectors for streaming sources support passing credentials as options when you define streaming queries. See Standard connectors in Lakeflow Connect. A hedged Kafka sketch follows this table. |
| Third-party integrations | Use third-party tools to connect to external data sources and automate ingestion into the lakehouse. Some solutions also include reverse ETL and direct access to lakehouse data from external systems. See What is Databricks Partner Connect?. |
| Drivers | Each Databricks Runtime includes drivers for many external data systems, and you can optionally install third-party drivers to access data in other systems. You must configure connections for each table, and some drivers include write access. See Connect to external systems. For read-only query federation, prefer Lakehouse Federation over these drivers. |
| JDBC | Several of the included drivers build upon native JDBC support, and the JDBC option is extensible to systems without a dedicated driver. You must configure connections for each table. See Query databases using JDBC. For read-only query federation, prefer Lakehouse Federation over JDBC. A hedged read sketch follows this table. |
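For Lakehouse Federation, the connection and the foreign catalog are typically created once by an admin and then shared. The following is a minimal sketch run from a notebook; the connection name, host, secret scope and keys, and database are placeholders, and the statements follow the CREATE CONNECTION / CREATE FOREIGN CATALOG pattern for a PostgreSQL source.

```python
# Create a Unity Catalog connection to a PostgreSQL instance. Host, port,
# and the secret scope/keys below are placeholders; the secret() function
# resolves values from a Databricks secret scope.
spark.sql("""
  CREATE CONNECTION IF NOT EXISTS pg_conn TYPE postgresql
  OPTIONS (
    host 'example-host.us-west-2.rds.amazonaws.com',
    port '5432',
    user secret('pg-scope', 'user'),
    password secret('pg-scope', 'password')
  )
""")

# Expose one database from that connection as a read-only foreign catalog;
# its schemas and tables then appear alongside your other catalogs.
spark.sql("""
  CREATE FOREIGN CATALOG IF NOT EXISTS pg_sales
  USING CONNECTION pg_conn
  OPTIONS (database 'sales')
""")
```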
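For streaming sources, the credentials-in-secrets pattern looks like the following minimal Kafka sketch. The secret scope name, key names, topic, and SASL settings are placeholders for whatever your broker actually requires.

```python
# Pull broker address and credentials from a secret scope rather than
# hardcoding them in the notebook. "kafka-prod" and the key names are
# placeholders.
bootstrap_servers = dbutils.secrets.get(scope="kafka-prod", key="bootstrap_servers")
username = dbutils.secrets.get(scope="kafka-prod", key="username")
password = dbutils.secrets.get(scope="kafka-prod", key="password")

# On Databricks, the Kafka client classes are shaded, hence the
# kafkashaded prefix in the JAAS config.
jaas_config = (
    "kafkashaded.org.apache.kafka.common.security.plain.PlainLoginModule "
    f'required username="{username}" password="{password}";'
)

events = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", bootstrap_servers)
    .option("kafka.security.protocol", "SASL_SSL")
    .option("kafka.sasl.mechanism", "PLAIN")
    .option("kafka.sasl.jaas.config", jaas_config)
    .option("subscribe", "orders")  # placeholder topic
    .load()
)
```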
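For JDBC, each table is configured individually. A minimal read sketch, with placeholder URL, table name, and secret scope/keys:

```python
# Read a single table over JDBC. The URL, table, and secret scope/keys
# are placeholders; repeat this configuration for each table you need.
df = (
    spark.read.format("jdbc")
    .option("url", "jdbc:postgresql://example-host:5432/sales")
    .option("dbtable", "public.orders")
    .option("user", dbutils.secrets.get(scope="pg-prod", key="user"))
    .option("password", dbutils.secrets.get(scope="pg-prod", key="password"))
    .load()
)
```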
Configure connections to external services
Unity Catalog governs access to non-storage services using a securable object called a service credential. A service credential encapsulates a long-term cloud credential that grants access to an external service that users need to reach from Databricks. See Connect to external cloud services using Unity Catalog.
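As a sketch of how a service credential is consumed from Python on AWS: the credential name, region, and secret ID below are placeholders, and the helper module shown follows the pattern Databricks documents for attaching a service credential to a boto3 session; treat the exact module path as an assumption for your runtime version.

```python
import boto3

# Assumption: databricks.service_credentials is available in your runtime,
# and "my-service-credential" is a Unity Catalog service credential you
# have been granted access to. Both are placeholders here.
from databricks.service_credentials import getServiceCredentialsProvider

session = boto3.Session(
    botocore_session=getServiceCredentialsProvider("my-service-credential"),
    region_name="us-west-2",  # placeholder region
)

# With the session in hand, call the external service, for example
# AWS Secrets Manager.
client = session.client("secretsmanager")
response = client.get_secret_value(SecretId="my-app/db-password")  # placeholder ID
```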
Manage and request access to data sources and external services
Most connection methods require elevated privileges on both the external data source or service and the Databricks workspace. In typical organizations, few users have sufficient privileges in either Databricks or external data and storage providers to configure data connections themselves.
Your organization might have already configured access to a data source or service using one of the patterns described in the articles linked from this page. If your organization has a well-defined process for requesting access to data and third-party services, Databricks recommends following that process. If you're uncertain how to gain access to a data source, the following steps might help:
1. Use Catalog Explorer to view the tables and volumes that you can access. See What is Catalog Explorer?. A hedged notebook sketch follows this list.
2. Ask your teammates or managers about the data sources that they can access.
3. Most organizations use groups synced from their identity provider (for example, Okta or Microsoft Entra ID) to manage permissions for workspace users. If other members of your team can access data sources that you need, ask a workspace admin to add you to the appropriate group.
4. If a particular table, volume, or data source was configured by a co-worker, that individual should be able to grant you access to the data.
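As a programmatic complement to step 1, you can list what your current identity can see from a notebook. A minimal sketch; the catalog and schema names are placeholders:

```python
# List the catalogs, schemas, and tables visible to your current identity.
# Results reflect your Unity Catalog grants, so a short or empty list
# usually means you need to request access through one of the steps above.
spark.sql("SHOW CATALOGS").show()
spark.sql("SHOW SCHEMAS IN main").show()         # "main" is a placeholder catalog
spark.sql("SHOW TABLES IN main.default").show()  # placeholder schema
```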
Some organizations attach data access permissions to specific compute clusters and SQL warehouses. This is a legacy governance model, but if your organization uses it and you want to learn which data sources are available on a specific compute resource, reach out to the compute creator listed on the Compute tab.