Access Databricks data using external systems
This article provides an overview of functionality and recommendations for making data managed and governed by Databricks available to other systems.
These patterns focus on scenarios where your organization needs to connect trusted tools or systems to data managed by Databricks. If you are looking for guidance on sharing data outside of your organization, see Share data and AI assets securely with users in other organizations.
What external access does Databricks support?
Databricks recommends using Unity Catalog to govern all your data assets.
The following table provides an overview of supported formats and access patterns for Unity Catalog objects.
| Unity Catalog object | Formats supported | Access patterns |
| --- | --- | --- |
| Managed tables | Delta Lake, Iceberg | Credential vending, Iceberg REST catalog, Delta Sharing |
| External tables | Delta Lake | Credential vending, Iceberg REST catalog, Delta Sharing, cloud URIs |
| External tables | CSV, JSON, Avro, Parquet, ORC, text | Cloud URIs |
| External volumes | All data types | Cloud URIs |
Note
Iceberg support refers to Delta Lake tables written by Databricks with Iceberg reads (UniForm) enabled.
For more details on these Unity Catalog objects, see the following:
Unity Catalog credential vending
Unity Catalog credential vending allows users to configure external clients to inherit privileges on data governed by Databricks. See Unity Catalog credential vending for external system access.
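As an illustration, the following sketch requests short-lived, scoped credentials for a table from an external Python process using the temporary table credentials REST API. The workspace URL, personal access token, and table ID are placeholders, and the endpoint path and response fields shown here are assumptions to confirm against the credential vending documentation.

```python
# Sketch: request short-lived credentials for a Unity Catalog table via
# credential vending. The workspace URL, access token, and table ID below
# are placeholders; verify the endpoint path and response fields in the
# credential vending documentation for your workspace.
import requests

WORKSPACE_URL = "https://my-workspace.cloud.databricks.com"  # placeholder
TOKEN = "dapi..."                                            # placeholder token
TABLE_ID = "table-uuid"                                      # placeholder table ID

resp = requests.post(
    f"{WORKSPACE_URL}/api/2.1/unity-catalog/temporary-table-credentials",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={"table_id": TABLE_ID, "operation": "READ"},
)
resp.raise_for_status()
creds = resp.json()

# The response includes the table's storage location and short-lived cloud
# credentials that the external client can use to read the underlying files.
print(creds.get("url"))
```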
Read tables with Iceberg clients
Databricks provides Iceberg clients with read-only support for tables registered to Unity Catalog. Supported clients include Apache Spark, Apache Flink, Trino, and Snowflake. See Read Databricks tables from Iceberg clients.
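For example, an external Apache Spark deployment might connect through the Iceberg REST catalog interface with configuration similar to the following sketch. The catalog name `uc`, workspace URL, token, endpoint path, and Iceberg runtime package version are placeholders and assumptions; confirm the exact URI, authentication method, and package for your Spark version in the Iceberg client documentation.

```python
# Sketch: configure an external Spark session to read Unity Catalog tables
# through the Iceberg REST catalog interface. All URLs, tokens, and names
# below are placeholders.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("uc-iceberg-read")
    # Iceberg Spark runtime package matching your Spark/Scala version (assumption)
    .config("spark.jars.packages",
            "org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.5.0")
    .config("spark.sql.catalog.uc", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.uc.type", "rest")
    .config("spark.sql.catalog.uc.uri",
            "https://my-workspace.cloud.databricks.com/api/2.1/unity-catalog/iceberg")
    .config("spark.sql.catalog.uc.token", "dapi...")          # placeholder token
    .config("spark.sql.catalog.uc.warehouse", "my_catalog")   # UC catalog name
    .getOrCreate()
)

# Read-only access: query a UniForm-enabled table registered in Unity Catalog.
spark.sql("SELECT * FROM uc.my_schema.my_table LIMIT 10").show()
```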
Read and write external Delta tables
You can access Unity Catalog external tables backed by Delta Lake from external Delta Lake reader and writer clients using cloud object storage URIs and credentials.
Unity Catalog does not govern reads and writes performed directly against cloud object storage from external systems, so you must configure additional policies and credentials in your cloud account to ensure that data governance policies are respected outside Databricks.
To avoid potential data corruption and data loss issues, Databricks recommends you do not modify the same Delta table stored in S3 from different workspaces or clients.
You can use Cloudflare R2 for cloud object storage if you require writes from multiple clients. See Create a storage credential for connecting to Cloudflare R2.
Note
The Databricks documentation lists limitations and compatibility considerations based on Databricks Runtime versions and platform features. You must confirm what reader and writer protocols and table features your client supports. See delta.io.
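As one example, an external process could read an external Delta table directly from its cloud URI using the open source deltalake (delta-rs) client, as in the sketch below. The S3 URI and credentials are placeholders, and access is controlled by your cloud IAM configuration rather than Unity Catalog when you read the path directly.

```python
# Sketch: read a Unity Catalog external table backed by Delta Lake directly
# from cloud object storage with the open source deltalake (delta-rs) client.
# The URI and credentials are placeholders; Unity Catalog does not govern
# this access path.
from deltalake import DeltaTable

table = DeltaTable(
    "s3://my-bucket/path/to/external-table",  # cloud URI from the table definition
    storage_options={
        "AWS_ACCESS_KEY_ID": "...",            # placeholder credentials
        "AWS_SECRET_ACCESS_KEY": "...",
        "AWS_REGION": "us-west-2",
    },
)

# Load the table into memory as a pandas DataFrame for inspection.
df = table.to_pandas()
print(df.head())
```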
Access non-Delta Lake tabular data with external tables
Unity Catalog external tables support many formats other than Delta Lake, including Parquet, ORC, CSV, and JSON. External tables store all data files in directories in a cloud object storage location specified by a cloud URI provided during table creation. Other systems access these data files directly from cloud object storage.
Unity Catalog does not govern reads and writes performed directly against cloud object storage from external systems, so you must configure additional policies and credentials in your cloud account to ensure that data governance policies are respected outside Databricks.
Reading and writing to external tables from multiple systems can lead to consistency issues and data corruption because no transactional guarantees are provided for formats other than Delta Lake.
Unity Catalog might not pick up new partitions written to external tables backed by formats other than Delta Lake. Databricks recommends regularly running `MSCK REPAIR TABLE table_name` to ensure Unity Catalog has registered all data files written by external systems.
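For example, a scheduled Databricks job or notebook cell could run the repair after external writers complete; the three-level table name below is a placeholder.

```python
# Sketch: register partitions added by external writers. Run this from a
# Databricks notebook or job where `spark` is predefined; the table name
# is a placeholder.
spark.sql("MSCK REPAIR TABLE my_catalog.my_schema.my_external_table")
```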
Access non-tabular data with external volumes
Databricks recommends using external volumes to store non-tabular data files that are read or written by external systems in addition to Databricks. See What are Unity Catalog volumes?.
Unity Catalog does not govern reads and writes performed directly against cloud object storage from external systems, so you must configure additional policies and credentials in your cloud account to ensure that data governance policies are respected outside Databricks.
Databricks provides APIs, SDKs, and other tools for getting files from and putting files into volumes. See Manage files in volumes.
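For example, the following sketch uploads and downloads a file in a volume with the Files API in the Databricks SDK for Python. The volume path is a placeholder, and the client is assumed to authenticate through standard Databricks environment variables or a configuration profile.

```python
# Sketch: move files in and out of a Unity Catalog volume with the Files API
# in the Databricks SDK for Python. The volume path is a placeholder, and
# authentication uses standard SDK configuration (env vars or a profile).
import io
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()
volume_path = "/Volumes/my_catalog/my_schema/my_volume/example.csv"  # placeholder

# Upload a small file into the volume.
w.files.upload(volume_path, io.BytesIO(b"id,value\n1,foo\n"), overwrite=True)

# Download it back and read the contents.
response = w.files.download(volume_path)
print(response.contents.read().decode("utf-8"))
```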
Note
Delta Sharing allows you to share volumes with other Databricks accounts, but does not integrate with external systems.