Share data securely using Delta Sharing
This article introduces Delta Sharing in Databricks, the secure data sharing platform that lets you share data in Databricks with users outside your organization.
The Delta Sharing articles on this site focus on sharing Databricks data and notebooks. Delta Sharing is also available as an open-source project that you can use to share Delta tables from other platforms. Delta Sharing also provides the backbone for Databricks Marketplace, an open forum for exchanging data products.
If you are a data recipient who has been granted access to shared data through Delta Sharing, and you just want to learn how to access that data, see Access data shared with you using Delta Sharing.
What is Delta Sharing?
Delta Sharing is an open protocol developed by Databricks for secure data sharing with other organizations regardless of the computing platforms they use. Databricks builds Delta Sharing into its Unity Catalog data governance platform, enabling a Databricks user, called a data provider, to share data with a person or group outside of their organization, called a data recipient.
Delta Sharing’s native integration with Unity Catalog allows you to manage, govern, audit, and track usage of the shared data on one platform. In fact, your data must be registered in Unity Catalog to be available for secure sharing. Data must also be in the Delta table format.
Shares and recipients
The primary concepts underlying Delta Sharing in Databricks are shares and recipients.
What is a share?
In Delta Sharing, a share is a read-only collection of tables and table partitions to be shared with one or more recipients. If your recipient uses a Unity Catalog-enabled Databricks workspace, you can also include notebook files in a share.
A share is a securable object registered in Unity Catalog. A share can contain tables and notebook files from a single Unity Catalog metastore. You can add or remove tables and notebook files from a share at any time, and you can assign or revoke data recipient access to a share at any time.
If you remove a share from your Unity Catalog metastore, all recipients of that share lose the ability to access it.
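As a rough illustration of the share lifecycle described above, the following Python sketch builds the Databricks SQL statements a provider might run, for example with spark.sql in a notebook. The share and table names are hypothetical placeholders, and the helper function is illustrative only; it is not part of any Databricks API.

```python
from typing import List


def share_statements(share: str,
                     add_tables: List[str],
                     remove_tables: List[str]) -> List[str]:
    """Build SQL statements that create a share and adjust its tables.

    All names passed in are hypothetical; nothing is executed here.
    """
    # Create the share as a securable object in the metastore.
    statements = [f"CREATE SHARE IF NOT EXISTS {share}"]
    # Tables can be added to the share at any time.
    statements += [f"ALTER SHARE {share} ADD TABLE {t}" for t in add_tables]
    # Tables can likewise be removed at any time.
    statements += [f"ALTER SHARE {share} REMOVE TABLE {t}" for t in remove_tables]
    return statements


if __name__ == "__main__":
    for stmt in share_statements("sales_share",
                                 ["main.sales.orders"],
                                 ["main.sales.legacy_orders"]):
        print(stmt)
```

In a Databricks notebook, each returned string could be passed to spark.sql by a user with the required privileges.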
What is a recipient?
A recipient is an object that associates an organization with a credential or secure sharing identifier that allows that organization to access one or more shares.
As a data provider (sharer), you can define multiple recipients for any given Unity Catalog metastore. However, if you want to share data from multiple metastores with a particular user or group of users, you must define the recipient separately for each metastore. A recipient can have access to multiple shares.
If you delete a recipient from your Unity Catalog metastore, that recipient loses access to all shares it could previously access.
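The two kinds of recipient described above (credential-based and sharing-identifier-based) can be sketched as follows. This is a minimal example that only builds SQL strings; the recipient names and the sharing identifier value are made up for illustration, and the helper functions are hypothetical.

```python
from typing import Optional


def create_recipient_sql(recipient: str,
                         sharing_identifier: Optional[str] = None) -> str:
    """Build a CREATE RECIPIENT statement.

    Without a sharing identifier, the recipient is a token-based (open
    sharing) recipient. With one, it is a Databricks-to-Databricks recipient.
    """
    stmt = f"CREATE RECIPIENT IF NOT EXISTS {recipient}"
    if sharing_identifier is not None:
        # The identifier is supplied by the recipient organization.
        stmt += f" USING ID '{sharing_identifier}'"
    return stmt


def drop_recipient_sql(recipient: str) -> str:
    """Build a DROP RECIPIENT statement; the recipient then loses all access."""
    return f"DROP RECIPIENT IF EXISTS {recipient}"


if __name__ == "__main__":
    # Hypothetical names and a placeholder identifier, not real values.
    print(create_recipient_sql("partner_open"))
    print(create_recipient_sql(
        "partner_dbx", "aws:us-west-2:00000000-0000-0000-0000-000000000000"))
    print(drop_recipient_sql("partner_open"))
```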
Open sharing versus Databricks-to-Databricks sharing
The way you use Delta Sharing depends on who you are sharing data with:
Open sharing lets you share data with any user, whether or not they have access to Databricks.
Databricks-to-Databricks sharing lets you share data with Databricks users who have access to a Unity Catalog metastore that is different from yours. Databricks-to-Databricks also supports notebook sharing, which is not available in open sharing.
What is open Delta Sharing?
If you want to share data with users outside of your Databricks workspace, regardless of whether they use Databricks, you can use open Delta Sharing to share your data securely. As a data provider, you generate a token and share it securely with the recipient. They use the token to authenticate and get read access to the tables you’ve included in the shares you’ve given them access to.
Recipients can access the shared data using many computing tools and platforms, including Apache Spark, pandas, and Power BI.
For a full list of Delta Sharing connectors and information about how to use them, see the Delta Sharing documentation.
See also Share data using the Delta Sharing open sharing protocol.
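To make the token-based flow more concrete, here is a hedged sketch of how a recipient might read a shared table with the open-source delta-sharing Python connector. It assumes the downloaded credential is a JSON profile file with the fields shareCredentialsVersion, endpoint, and bearerToken, and that tables are addressed as `<profile-path>#<share>.<schema>.<table>`. The file name and the share, schema, and table names are hypothetical.

```python
from typing import Set

# Fields we assume a credential (profile) file must contain.
REQUIRED_PROFILE_KEYS = {"shareCredentialsVersion", "endpoint", "bearerToken"}


def missing_profile_keys(profile: dict) -> Set[str]:
    """Return the required fields that are absent from a parsed profile."""
    return REQUIRED_PROFILE_KEYS - set(profile.keys())


def table_url(profile_path: str, share: str, schema: str, table: str) -> str:
    """Build the table address the connector expects."""
    return f"{profile_path}#{share}.{schema}.{table}"


def load_shared_table(profile_path: str, share: str, schema: str, table: str):
    """Load a shared table into a pandas DataFrame (requires network access)."""
    # Imported lazily so the helpers above work without the connector
    # installed. Install it with: pip install delta-sharing
    import delta_sharing

    return delta_sharing.load_as_pandas(
        table_url(profile_path, share, schema, table))


if __name__ == "__main__":
    # Hypothetical profile path and object names.
    print(table_url("config.share", "sales_share", "sales", "orders"))
```

Because the bearer token in the profile file grants read access, the file should be stored and transmitted as carefully as any other secret.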
What is Databricks-to-Databricks Delta Sharing?
If you want to share data with users who don’t have access to your Unity Catalog metastore, you can use Databricks-to-Databricks Delta Sharing, as long as the recipients have access to a Databricks workspace that is enabled for Unity Catalog. Databricks-to-Databricks sharing lets you share data with users in other Databricks accounts, whether they’re on AWS, Azure, or GCP. It’s also a great way to securely share data across different Unity Catalog metastores in your own Databricks account.
One advantage of this scenario is that the share recipient doesn’t need a token to access the share, and the provider doesn’t need to manage recipient tokens. The security of the sharing connection—including all identity verification, authentication, and auditing—is managed entirely through Delta Sharing and the Databricks platform. Another advantage is the ability to share Databricks notebook files.
See also Share data using the Delta Sharing Databricks-to-Databricks protocol.
How do admins set up Delta Sharing?
Databricks-to-Databricks sharing between Unity Catalog metastores in the same account is always enabled. To enable Delta Sharing to share data with Databricks workspaces in other accounts or non-Databricks clients, a Databricks account admin or metastore admin performs the following setup steps (at a high level):
Enable the External Data Sharing feature group for your Databricks account.
Enable Delta Sharing for the Unity Catalog metastore that manages the data you want to share.
You do not need to enable Delta Sharing on your metastore if you intend to use Delta Sharing to share data only with users on other Unity Catalog metastores in your account. Metastore-to-metastore sharing within a single Databricks account is enabled by default.
Create a share that includes one or more tables in the metastore.
If you plan to use Databricks-to-Databricks sharing, you can also add notebook files to a share.
Create a recipient.
See Create and manage data recipients for Delta Sharing.
If your recipient is not a Databricks user, or does not have access to a Databricks workspace that is enabled for Unity Catalog, you must use open sharing. A set of token-based credentials is generated for that recipient.
If your recipient has access to a Databricks workspace that is enabled for Unity Catalog, you can use Databricks-to-Databricks sharing, and no token-based credentials are required. You request a sharing identifier from the recipient and use it to establish the secure connection.
Use yourself as a test recipient to try out the setup process.
Grant the recipient access to one or more shares.
See Grant and manage access to Delta Sharing data shares.
This step can also be performed by a non-admin user with the SET SHARE PERMISSION privileges. See Unity Catalog privileges and securable objects.
Send the recipient the information they need to connect to the share.
See Send the recipient their connection information.
For open sharing, use a secure channel to send the recipient an activation link that allows them to download their token-based credentials.
For Databricks-to-Databricks sharing, the data included in the share becomes available in the recipient’s Databricks workspace as soon as you grant them access to the share.
The recipient can now access the shared data.
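The grant step in the setup flow above might look like the following sketch, which builds GRANT and REVOKE statements for a share. The share and recipient names are hypothetical, and the simple name check is an illustrative safeguard of our own against malformed input, not a rule taken from Databricks.

```python
import re

# Deliberately conservative pattern for unquoted names (our assumption).
_NAME_PATTERN = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def _checked(name: str) -> str:
    """Return the name unchanged, or raise if it is not a plain identifier."""
    if not _NAME_PATTERN.fullmatch(name):
        raise ValueError(f"Unexpected characters in name: {name!r}")
    return name


def grant_share_sql(share: str, recipient: str) -> str:
    """Build the statement that gives a recipient read access to a share."""
    return (f"GRANT SELECT ON SHARE {_checked(share)} "
            f"TO RECIPIENT {_checked(recipient)}")


def revoke_share_sql(share: str, recipient: str) -> str:
    """Build the statement that removes a recipient's access to a share."""
    return (f"REVOKE SELECT ON SHARE {_checked(share)} "
            f"FROM RECIPIENT {_checked(recipient)}")


if __name__ == "__main__":
    print(grant_share_sql("sales_share", "partner_dbx"))
    print(revoke_share_sql("sales_share", "partner_dbx"))
```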
How do recipients access the shared data?
Recipients access shared tables in read-only format. Shared notebook files are read-only, but they can be cloned and then modified and run in the recipient workspace just like any other notebook.
Secure access depends on the sharing model:
Open sharing: The recipient provides the credential whenever they access the data in their tool of choice, including Apache Spark, pandas, Power BI, Databricks, and many more. See Read data shared using Delta Sharing open sharing.
Databricks-to-Databricks: The recipient accesses the data using Databricks. They can use Unity Catalog to grant and deny access to other users in their Databricks account. See Read data shared using Databricks-to-Databricks Delta Sharing.
Whenever the data provider updates data tables in their own Databricks account, the updates appear in near real time in the recipient’s system.
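For the Databricks-to-Databricks case, a recipient typically makes a share queryable by creating a catalog from it. The sketch below only assembles the SQL a recipient might run in their own workspace; the provider, share, catalog, and table names are all hypothetical, and the required privileges are not shown.

```python
from typing import List


def mount_share_statements(provider: str, share: str,
                           catalog: str, schema_and_table: str) -> List[str]:
    """Build statements to inspect, mount, and query a share as a catalog."""
    return [
        # List the shares this provider has made available.
        f"SHOW SHARES IN PROVIDER {provider}",
        # Expose the share as a read-only catalog in Unity Catalog.
        f"CREATE CATALOG IF NOT EXISTS {catalog} USING SHARE {provider}.{share}",
        # Query a shared table like any other Unity Catalog table.
        f"SELECT * FROM {catalog}.{schema_and_table} LIMIT 10",
    ]


if __name__ == "__main__":
    for stmt in mount_share_statements(
            "acme_provider", "sales_share", "acme_sales", "sales.orders"):
        print(stmt)
```

Once the catalog exists, access for other users in the recipient account can be managed with ordinary Unity Catalog grants.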
How do you keep track of who is sharing and accessing shared data?
Data providers can use Databricks audit logging to monitor the creation and modification of shares and recipients, and can monitor recipient activity on shares. See Audit and monitor data sharing using Delta Sharing (for providers).
Data recipients who use shared data in a Databricks account can use Databricks audit logging to understand who is accessing which data. See Audit and monitor data access using Delta Sharing (for recipients).
Sharing notebooks
You can use Delta Sharing to share notebook files using the Databricks-to-Databricks sharing flow. See Add notebook files to a share (for providers) and Read shared notebooks (for recipients).
Delta Sharing and streaming
Delta Sharing supports Spark Structured Streaming. A provider can share a table with history so that a recipient can use it as a Structured Streaming source, processing shared data incrementally with low latency. Recipients can also perform Delta Lake time travel queries on tables shared with history.
To learn how to share tables with history, see Add tables to a share. To learn how to use shared tables as streaming sources, see Query a table using Apache Spark Structured Streaming (for recipients of Databricks-to-Databricks sharing) or Access a shared table using Spark Structured Streaming (for recipients of open sharing data).
See also Streaming on Databricks.
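As an illustration of the streaming and time travel capabilities described above, the following sketch shows how an open sharing recipient might read a table shared with history using the Delta Sharing Spark connector. It assumes an existing SparkSession with the connector available and a table address of the form `<profile-path>#<share>.<schema>.<table>`. The function names are our own.

```python
def read_shared_stream(spark, table_url: str, starting_version: int = 0):
    """Return a streaming DataFrame over a table shared with history."""
    return (
        spark.readStream.format("deltaSharing")
        # Start processing from this table version onward.
        .option("startingVersion", str(starting_version))
        .load(table_url)
    )


def read_shared_version(spark, table_url: str, version: int):
    """Return a batch DataFrame of the shared table as of a given version."""
    return (
        spark.read.format("deltaSharing")
        # Time travel: read the snapshot at a specific table version.
        .option("versionAsOf", str(version))
        .load(table_url)
    )
```

Both functions take the Spark session as a parameter, so they can be exercised with a stub in tests and with a real session in a notebook or job.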
Limitations
Only tables stored in a Unity Catalog metastore can be shared using Delta Sharing.
Only tables in Delta format are supported. You can convert Parquet tables to Delta, and back again. See CONVERT TO DELTA.
Sharing views is not supported in this release.
There are limits on the number of files in metadata allowed for a shared table. To learn more, see Resource limit exceeded errors.
The values below indicate the quotas for Delta Sharing resources.
If you expect to exceed these resource limits, contact your Databricks account representative.
Learn more about the open sharing and Databricks-to-Databricks sharing models