This article introduces Delta Sharing in Databricks, the secure data sharing platform that lets you share data in Databricks with users outside your organization.
The Delta Sharing articles on this site focus on sharing Databricks data. Delta Sharing is also available as an open-source project that you can use to share Delta tables from other platforms.
If you are a data recipient who has been granted access to shared data through Delta Sharing, and you just want to learn how to access that data, see Access data shared with you using Delta Sharing.
Delta Sharing is an open protocol developed by Databricks for secure data sharing with other organizations regardless of the computing platforms they use. Databricks builds Delta Sharing into its Unity Catalog data governance platform, enabling a Databricks user, called a data provider, to share data with a person or group outside of their organization, called a data recipient.
Delta Sharing’s native integration with Unity Catalog allows you to manage, govern, audit, and track usage of the shared data on one platform. In fact, your data must be registered in Unity Catalog to be available for secure sharing. Data must also be in the Delta table format.
The primary concepts underlying Delta Sharing in Databricks are shares and recipients.
In Delta Sharing, a share is a read-only collection of tables and table partitions to be shared with one or more recipients.
A share is a securable object registered in Unity Catalog. A share can contain tables from a single Unity Catalog metastore. You can add or remove tables from a share at any time, and you can assign or revoke data recipient access to a share at any time.
If you remove a share from your Unity Catalog metastore, all recipients of that share lose the ability to access it.
A recipient is an object that associates an organization with a credential or secure sharing identifier that allows that organization to access one or more shares.
As a data provider (sharer), you can define multiple recipients for any given Unity Catalog metastore. However, if you want to share data from multiple metastores with a particular user or group of users, you must define the recipient separately for each metastore. A recipient can have access to multiple shares.
If you delete a recipient from your Unity Catalog metastore, that recipient loses access to all shares it could previously access.
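The relationship between shares, recipients, and access can be pictured with a small in-memory model. This is only an illustrative sketch: the `Metastore` class and its method names are hypothetical and are not part of any Databricks API. It mirrors the rules described above: a share holds tables from one metastore, a recipient can hold several shares, and dropping either object removes the corresponding access.

```python
from dataclasses import dataclass, field


@dataclass
class Metastore:
    """Toy stand-in for one Unity Catalog metastore (hypothetical, not a Databricks API)."""

    shares: dict = field(default_factory=dict)  # share name -> set of table names
    grants: dict = field(default_factory=dict)  # recipient name -> set of share names

    def create_share(self, share, tables):
        self.shares[share] = set(tables)

    def create_recipient(self, recipient):
        self.grants.setdefault(recipient, set())

    def grant(self, recipient, share):
        if share not in self.shares:
            raise KeyError(f"unknown share: {share}")
        self.grants[recipient].add(share)

    def drop_share(self, share):
        # Removing a share cuts off every recipient that was granted it.
        del self.shares[share]
        for granted in self.grants.values():
            granted.discard(share)

    def drop_recipient(self, recipient):
        # Deleting a recipient removes its access to all shares at once.
        del self.grants[recipient]

    def readable_tables(self, recipient):
        # Access is read-only: the model only ever answers "what can be read".
        tables = set()
        for share in self.grants.get(recipient, ()):
            tables |= self.shares[share]
        return tables
```

In this model, sharing from a second metastore would mean creating a second `Metastore` instance and defining the recipient there as well, which matches the per-metastore recipient rule.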
The way you use Delta Sharing depends on who you are sharing data with:
Open sharing lets you share data with any user, whether or not they have access to Databricks.
Databricks-to-Databricks sharing lets you share data with Databricks users who have access to a Unity Catalog metastore that is different from yours.
If you want to share data with users outside of your Databricks workspace, regardless of whether they use Databricks, you can use open Delta Sharing to share your data securely. As a data provider, you generate a token and share it securely with the recipient. The recipient uses the token to authenticate and get read access to the tables in the shares that you have granted them access to.
Recipients can access the shared data using many computing tools and platforms, including Apache Spark, pandas, Power BI, and Databricks.
For a full list of Delta Sharing connectors and information about how to use them, see the Delta Sharing documentation.
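As a rough sketch of what token-based access looks like on the recipient side, the example below uses the open-source `delta-sharing` Python connector. In practice the recipient downloads the credential (profile) file from the activation link; `write_profile` is a hypothetical helper that only shows the shape of that file, and the endpoint, token, share, schema, and table names passed to it are placeholders, not real values.

```python
import json
from pathlib import Path


def write_profile(path, endpoint, bearer_token):
    """Write a profile file in the shape used by the open-source connector.

    Normally this file is downloaded, not written by hand; this helper is
    for illustration only.
    """
    profile = {
        "shareCredentialsVersion": 1,
        "endpoint": endpoint,
        "bearerToken": bearer_token,
    }
    Path(path).write_text(json.dumps(profile, indent=2))
    return str(path)


def table_url(profile_path, share, schema, table):
    """Build the '<profile-file>#<share>.<schema>.<table>' address the connector expects."""
    return f"{profile_path}#{share}.{schema}.{table}"


def load_shared_table(profile_path, share, schema, table):
    """Read a shared table into a pandas DataFrame.

    Requires `pip install delta-sharing` and network access to the
    provider's sharing server.
    """
    import delta_sharing  # imported lazily so the helpers above work without the package

    return delta_sharing.load_as_pandas(table_url(profile_path, share, schema, table))
```

Because the bearer token in the profile file is the only credential, the file should be stored and transmitted as carefully as a password.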
If you want to share data with users who don’t have access to your Unity Catalog metastore, you can use Databricks-to-Databricks Delta Sharing, as long as the recipients have access to a Databricks workspace that is enabled for Unity Catalog. Databricks-to-Databricks sharing lets you share data with users in other Databricks accounts, whether they’re on AWS or Azure. You can also use it to securely share data across different Unity Catalog metastores in your own Databricks account.
The advantage of this scenario is that the share recipient doesn’t need a token to access the share, and the provider doesn’t need to manage recipient tokens. The security of the sharing connection—including all identity verification, authentication, and auditing—is managed entirely through Delta Sharing and the Databricks platform.
Databricks-to-Databricks sharing between Unity Catalog metastores in the same account is always enabled. To enable Delta Sharing to share data with Databricks workspaces in other accounts or non-Databricks clients, a Databricks account admin or metastore admin performs the following setup steps (at a high level):
Enable the External Data Sharing feature group for your Databricks account.
Enable Delta Sharing for the Unity Catalog metastore that manages the data you want to share.
You do not need to enable Delta Sharing on your metastore if you intend to use Delta Sharing to share data only with users on other Unity Catalog metastores in your account. Metastore-to-metastore sharing within a single Databricks account is enabled by default.
Create a share that includes one or more tables in the metastore.
Create a recipient.
If your recipient is not a Databricks user, or does not have access to a Databricks workspace that is enabled for Unity Catalog, you must use open sharing. A set of token-based credentials is generated for that recipient.
If your recipient has access to a Databricks workspace that is enabled for Unity Catalog, you can use Databricks-to-Databricks sharing, and no token-based credentials are required. You request a sharing identifier from the recipient and use it to establish the secure connection.
Use yourself as a test recipient to try out the setup process.
Grant the recipient access to one or more shares.
Send the recipient the information they need to connect to the share.
For open sharing, use a secure channel to send the recipient an activation link that allows them to download their token-based credentials.
For Databricks-to-Databricks sharing, the data included in the share becomes available in the recipient’s Databricks workspace as soon as you grant them access to the share.
The recipient can now access the shared data.
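The provider-side steps above (create a share, add tables, create a recipient, grant access) can be sketched as a sequence of Databricks SQL statements. The helper below only builds the statement strings; `provider_setup_sql` and all object names passed to it are hypothetical, and the sharing identifier is a placeholder that the recipient would supply. Omitting the identifier corresponds to open sharing, where token-based credentials are generated for the recipient.

```python
def provider_setup_sql(share, tables, recipient, sharing_identifier=None):
    """Return the SQL statements a provider would run, in order.

    sharing_identifier: supply it for Databricks-to-Databricks sharing;
    leave it as None for open (token-based) sharing.
    """
    statements = [f"CREATE SHARE IF NOT EXISTS {share}"]

    # Add each table to the share by its full name (catalog.schema.table).
    statements += [f"ALTER SHARE {share} ADD TABLE {table}" for table in tables]

    if sharing_identifier is None:
        statements.append(f"CREATE RECIPIENT IF NOT EXISTS {recipient}")
    else:
        statements.append(
            f"CREATE RECIPIENT IF NOT EXISTS {recipient} USING ID '{sharing_identifier}'"
        )

    # Shares are read-only, so SELECT is the privilege granted to the recipient.
    statements.append(f"GRANT SELECT ON SHARE {share} TO RECIPIENT {recipient}")
    return statements
```

In a Databricks notebook, each returned statement could be passed to `spark.sql(...)` by a user with the required privileges. The helper does no quoting or validation of names, so it is suitable only as an illustration.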
Recipients access the shared data in read-only format. Secure access depends on the sharing model:
Open sharing: The recipient provides the credential whenever they access the data in their tool of choice, including Apache Spark, pandas, Power BI, Databricks, and many more. See Read data shared using Delta Sharing open sharing.
Databricks-to-Databricks: The recipient accesses the data using Databricks. They can use Unity Catalog to grant and deny access to other users in their Databricks account. See Read data shared using Databricks-to-Databricks Delta Sharing.
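For the Databricks-to-Databricks case, the recipient typically makes a share queryable by creating a catalog from it and then granting access to users in their own account. The sketch below builds those SQL statements as strings. The function name and every identifier passed to it are hypothetical, and depending on the workspace, additional privileges (such as `USE CATALOG` and `USE SCHEMA`) may be needed before a user can actually query the tables.

```python
def recipient_mount_sql(provider, share, catalog, principal=None):
    """Return SQL a recipient would run to expose a share as a catalog.

    principal: optional user or group in the recipient's account that
    should get read access to the new catalog.
    """
    statements = [
        f"CREATE CATALOG IF NOT EXISTS {catalog} USING SHARE {provider}.{share}"
    ]
    if principal is not None:
        # Backticks allow principals such as email addresses or names with spaces.
        statements.append(f"GRANT SELECT ON CATALOG {catalog} TO `{principal}`")
    return statements
```

After the catalog exists, shared tables are read like any other Unity Catalog table, using a three-level name of the form `catalog.schema.table`.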
Whenever the data provider updates data tables in their own Databricks account, the updates appear in near real time in the recipient’s system.
Data providers can use Databricks audit logging to monitor the creation and modification of shares and recipients, and can monitor recipient activity on shares. See Audit and monitor data sharing using Delta Sharing (for providers).
Data recipients who use shared data in a Databricks account can use Databricks audit logging to understand who is accessing which data. See Audit and monitor data access using Delta Sharing (for recipients).
Only tables stored in a Unity Catalog metastore can be shared using Delta Sharing.
Only tables in Delta format are supported. You can convert Parquet tables to Delta, and back again. See CONVERT TO DELTA.
Sharing views is not supported in this release.
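For the Delta-format limitation above, a Parquet table can be converted in place with the `CONVERT TO DELTA` SQL command before it is added to a share. The helper below is a minimal sketch that only builds the statement; the function name, path, and partition columns are hypothetical examples.

```python
def convert_to_delta_sql(parquet_path, partition_cols=None):
    """Build a CONVERT TO DELTA statement for a path-based Parquet table.

    partition_cols: optional mapping of partition column name to SQL type,
    required when the Parquet data is partitioned on disk.
    """
    statement = f"CONVERT TO DELTA parquet.`{parquet_path}`"
    if partition_cols:
        columns = ", ".join(f"{name} {dtype}" for name, dtype in partition_cols.items())
        statement += f" PARTITIONED BY ({columns})"
    return statement
```

The converted table still has to be registered in Unity Catalog before it can be shared.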