Access data shared with you using Delta Sharing (for recipients)
This article shows how to access data that has been shared with you using Delta Sharing.
Delta Sharing and data recipients
Delta Sharing is an open standard for secure data sharing. A Databricks user, called a data provider, can use Delta Sharing to share data with a person or group outside of their organization, called a data recipient.
Databricks-to-Databricks sharing and open sharing
How you access the data depends on whether you are a Databricks user and on whether your data provider configured the data being shared with you for Databricks-to-Databricks sharing or for open sharing.
In the Databricks-to-Databricks model, you must be a user on a Databricks workspace that is enabled for Unity Catalog. A member of your team provides the data provider with a unique identifier for your Unity Catalog metastore, and the data provider uses that to create a secure sharing connection. The shared data becomes available for access in your workspace. If necessary, a member of your team configures granular access control on that data.
In the open sharing model, you can use any tool you like (including Databricks) to access the shared data. The data provider sends you an activation URL over a secure channel. You follow it to download a credential file that lets you access the data shared with you.
Terms of use
The shared data is not provided by Databricks directly but by data providers running on Databricks.
Note
By accessing a data provider’s shared data as a data recipient, data recipient represents that it has been authorized to access the data share(s) provided to it by the data provider and acknowledges that (1) Databricks has no liability for such data or data recipient’s use of such shared data, and (2) Databricks may collect information about data recipient’s use of and access to the shared data (including identifying any individual or company who accesses the data using the credential file in connection with such information) and may share it with the applicable data provider.
Get access to the data shared with you
How you access the data depends on whether your data provider shared data with you using the open sharing protocol or the Databricks-to-Databricks sharing protocol. See Databricks-to-Databricks sharing and open sharing.
Get access in the Databricks-to-Databricks model
In the Databricks-to-Databricks model:
The data provider sends you instructions for finding a unique identifier for the Unity Catalog metastore associated with your Databricks workspace, and you send it to them.
The sharing identifier is a string consisting of the metastore’s cloud, region, and UUID (the unique identifier for the metastore), in the format <cloud>:<region>:<uuid>. For example, aws:eu-west-1:b0c978c8-3e68-4cdf-94af-d05c120ed1ef.
To get the sharing identifier using Catalog Explorer:
In your Databricks workspace, click Catalog.
At the top of the Catalog pane, click the gear icon and select Delta Sharing.
Alternatively, from the Quick access page, click the Delta Sharing > button.
On the Shared with me tab, click your Databricks sharing organization name in the upper right, and select Copy sharing identifier.
To get the sharing identifier using a notebook or Databricks SQL query, use the default SQL function CURRENT_METASTORE. If you use a notebook, it must run on a shared or single-user cluster in the workspace you will use to access the shared data.
SELECT CURRENT_METASTORE();
The data provider creates:
A recipient in their Databricks account to represent you and the users in your organization who will access the data.
A share, which is a representation of the tables, volumes, and views to be shared with you.
You access the data shared with you. You or someone on your team can, if necessary, configure granular data access on that data for your users. See Read data shared using Databricks-to-Databricks Delta Sharing (for recipients).
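Before you send a sharing identifier to a data provider, it can help to check that the string you copied matches the <cloud>:<region>:<uuid> format described above. The following is a minimal sketch of such a check, not a Databricks API: the names SharingIdentifier and parse_sharing_identifier are hypothetical, and the check only validates the shape of the string, not whether the metastore exists.

```python
import uuid
from typing import NamedTuple


class SharingIdentifier(NamedTuple):
    """Hypothetical container for the three parts of a sharing identifier."""
    cloud: str
    region: str
    metastore_uuid: str


def parse_sharing_identifier(value: str) -> SharingIdentifier:
    """Split '<cloud>:<region>:<uuid>' and check that the last part is a UUID.

    Raises ValueError if the string does not have exactly three non-empty,
    colon-separated parts or if the third part is not a valid UUID.
    """
    parts = value.strip().split(":")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected <cloud>:<region>:<uuid>, got {value!r}")
    cloud, region, raw_uuid = parts
    try:
        # uuid.UUID also normalizes the value to lowercase, hyphenated form.
        parsed = uuid.UUID(raw_uuid)
    except ValueError as exc:
        raise ValueError(f"not a valid metastore UUID: {raw_uuid!r}") from exc
    return SharingIdentifier(cloud, region, str(parsed))


if __name__ == "__main__":
    # The example identifier from this article.
    print(parse_sharing_identifier(
        "aws:eu-west-1:b0c978c8-3e68-4cdf-94af-d05c120ed1ef"
    ))
```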
Get access in the open sharing model
In the open sharing model:
The data provider creates:
A recipient in their Databricks account to represent you and the users in your organization who will access the data. A token and credential file are generated as part of this configuration.
A share, which is a representation of the tables and partitions to be shared with you.
The data provider sends you an activation URL over a secure channel. You follow it to download a credential file that lets you access the data shared with you.
Important
Don’t share the activation link with anyone. You can download a credential file only once. If you visit the activation link again after the credential file has already been downloaded, the Download Credential File button is disabled.
If you lose the activation link before you use it, contact the data provider.
Store the credential file in a secure location.
Don’t share the credential file with anyone outside the group of users who should have access to the shared data. If you need to share it with someone in your organization, Databricks recommends using a password manager.
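One possible way to reduce accidental exposure of a downloaded credential file on a POSIX system is to restrict its permissions so that only your own user account can read or write it. The sketch below is an illustration under that assumption, not a Databricks recommendation: the function name restrict_to_owner is hypothetical, and file modes behave differently on Windows.

```python
import os
import stat


def restrict_to_owner(path: str) -> int:
    """Set a file's mode to owner read/write only and return the new mode.

    Assumes a POSIX file system. On such systems the returned value
    is 0o600 when the call succeeds.
    """
    os.chmod(path, stat.S_IRUSR | stat.S_IWUSR)
    # Read the mode back so the caller can confirm what was applied.
    return stat.S_IMODE(os.stat(path).st_mode)
```

For example, you might call restrict_to_owner on the path where you saved the credential file immediately after downloading it. File permissions are only one layer of protection; where the file is stored and who can log in to that machine still matter.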
Read the shared data
How you read data that has been shared securely with you using Delta Sharing depends on whether you received a credential file (the open sharing model) or you are using a Databricks workspace and you provided the data provider with your sharing identifier (the Databricks-to-Databricks model).
Read shared data using a credential file (open sharing)
If data has been shared with you using the Delta Sharing open sharing protocol, you use the credential file that you downloaded to authenticate to the data provider’s Databricks account and read the shared data. Access persists as long as the underlying token is valid and the provider continues to share the data. Providers manage token expiration and rotation. Updates to the data are available to you in near real time. You can read and make copies of the shared data, but you can’t modify the source data.
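Because access depends on the token remaining valid, it can be useful to check whether the credential file already records an expiration time. The sketch below assumes the bearer-token profile format from the open Delta Sharing protocol, in which the credential file is JSON with an optional expirationTime field holding a UTC timestamp such as 2021-11-12T00:12:29.0Z. The function names are hypothetical, and a credential file without that field, or in a different profile format, tells you nothing about expiry.

```python
import json
from datetime import datetime, timezone
from typing import Optional


def parse_expiration(value: str) -> datetime:
    """Parse a UTC timestamp such as '2021-11-12T00:12:29.0Z'.

    Assumption: the timestamp ends in 'Z' (UTC). Fractional seconds are
    dropped. Other time zone notations raise ValueError.
    """
    text = value.strip()
    if text.endswith("Z"):
        text = text[:-1]
    whole_seconds = text.split(".", 1)[0]
    parsed = datetime.strptime(whole_seconds, "%Y-%m-%dT%H:%M:%S")
    return parsed.replace(tzinfo=timezone.utc)


def token_is_expired(profile: dict, now: Optional[datetime] = None) -> bool:
    """Return True only if the profile records an expiration time in the past.

    Returns False when no expirationTime is present, because the file
    alone cannot tell you whether the provider has revoked the token.
    """
    expiration = profile.get("expirationTime")
    if expiration is None:
        return False
    current = now if now is not None else datetime.now(timezone.utc)
    return parse_expiration(expiration) <= current


def load_profile(path: str) -> dict:
    """Read the credential file as JSON."""
    with open(path, "r", encoding="utf-8") as handle:
        return json.load(handle)
```

If this check reports an expired token, or if reads start failing, contact the data provider, who manages token expiration and rotation.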
To learn how to access and read shared data using the credential file in Databricks, Apache Spark, pandas, and Power BI, see Read data shared using Delta Sharing open sharing (for recipients).
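As a brief preview of the pandas path, the following sketch uses the open source delta-sharing Python connector, which addresses a shared table with a URL of the form <profile-file-path>#<share>.<schema>.<table>. The helper names table_url and load_shared_table are hypothetical, as are the file name and the share, schema, and table names in the usage comment; the connector must be installed separately (pip install delta-sharing).

```python
def table_url(profile_path: str, share: str, schema: str, table: str) -> str:
    """Build the table URL that the delta-sharing connector expects."""
    return f"{profile_path}#{share}.{schema}.{table}"


def load_shared_table(profile_path: str, share: str, schema: str, table: str):
    """Load one shared table into a pandas DataFrame.

    The import is deferred so that table_url can be used without the
    connector installed.
    """
    import delta_sharing  # third-party package: pip install delta-sharing

    return delta_sharing.load_as_pandas(
        table_url(profile_path, share, schema, table)
    )


# Hypothetical usage, assuming the credential file was saved as config.share
# and the provider shared a table named trips in schema default of my_share:
#
#   df = load_shared_table("config.share", "my_share", "default", "trips")
#   print(df.head())
```

The resulting DataFrame is a local copy, which matches the behavior described above: you can read and copy the shared data, but changes you make do not affect the provider's source data.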
Read shared data using Databricks-to-Databricks sharing
If data has been shared with you using the Databricks-to-Databricks model, then no credential file is required to access the shared data. Databricks takes care of the secure connection, and the shared data is automatically discoverable in your Databricks workspace.
To learn how to find, read, and manage that shared data in your Databricks workspace, see Read data shared using Databricks-to-Databricks Delta Sharing (for recipients).
Audit usage of shared data
If you have access to a Databricks workspace, you can use Databricks audit logs to understand who in your organization is accessing which data using Delta Sharing. See Audit and monitor data sharing.