This article shows how to access data that has been shared with you using Delta Sharing.
Delta Sharing is an open standard for secure data sharing. A Databricks user, called a data provider, can use Delta Sharing to share data with a person or group outside of their organization, called a data recipient.
How you access the data depends on whether you are a Databricks user and whether your data provider shared the data with you using the Databricks-to-Databricks model or the open sharing model.
In the Databricks-to-Databricks model, you must be a user on a Databricks workspace that is enabled for Unity Catalog. A member of your team provides the data provider with a unique identifier for your Databricks workspace, and the data provider uses that to create a secure sharing connection. The shared data simply becomes available for access in your workspace. If necessary, a member of your team configures granular access control on that data.
In the open sharing model, you can use any tool you like (including Databricks) to access the shared data. The data provider sends you an activation URL over a secure channel. You follow it to download a credential file that lets you access the data shared with you.
The shared data is not provided by Databricks directly but by data providers running on Databricks.
By accessing a data provider’s shared data as a data recipient, data recipient represents that it has been authorized to access the data share(s) provided to it by the data provider and acknowledges that (1) Databricks has no liability for such data or data recipient’s use of such shared data, and (2) Databricks may collect information about data recipient’s use of and access to the shared data (including identifying any individual or company who accesses the data using the credential file in connection with such information) and may share it with the applicable data provider.
How you access the data depends on whether your data provider shared data with you using the open sharing protocol or the Databricks-to-Databricks sharing protocol. See Databricks-to-Databricks sharing and open sharing.
In the Databricks-to-Databricks model:
The data provider sends you instructions for finding the unique identifier for the Unity Catalog metastore that is associated with your Databricks workspace, and you send that identifier to them.
The sharing identifier is a string consisting of the metastore’s cloud, region, and UUID (the unique identifier for the metastore), in the format <cloud>:<region>:<uuid>.
To get the sharing identifier using Catalog Explorer:
1. In your Databricks workspace, click Catalog.
2. In the left pane, expand the Delta Sharing menu and select Shared with me.
3. Above the Providers tab, click the Sharing identifier copy icon.
To get the sharing identifier using a notebook or Databricks SQL query, use the default SQL function
CURRENT_METASTORE. If you use a notebook, it must run on a shared or single-user cluster in the workspace you will use to access the shared data.
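To make the identifier format concrete, the following Python sketch splits a sharing identifier into its cloud, region, and UUID components and sanity-checks the UUID. The identifier value shown is illustrative only, not a real metastore ID:

```python
import re

def parse_sharing_identifier(identifier: str) -> dict:
    """Split a <cloud>:<region>:<uuid> sharing identifier into its parts."""
    cloud, region, uuid = identifier.split(":", 2)
    # Basic sanity check on the UUID portion (8-4-4-4-12 hex digits).
    if not re.fullmatch(r"[0-9a-f]{8}(-[0-9a-f]{4}){3}-[0-9a-f]{12}", uuid):
        raise ValueError(f"not a valid metastore UUID: {uuid}")
    return {"cloud": cloud, "region": region, "uuid": uuid}

# Illustrative identifier, not a real metastore.
parts = parse_sharing_identifier("aws:us-west-2:19a84bee-54bc-43a2-87de-023d0ec16016")
print(parts["cloud"], parts["region"])  # aws us-west-2
```

This is the value you would copy from Catalog Explorer or from the output of the CURRENT_METASTORE SQL function and send to your data provider.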
The data provider creates:
- A recipient in their Databricks account to represent you and the users in your organization who will access the data.
- A share, which is a representation of the tables and views to be shared with you.
You access the data shared with you. You or someone on your team can, if necessary, configure granular data access on that data for your users. See Read data shared using Databricks-to-Databricks Delta Sharing.
In the open sharing model:
The data provider creates:
- A recipient in their Databricks account to represent you and the users in your organization who will access the data. A token and credential file are generated as part of this configuration.
- A share, which is a representation of the tables and partitions to be shared with you.
The data provider sends you an activation URL over a secure channel. You follow it to download a credential file that lets you access the data shared with you.
Don’t share the activation link with anyone. You can download a credential file only once. If you visit the activation link again after the credential file has already been downloaded, the Download Credential File button is disabled.
If you lose the activation link before you use it, contact the data provider.
Store the credential file in a secure location.
Don’t share the credential file with anyone outside the group of users who should have access to the shared data. If you need to share it with someone in your organization, Databricks recommends using a password manager.
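The credential file is a small JSON document (the Delta Sharing "profile" file). As a minimal sketch, the helper below loads one and checks that the fields the Delta Sharing protocol expects are present; the file path in the commented call is hypothetical:

```python
import json
from pathlib import Path

def load_share_profile(path: str) -> dict:
    """Load a Delta Sharing credential (profile) file and sanity-check it.

    The profile is a JSON document containing, at minimum, a credentials
    version, the sharing server endpoint, and a bearer token.
    """
    profile = json.loads(Path(path).read_text())
    for field in ("shareCredentialsVersion", "endpoint", "bearerToken"):
        if field not in profile:
            raise ValueError(f"credential file is missing {field!r}")
    return profile

# Hypothetical path; keep the real file in a secure, access-controlled location.
# profile = load_share_profile("/secure/location/config.share")
```

Because the bearer token in this file grants access to the shared data, treat the file itself like a password.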
How you read data that has been shared securely with you using Delta Sharing depends on whether you received a credential file (the open sharing model) or you are using a Databricks workspace and you provided the data provider with your sharing identifier (the Databricks-to-Databricks model).
If data has been shared with you using the Delta Sharing open sharing protocol, you use the credential file that you downloaded to authenticate to the data provider’s Databricks account and read the shared data. Access persists until the provider stops sharing the data with you. Updates to the data are available to you in near real time. You can read and make copies of the shared data, but you can’t modify the source data.
To learn how to access and read shared data using the credential file in Databricks, Apache Spark, pandas, and Power BI, see Read data shared using Delta Sharing open sharing.
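As a sketch of the open sharing read path, the open-source delta-sharing Python library addresses a table with a URL of the form <profile-file>#<share>.<schema>.<table>. The helper below builds that URL (the share, schema, and table names are hypothetical), and the commented lines show how it would be passed to delta_sharing.load_as_pandas, which requires the real credential file and network access to the provider's sharing server:

```python
def shared_table_url(profile_path: str, share: str, schema: str, table: str) -> str:
    """Build the <profile>#<share>.<schema>.<table> URL used by delta-sharing."""
    return f"{profile_path}#{share}.{schema}.{table}"

# Hypothetical names; substitute the share, schema, and table that your
# provider actually shared with you.
url = shared_table_url("config.share", "provider_share", "default", "trips")
print(url)  # config.share#provider_share.default.trips

# With the real credential file in place, reading the table would look like:
# import delta_sharing
# df = delta_sharing.load_as_pandas(url)  # returns a pandas DataFrame
```

Because you can read but not modify the source data, any transformations you apply happen on your copy of the data.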
If data has been shared with you using the Databricks-to-Databricks model, then no credential file is required to access the shared data. Databricks takes care of the secure connection, and the shared data is automatically discoverable in your Databricks workspace.
To learn how to find, read, and manage that shared data in your Databricks workspace, see Read data shared using Databricks-to-Databricks Delta Sharing.
If you have access to a Databricks workspace, you can use Databricks audit logs to understand who in your organization is accessing which data using Delta Sharing. See Audit and monitor data access using Delta Sharing (for recipients).