Databricks-managed Delta Sharing (Preview)

Preview

Delta Sharing is in Public Preview. To participate in the preview, you must enable the External Data Sharing feature group in the Databricks Account Console. See Enable the External Data Sharing feature group for your account.

Delta Sharing is subject to applicable terms. Enabling the External Data Sharing feature group represents acceptance of those terms.

Databricks-managed Delta Sharing allows data providers to share data and data recipients to access the shared data.

As a data provider, you can share data with recipients that don’t use Databricks. For more information, see Share data using Delta Sharing. If you want to share data with data recipients outside of your account, enable external Delta Sharing on the metastore. You don’t have to enable external Delta Sharing if you are sharing data within the same account.

As a data recipient, you can access shared data in an open environment (such as your own compute cluster or AWS EMR). For more information, see Access data shared with you using Delta Sharing.

User guide for data providers

This section explains the concepts and processes you need to understand as the data provider in a Delta Sharing relationship.

Concepts for data providers

  • Data recipient: A data recipient is an object in a Unity Catalog metastore that represents the real-world data recipient who accesses the shared data. A recipient can have access to multiple shares.

  • Share: A share is a collection of datasets to be shared in a Unity Catalog metastore. A metastore can have multiple shares, and you can control which recipients have access to each share.

Manage shares

In Delta Sharing, a share is a named object that contains a collection of tables in a metastore that you wish to share as a group. A share can contain tables from only a single metastore. You can add or remove tables from a share at any time.

-- Create a sample Delta table in the main.default schema.
USE CATALOG main;
USE default;
CREATE TABLE IF NOT EXISTS my_table (num INT, name STRING) USING DELTA PARTITIONED BY (num);
INSERT INTO my_table VALUES (1, 'cat'), (1, 'dog'), (2, 'fish');

-- Create a share and add the table to it under the alias db0.t0.
CREATE SHARE IF NOT EXISTS my_share;
ALTER SHARE my_share ADD TABLE my_table AS db0.t0;
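
The text above notes that tables can be removed from a share at any time, but the example only adds one. The following is a minimal sketch of inspecting the share and removing the table again. It assumes that the SHOW ALL IN SHARE and ALTER SHARE ... REMOVE TABLE commands are available in your workspace, and it refers to the table by its shared alias db0.t0; that choice is an assumption, and your release may expect the source table name (main.default.my_table) instead.

```sql
-- List the objects currently in the share, including the names they are shared under.
SHOW ALL IN SHARE my_share;

-- Remove the table from the share. Using the alias here is an assumption;
-- if the statement is rejected, retry with the source table name.
ALTER SHARE my_share REMOVE TABLE db0.t0;
```

Removing a table from a share does not drop the underlying table; it only stops sharing it through my_share.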

Manage recipients

A recipient is a named object that represents the identity of a real-world data recipient who consumes the shared data. For a data recipient on Databricks, the recipient object has an authentication type of DATABRICKS, which indicates that the recipient uses Databricks-managed Delta Sharing to access the data. For data recipients who use open source connectors and bearer tokens to access the data, the recipient object has an authentication type of TOKEN.

More specifically, a recipient object represents a data recipient on a particular Unity Catalog metastore. When a recipient with the DATABRICKS authentication type is created, it is associated with a Unity Catalog metastore on a specific cloud platform and region. Data shared with this recipient can be accessed only on that metastore.

In contrast, a recipient with the TOKEN authentication type represents a data recipient that can be in any environment, and the data can be accessed from anywhere with any open source connector.

When you manage a recipient object with the DATABRICKS authentication type, you don’t need to handle any credentials explicitly. Databricks-managed Delta Sharing handles complexities such as identity verification, authentication, and auditing, and keeps data sharing secure.

Create a recipient with Databricks authentication type

You can use the CREATE RECIPIENT SQL command to create a recipient with a sharing identifier.

CREATE RECIPIENT [IF NOT EXISTS] <recipient_name>
USING ID <sharing_identifier>
[COMMENT <comment>]

<sharing_identifier> is the globally unique identifier of a Unity Catalog metastore owned by the data recipient with whom you’d like to share data. It has the format <cloud>:<region>:<uuid>. Example value: aws:eu-west-1:b0c978c8-3e68-4cdf-94af-d05c120ed1ef.
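
As an illustration of the syntax above, the following sketch creates a recipient from the example identifier. The recipient name partner_org and the comment text are hypothetical, and passing the identifier and comment as single-quoted string literals is an assumption about how the command expects them.

```sql
-- partner_org is a hypothetical recipient name; the identifier is the example value above.
CREATE RECIPIENT IF NOT EXISTS partner_org
USING ID 'aws:eu-west-1:b0c978c8-3e68-4cdf-94af-d05c120ed1ef'
COMMENT 'Recipient metastore in eu-west-1';

-- Inspect the recipient that was just created.
DESC RECIPIENT partner_org;
```

Because the identifier encodes the cloud and region, a recipient created this way is tied to that one metastore.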

Example of managing recipients

This value is referred to as the sharing identifier. A data provider needs the recipient’s sharing identifier to create a Databricks-to-Databricks recipient.

View and delete a recipient

You can use the following SQL commands to view and delete a recipient. When a recipient is deleted, the data recipients it represents can no longer access the shared data.

SHOW RECIPIENTS [LIKE <pattern>];
DESC RECIPIENT <recipient_name>;
DROP RECIPIENT [IF EXISTS] <recipient_name>;

Share data with recipients

After your shares and recipients are in place, use the GRANT and REVOKE SQL commands to control which recipients can access each share.

To grant access on a share to a recipient:

GRANT SELECT
ON SHARE <share_name>
TO RECIPIENT <recipient_name>;

To revoke access:

REVOKE SELECT
ON SHARE <share_name>
FROM RECIPIENT <recipient_name>;

To view the current grants on a share and current grants possessed by a recipient:

SHOW GRANT ON SHARE <share_name>;
SHOW GRANT TO RECIPIENT <recipient_name>;

User guide for data recipients

This section explains the concepts and processes you need to understand as a data recipient in a Delta Sharing relationship.

Concepts for data recipients

  • Data provider: A data provider is an object in a Unity Catalog metastore that represents the real-world data provider who shares the data. A provider contains shares, which in turn contain the shared data.

  • Share: A share is a collection of datasets shared by the data provider. A share belongs to a data provider, and you can create a catalog from a share to access the datasets inside it.

  • Catalog: A catalog is the top-level object in Unity Catalog’s 3-level namespace for organizing data. A catalog created from a share is called a Delta Sharing catalog.

The following diagram shows a holistic view of all Delta Sharing objects under a Unity Catalog metastore and the relationships among them.

Figure: View of all Delta Sharing objects under a Unity Catalog metastore

Share your sharing identifier

To allow a data provider on Databricks to share data with you through Delta Sharing, the provider needs the globally unique identifier of the Unity Catalog metastore where you are going to access the shared data. The identifier has the format <cloud>:<region>:<uuid>. You can get it by calling the built-in SQL function CURRENT_METASTORE.
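
A minimal sketch of retrieving the sharing identifier is shown here. It assumes you run the query from a notebook or SQL editor in a workspace attached to the Unity Catalog metastore where you plan to access the shared data.

```sql
-- Returns the identifier of the metastore attached to the current workspace,
-- in the <cloud>:<region>:<uuid> format described above.
SELECT CURRENT_METASTORE();
```

Send the returned value to your data provider, who uses it when creating the recipient object.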

View providers and their shares

A provider is a named object that represents the real-world data provider who shares data with you. For a data provider on Databricks, the provider object has an authentication type of DATABRICKS, which indicates that the provider uses Databricks-managed Delta Sharing to share the data. For data providers who use the open source protocol and the recipient profile authentication method to share the data, the provider object has an authentication type of TOKEN.

More specifically, a provider object represents a data provider on a particular Unity Catalog metastore on a specific cloud platform and region. When a data provider shares data with your current Unity Catalog metastore, provider objects are automatically created under the metastore. You can view the available data providers under your Unity Catalog metastore by using the SHOW PROVIDERS SQL command.

SHOW PROVIDERS [LIKE <pattern>]

You can also use the DESC PROVIDER SQL command to view the detailed attributes of a data provider, including the cloud, region, and metastore UUID of the Unity Catalog metastore owned by the data provider.

DESC PROVIDER <provider-name>

A provider object can contain zero or more shares, each of which contains shared datasets. You can use the SHOW SHARES IN PROVIDER SQL command to view the available shares under a provider object.

SHOW SHARES IN PROVIDER <provider-name>
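
The three commands above can be combined into a short discovery sequence. This sketch reuses the provider name world_health_org from the catalog example later in this guide; it is a placeholder, so substitute a name returned by SHOW PROVIDERS in your own metastore.

```sql
-- List every provider that has shared data with the current metastore.
SHOW PROVIDERS;

-- Inspect one provider (placeholder name) to see its cloud, region, and metastore UUID.
DESC PROVIDER world_health_org;

-- List the shares that this provider has made available to you.
SHOW SHARES IN PROVIDER world_health_org;
```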

Access data inside a share

To access the data inside a share, you need to create a catalog from the share.

CREATE CATALOG [IF NOT EXISTS] <catalog-name>
USING SHARE <provider-name>.<share-name>;

A catalog created from a share is called a Delta Sharing catalog. To check a catalog’s type, run the DESC CATALOG SQL command and look at the type attribute.

A Delta Sharing catalog can be managed in the same way as regular catalogs on a Unity Catalog metastore. You can view, update, and delete a Delta Sharing catalog by using SHOW CATALOGS, DESC CATALOG, ALTER CATALOG, and DROP CATALOG SQL commands.

The 3-level namespace structure under a Delta Sharing catalog created from a share is the same as the one under a regular catalog in Unity Catalog: <catalog>.<schema>.<table>.

Data objects (schemas and tables) under a shared catalog are read-only, which means you can perform read operations like DESC, SHOW, and SELECT but can’t perform write or update operations like MODIFY, UPDATE, or DROP. The only exception to this rule is that the owner of a data object or the metastore admin can transfer ownership of the data object to another user or group.

Example usage:

-- Create a catalog from the share and inspect it.
CREATE CATALOG vaccine USING SHARE world_health_org.vaccine_share;
DESC CATALOG vaccine;
USE CATALOG vaccine;
COMMENT ON CATALOG vaccine IS 'vaccine data shared by WHO';

-- Browse the shared schemas.
SHOW SCHEMAS;
DESC SCHEMA vaccine_us;
USE SCHEMA vaccine_us;

-- Browse and query the shared tables.
SHOW TABLES;
DESC TABLE vaccine_us_distribution;
SELECT * FROM vaccine_us_distribution LIMIT 100;

-- Drop the catalog when you no longer need it.
DROP CATALOG vaccine;

Manage permissions inside a Delta Sharing catalog

By default, the catalog creator is the owner of all data objects under a Delta Sharing catalog. The catalog owner can delegate ownership of specific data objects to other users or groups. In Unity Catalog, the owner of a data object can manage its permissions and lifecycle.

To transfer the ownership of the data objects to other users or groups, use the ALTER … OWNER TO command.
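
As a sketch of what such ownership transfers can look like, the statements below reuse the object names from the vaccine example earlier in this guide. The group name data-stewards is hypothetical, and the sketch assumes that the ALTER CATALOG, ALTER SCHEMA, and ALTER TABLE forms of OWNER TO all apply to objects in a Delta Sharing catalog.

```sql
-- `data-stewards` is a hypothetical group; replace it with a real user or group.
ALTER CATALOG vaccine OWNER TO `data-stewards`;
ALTER SCHEMA vaccine.vaccine_us OWNER TO `data-stewards`;
ALTER TABLE vaccine.vaccine_us.vaccine_us_distribution OWNER TO `data-stewards`;
```

After a transfer, the new owner manages permissions on that object.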

To grant or revoke access to the data objects in a Delta Sharing catalog, use the GRANT and REVOKE commands:

GRANT USAGE ON CATALOG <catalog-name> TO `<user-or-group>`;
REVOKE USAGE ON CATALOG <catalog-name> FROM `<user-or-group>`;

GRANT USAGE ON SCHEMA <schema-name> TO `<user-or-group>`;
REVOKE USAGE ON SCHEMA <schema-name> FROM `<user-or-group>`;

GRANT SELECT ON TABLE <table-name> TO `<user-or-group>`;
REVOKE SELECT ON TABLE <table-name> FROM `<user-or-group>`;