Read data shared using Databricks-to-Databricks OpenSharing (for recipients)

This page describes how to read data shared with you using the Databricks-to-Databricks OpenSharing protocol, where Databricks manages a secure connection for data sharing. Unlike the OpenSharing open sharing protocol, the Databricks-to-Databricks protocol does not require a credential file (token-based security).

Databricks-to-Databricks sharing requires that you, as the recipient, meet both of the following requirements:

You have access to a Databricks workspace that is enabled for Unity Catalog.
The provider is using the Databricks-to-Databricks OpenSharing protocol, not the Databricks-to-Open sharing protocol, which provides you with a credential file.

If the provider is using open sharing (which provides you with a credential file) but you have a Unity Catalog-enabled workspace, you can import the provider and read the shared data in Databricks. See Import a provider and read shared data in Databricks. If you don't have a Unity Catalog-enabled workspace, see Read data shared with bearer tokens.

How do I make shared data available to my team?

To read data and notebooks that have been shared with you using the Databricks-to-Databricks protocol, you must be a user on a Databricks workspace that is enabled for Unity Catalog. A member of your team gives the data provider a unique identifier for your Unity Catalog metastore, and the data provider uses that identifier to create a secure sharing connection with your organization. The shared data then becomes available for read access in your workspace. Updates the data provider makes to the shared tables, views, volumes, and partitions are reflected in your workspace in near real time.

note

Column changes, such as adding, renaming, or deleting, may not appear in Catalog Explorer for up to one minute. Likewise, new shares and updates to shares, including adding new tables, are cached for one minute before they are available for you to view and query.

note

The tables in information_schema from a shared catalog reflect metadata stored in Unity Catalog. This metadata is updated from the provider only when you query the shared table directly or run a command such as DESCRIBE or REFRESH FOREIGN. Until then, information_schema might appear stale compared to the provider's data.

To read data that has been shared with you:

A user on your team finds the share—the container for the tables, views, volumes, and notebooks that have been shared with you—and uses that share to create a catalog—the top-level container for all data in Databricks Unity Catalog.
A user on your team grants or denies access to the catalog and the objects inside the catalog (schemas, tables, views, and volumes) to members of your team.
You read the data in the tables, views, and volumes that you have been granted access to like any data asset in Databricks that you have read-only (SELECT or READ VOLUME) access to.
You can preview and clone notebooks in the share, as long as you have the USE CATALOG privilege on the catalog.

Permissions required

To be able to list and view details about all providers and provider shares, you must have the USE PROVIDER privilege. Other users have access only to the providers and shares that they own.

To create a catalog from a provider share, you must be a metastore admin, a user who has both the CREATE CATALOG and USE PROVIDER privileges for your Unity Catalog metastore, or a user who has both the CREATE CATALOG privilege and ownership of the provider object.

The ability to grant read-only access to the schemas (databases), tables, views, and volumes in the catalog created from the share follows the typical Unity Catalog privilege hierarchy. The ability to view notebooks in the catalog created from the share requires the USE CATALOG privilege on the catalog. See Manage permissions for the schemas, tables, and volumes in an OpenSharing catalog.

Access data in a shared table or volume

To read data in a shared table or volume:

A privileged user must create a catalog from the share that contains the table or volume. This can be a metastore admin, a user who has both the CREATE CATALOG and USE PROVIDER privileges for your Unity Catalog metastore, or a user who has both the CREATE CATALOG privilege and ownership of the provider object.
That user or a user with the same privileges must grant you access to the shared table or volume.
You can access the table or volume just as you would any other data asset registered in your Unity Catalog metastore.

To make the data in a share accessible to your team, you must create a catalog from the share or mount the share to an existing shared catalog. To create a catalog from a share, you can use Catalog Explorer, the Databricks Unity Catalog CLI, or SQL commands in a Databricks notebook or the Databricks SQL query editor. To mount the share to an existing shared catalog, you can use the Catalog Explorer.

Permissions required to create a catalog: A metastore admin, a user who has both the CREATE CATALOG and USE PROVIDER privileges for your Unity Catalog metastore, or a user who has both the CREATE CATALOG privilege and ownership of the provider object.

Permissions required to mount share to an existing catalog: A user must have the USE PROVIDER privilege or have ownership of the provider object, and must also either own the existing shared catalog or have both MANAGE and USE CATALOG privileges on the existing shared catalog.

Limitations: If you want to add a data product from SAP Business Data Cloud (BDC), serverless compute or Databricks Runtime 15 or above is required.

note

If you are creating a catalog from an SAP BDC share, SAP semantic metadata (table and column comments, primary keys, foreign keys, and governance tags) syncs automatically into the catalog. No additional action is required. For details, see SAP BDC semantic metadata.

note

If the share includes views, you must use a catalog name that is different than the name of the catalog that contains the view in the provider's metastore.

Catalog Explorer
SQL
CLI

In your Databricks workspace, click Catalog to open Catalog Explorer.
At the top of the Catalog pane, click the gear icon and select OpenSharing.

Alternatively, in the upper-right corner, click Share > OpenSharing.
On the Shared with me tab, find and select the provider.
Find the desired share and click Mount to catalog on the share row.
Select Create a new catalog or Mount to existing catalog to add the data asset to an existing catalog.
Enter a name for your new catalog or choose which existing catalog to add the share to.
Click Create or Mount.

Alternatively, when you open Catalog Explorer, you can click + > Create Catalog in the upper right to create a shared catalog. See Create catalogs.

Run the following command in a notebook or the Databricks SQL query editor.

SQL
CREATE CATALOG [IF NOT EXISTS] <catalog-name>
USING SHARE <provider-name>.<share-name>;

Bash
databricks catalogs create <catalog-name> /
--provider-name <provider-name> /
--share-name <share-name>

The catalog created from a share has a catalog type of OpenSharing. You can view the type on the catalog details page in Catalog Explorer or by running the DESCRIBE CATALOG SQL command in a notebook or Databricks SQL query. All shared catalogs are listed under Catalog > Shared in the Catalog Explorer left pane.

An OpenSharing catalog can be managed in the same way as regular catalogs on a Unity Catalog metastore. You can view, update, and delete an OpenSharing catalog using Catalog Explorer, the Databricks CLI, and by using SHOW CATALOGS, DESCRIBE CATALOG, ALTER CATALOG, and DROP CATALOG SQL commands.

The 3-level namespace structure under an OpenSharing catalog created from a share is the same as the one under a regular catalog on Unity Catalog: catalog.schema.table or catalog.schema.volume.

Table and volume data under a shared catalog is read-only, which means you can perform read operations like:

DESCRIBE, SHOW, and SELECT for tables.
DESCRIBE VOLUME, LIST <volume-path>, SELECT * FROM <format>.'<volume_path>', and COPY INTO for volumes.

Notebooks in a shared catalog can be previewed and cloned by any user with USE CATALOG on the catalog.

Models in a shared catalog can be read and loaded for inference by any user with the following privileges: EXECUTE privilege on the registered model, plus USE SCHEMA and USE CATALOG privileges on the schema and catalog containing the model.

Manage permissions for the schemas, tables, and volumes in an OpenSharing catalog

By default, the catalog creator is the owner of all data objects under an OpenSharing catalog and can manage permissions for any of them.

Privileges are inherited downward, although some workspaces may still be on the legacy security model that did not provide inheritance. See Privilege inheritance. Any user granted the SELECT privilege on the catalog will have the SELECT privilege on all of the schemas and tables in the catalog unless that privilege is revoked. Likewise, any user granted the READ VOLUME privilege on the catalog will have the READ VOLUME privilege on all of the volumes in the catalog unless that privilege is revoked. You cannot grant privileges that give write or update access to an OpenSharing catalog or objects in an OpenSharing catalog.

The catalog owner can delegate the ownership of data objects to other users or groups, thereby granting those users the ability to manage the object permissions and life cycles.

For detailed information about managing privileges on data objects using Unity Catalog, see Manage privileges in Unity Catalog.

Read data in a shared table

You can read data in a shared table using any of the tools available to you as a Databricks user: Catalog Explorer, notebooks, SQL queries, the Databricks CLI, and Databricks REST APIs. You must have the SELECT privilege on the table.

If your provider has shared the table WITH HISTORY, you can run transactions on the table. For more information about transaction requirements and limitations, see Transactions.

Read data in a shared foreign table or foreign schema

Beta

This feature is in Beta. Workspace admins can control access to this feature from the Previews page. See Manage Databricks previews.

You can read data in a shared foreign table or foreign schema using any of the tools available to you as a Databricks user: Catalog Explorer, notebooks, SQL queries, the Databricks CLI, and Databricks REST APIs. You must have the SELECT privilege on the shared foreign table or foreign schema.

You can run transactions on shared foreign tables. See transaction requirements and limitations.

There are additional costs when accessing a shared foreign table or foreign schema. For information on how sharing costs are computed, see How do I incur and check OpenSharing costs?.

Limitations: You can't bypass cluster restriction to read shared foreign tables, even if the provider permits it.

Read data in a shared foreign Iceberg table

Preview

This feature is in Public Preview.

You can read data in a shared foreign Iceberg table using any of the tools available to you as a Databricks user: Catalog Explorer, notebooks, SQL queries, the Databricks CLI, and Databricks REST APIs. In Catalog Explorer, a shared foreign Iceberg table displays with a table type of Foreign and a data source format of Iceberg.

You have access to the source Iceberg location but can only perform the following types of queries:

Snapshot queries
Streaming queries

Requirements:

You must have the SELECT privilege on the shared foreign Iceberg table.
You must use Databricks Runtime 15.4 LTS or above.

Read data in a shared volume

You can read data in a shared volume using any of the tools available to you as a Databricks user: Catalog Explorer, notebooks, SQL queries, the Databricks CLI, and Databricks REST APIs. You must have the READ VOLUME privilege on the volume.

Read ABAC-secured data and apply ABAC policies

Attribute-based access control (ABAC) is a data governance model that provides flexible, scalable, and centralized access control across Databricks.

Create ABAC policies for shared tables, schemas, and catalogs created from a share. Materialized views are supported with limitations. You can't create ABAC policies for shared streaming tables or materialized views. To configure ABAC policies, see Create and manage row filter and column mask policies.

Read row tracking columns in shared tables

If the data provider has enabled row tracking on a shared table, you can query the row tracking metadata columns. See Row tracking in Databricks for a list of available columns.

How you access these columns depends on the type of table shared:

Tables shared with history and without partition filters: You can query row tracking columns without restrictions.
Tables with partition filters or tables shared without history: You must use Scala Spark and explicitly set the responseFormat option to delta.
Scala
```
spark.read.option(“responseformat”, “delta”).table(“shared_table”).select(“_metadata.row_id”).show()
```

Load a shared model for inference

For details on loading a shared model and using it for batch inference, see Load model version by alias for inference workloads.

Query a table's history data

If history is shared along with the table, you can query the table data as of a version or timestamp. Requires Databricks Runtime 12.2 LTS or above.

For example:

SQL
SELECT * FROM vaccine.vaccine_us.vaccine_us_distribution VERSION AS OF 3;
SELECT * FROM vaccine.vaccine_us.vaccine_us_distribution TIMESTAMP AS OF "2023-01-01 00:00:00";

In addition, if the change data feed (CDF) is enabled with the table, you can query the CDF. Both version and timestamp are supported:

SQL
SELECT * FROM table_changes('vaccine.vaccine_us.vaccine_us_distribution', 0, 3);
SELECT * FROM table_changes('vaccine.vaccine_us.vaccine_us_distribution', "2023-01-01 00:00:00", "2022-02-01 00:00:00");

For more information about change data feed, see Use change data feed on Databricks.

Query a table using Apache Spark Structured Streaming

If a table is shared with history, you can use it as the source for Spark Structured Streaming. Requires Databricks Runtime 12.2 LTS or above.

Supported options:

ignoreDeletes: Ignore transactions that delete data.
ignoreChanges: Re-process updates if files were rewritten in the source table due to a data changing operation such as UPDATE, MERGE INTO, DELETE (within partitions), or OVERWRITE. Unchanged rows can still be emitted. Therefore your downstream consumers should be able to handle duplicates. Deletes are not propagated downstream. ignoreChanges subsumes ignoreDeletes. Therefore, if you use ignoreChanges, your stream will not be disrupted by either deletions or updates to the source table.
startingVersion: The shared table version to start from. All table changes starting from this version (inclusive) will be read by the streaming source.
startingTimestamp: The timestamp to start from. All table changes committed at or after the timestamp (inclusive) will be read by the streaming source. Example: "2023-01-01 00:00:00.0"
maxFilesPerTrigger: The number of new files to be considered in every micro-batch.
maxBytesPerTrigger: The amount of data that gets processed in each micro-batch. This option sets a “soft max”, meaning that a batch processes approximately this amount of data and might process more than the limit in order to make the streaming query move forward in cases when the smallest input unit is larger than this limit.
readChangeFeed: Stream read the change data feed of the shared table.

Supported triggers:

Trigger.ProcessingTime: The default trigger, used when no explicit trigger is specified. Data is processed in continuous micro-batches.
Trigger.AvailableNow: The query captures the shared table's server-side version at start, processes the backlog as multiple micro-batches that honor maxFilesPerTrigger and maxVersionsPerRpc, and terminates when the captured version is exhausted. Requires Databricks Runtime 18.0 or above for responseFormat=delta; responseFormat=parquet additionally requires the bundled delta-sharing-client jar to be 1.4.0 or above (older versions fall back to a Trigger.AvailableNow wrapper that does not respect maxVersionsPerRpc).
Trigger.Once: Deprecated; use Trigger.AvailableNow instead. On Databricks Runtime 17.3 and below, Trigger.AvailableNow is automatically converted to Trigger.Once because native Trigger.AvailableNow support requires Databricks Runtime 18.0 or above.

Sample Structured Streaming queries

Scala
Python

Scala
spark.readStream.format("deltaSharing")
.option("startingVersion", 0)
.option("ignoreChanges", true)
.option("maxFilesPerTrigger", 10)
.table("vaccine.vaccine_us.vaccine_us_distribution")

Python
spark.readStream.format("deltaSharing")\
.option("startingVersion", 0)\
.option("ignoreDeletes", true)\
.option("maxBytesPerTrigger", 10000)\
.table("vaccine.vaccine_us.vaccine_us_distribution")

If change data feed (CDF) is enabled with the table, you can stream read the CDF.

Scala
spark.readStream.format("deltaSharing")
.option("readChangeFeed", "true")
.table("vaccine.vaccine_us.vaccine_us_distribution")

Apply row filters and column masks

To apply row filters and column masks on tables and foreign tables shared by your data provider, see Manually apply row filters and column masks. You cannot apply columns masks to streaming tables or materialized views.

Read tables with deletion vectors or column mapping enabled

Preview

This feature is in Public Preview.

Deletion vectors are a storage optimization feature that your provider can enable on shared Delta tables. See Deletion vectors in Databricks.

Databricks also supports column mapping for Delta tables. See Rename and drop columns with Delta Lake column mapping.

If your provider shared a table with deletion vectors or column mapping enabled, you can perform batch reads on the table using a SQL warehouse or a cluster running Databricks Runtime 14.1 or above. CDF and streaming queries require Databricks Runtime 14.2 or above.

You can perform batch queries as-is, because they can automatically resolve responseFormat based on the table features of the shared table.

To read a change data feed (CDF) or to perform streaming queries on shared tables with deletion vectors or column mapping enabled, you must set the additional option responseFormat=delta.

The following examples show batch, CDF, and streaming queries:

Scala
import org.apache.spark.sql.SparkSession

// Batch query
spark.read.format("deltaSharing").table(<tableName>)

// CDF query
spark.read.format("deltaSharing")
  .option("readChangeFeed", "true")
  .option("responseFormat", "delta")
  .option("startingVersion", 1)
  .table(<tableName>)

// Streaming query
spark.readStream.format("deltaSharing").option("responseFormat", "delta").table(<tableName>)

Read shared managed Iceberg tables

Preview

This feature is in Public Preview.

Reading shared managed Iceberg tables is the same as reading shared tables, with these exceptions:

Support in Databricks-to-Open sharing:

The instructions in this article focus on reading shared data using Databricks user interfaces, specifically Unity Catalog syntax and interfaces. Due to the limitation on advanced Delta feature support for OpenSharing connectors, querying shared managed Iceberg tables using Python, Tableau, and Power BI is not supported.

Change data feed:

Change data feed is not supported for managed Iceberg tables.

Databricks Iceberg limitations:

Iceberg table and managed Iceberg table limitations apply. See Limitations.

Read shared views

Reading shared views is the same as reading shared tables, with these exceptions:

Shared views restrictions:

Shared views only support a subset of built-in functions and operators in Databricks. See Functions supported in Databricks-to-Databricks view sharing.
Recipients can't query more than 20 shared views in a query in Databricks-to-Databricks sharing. The shared views can't be from more than five different provider-shares.
When the provider is from the same account, or when you use serverless compute in a different account, you can't query multiple dependent views from the same provider in a single query. For example, if view1 depends on view2 on the provider side and both views are shared with you, you can't reference both view1 and view2 in the same query.
You can run transactions on shared views. See transaction requirements and limitations.

Naming requirements:

The catalog name that you use for the shared catalog that contains the view cannot be the same as any provider catalog that contains a table referenced by the view. For example, if the shared view is contained in your test catalog, and one of the provider's tables referenced in that view is contained in the provider's test catalog, the query will result in a namespace conflict error. See Create a catalog from a share.

Query result timeout:

If you don't have direct access to the underlying data, Databricks performs on-the-fly materialization when querying the view. When this materialization takes longer than 5 minutes, the query times out. Switch to serverless compute to avoid this limitation.

History and streaming:

You cannot query history or use a view as a streaming source.

View support in Databricks-to-Open sharing:

The instructions in this article focus on reading shared data using Databricks user interfaces, specifically Unity Catalog syntax and interfaces. You can also query shared views using Apache Spark, Python, and BI tools like Tableau and Power BI.

Costs:

For information on how sharing costs are computed, see How do I incur and check OpenSharing costs?.

Read shared streaming tables and materialized views

Reading shared streaming tables and materialized views is the same as reading shared tables, with these exceptions:

Support in Databricks-to-Open sharing:

The instructions on this page focus on reading shared data using Databricks user interfaces, specifically Unity Catalog syntax and interfaces. You can also query shared streaming tables and materialized views using Apache Spark, Python, and BI tools like Tableau and Power BI. See Read data shared with bearer tokens.

Transactions:

You can run transactions on shared materialized views and streaming tables. See transaction requirements and limitations.

SQL limitations:

The current_recipient function is not supported.
The DESCRIBE EXTENDED command is not supported.

Column mapping:

If you are using classic compute when receiving a share from a different Databricks account, you must specify the responseFormat like below when querying a materialized view or streaming tables with column mapping.

Python
spark.read.option("responseFormat", "delta").table("catalog_name.schema_name.mv_name")

If you are using classic compute when sharing within the same Databricks account or serverless compute in any scenario, you can query without restrictions.

Costs:

For information on how sharing costs are computed, see How do I incur and check OpenSharing costs?.

Materialized view specific exceptions
Streaming table specific exceptions

History:

You cannot query history.

Refresh:

You cannot access the refresh status and refresh schedule of the materialized view.

View and streaming table creation:

You cannot create streaming tables on shared materialized views.

Read shared Python UDFs

Reading shared Python UDFs is the same as reading shared tables. After you create a new catalog for the share or mount the share to an existing catalog, you can access and use the Python UDF.

Read shared `FeatureSpecs`

Reading shared FeatureSpecs is the same as reading shared tables. After you create a new catalog for the share or mount the share to an existing catalog, you can deploy the FeatureSpec to your desired serving endpoint. To learn how to create an endpoint, see Create an endpoint.

If your provider updates with the FeatureSpec with a new dependency but does not share the dependency with you, your model fails. Contact your data provider to check for new dependencies.

Before serving the FeatureSpec, you must create an online store and publish the dependent tables in your workspace. For how to create online stores and publish the table, see Databricks Online Feature Stores.

Read shared notebooks

To preview and clone shared notebook files, you can use Catalog Explorer.

Storage limitation: If your storage uses Private Endpoints, you cannot read shared notebooks.

Permissions required: Catalog owner or user with the USE CATALOG privilege on the catalog created from the share.

In your Databricks workspace, click Catalog.
In the left pane, expand the Catalog menu, find and select the catalog created from the share.
On the Other assets tab, you'll see any shared notebook files.
Click the name of a shared notebook file to preview it.
(Optional) Click the Clone button to import the shared notebook file to your workspace.
1. On the Clone to dialog, optionally enter a New name, then select the workspace folder you want to clone the notebook file to.
2. Click Clone.
3. Once the notebook is cloned, a dialog pops up to let you know that it successfully cloned. Click reveal in the notebook editor on the dialog to view it in the notebook editor.
See Databricks notebooks.

Unmount a share to remove the data asset from its catalog.

Permissions required: User with the USE CATALOG and MANAGE privileges on the shared catalog.

In your Databricks workspace, click Catalog to open Catalog Explorer.
At the top of the Catalog pane, click the gear icon and select OpenSharing.

Alternatively, in the upper-right corner, click Share > OpenSharing.
On the Shared with me tab, find and select the provider.
Click on the share row.
Click Unmount share.
Click Unmount.

How do I make shared data available to my team?​

Permissions required​

Access data in a shared table or volume​

Create a catalog from a share​

Manage permissions for the schemas, tables, and volumes in an OpenSharing catalog​

Read data in a shared table​

Read data in a shared foreign table or foreign schema​

Read data in a shared foreign Iceberg table​

Read data in a shared volume​

Read ABAC-secured data and apply ABAC policies​

Read row tracking columns in shared tables​

Load a shared model for inference​

Query a table's history data​

Query a table using Apache Spark Structured Streaming​

Sample Structured Streaming queries​

Apply row filters and column masks​

Read tables with deletion vectors or column mapping enabled​

Read shared managed Iceberg tables​

Read shared views​

Read shared streaming tables and materialized views​

Read shared Python UDFs​

Read shared FeatureSpecs​

Read shared notebooks​

Unmount a share​

How do I make shared data available to my team?

Permissions required

Access data in a shared table or volume

Create a catalog from a share

Manage permissions for the schemas, tables, and volumes in an OpenSharing catalog

Read data in a shared table

Read data in a shared foreign table or foreign schema

Read data in a shared foreign Iceberg table

Read data in a shared volume

Read ABAC-secured data and apply ABAC policies

Read row tracking columns in shared tables

Load a shared model for inference

Query a table's history data

Query a table using Apache Spark Structured Streaming

Sample Structured Streaming queries

Apply row filters and column masks

Read tables with deletion vectors or column mapping enabled

Read shared managed Iceberg tables

Read shared views

Read shared streaming tables and materialized views

Read shared Python UDFs

Read shared `FeatureSpecs`

Read shared notebooks

Unmount a share