Create and work with output tables in Databricks Clean Rooms
Preview
This feature is in Public Preview.
This article introduces output tables, which are temporary read-only tables generated by a notebook run and shared to the notebook runner’s Unity Catalog metastore. This article describes how to use a notebook to create output tables and how collaborators can read these output tables in their Unity Catalog metastore.
Overview of output tables
Output tables let you temporarily save the output of notebooks that are run in a clean room to an output catalog in your Unity Catalog metastore, where you can make the data available to members of your team who don’t have the ability to run the notebooks themselves. You can also use Databricks jobs to run notebooks and perform tasks on output tables. Combined with the Clean Room notebook task type and support for task values, output tables let you create complex workflows that depend on Clean room notebooks.
Output tables are read-only.
Only the specific principal (user, group, or service principal) who runs the notebook has default read access to the output table. There is no write access. A metastore admin can grant read access to other principals in their Databricks account, using standard Unity Catalog privileges.
Output tables are stored for 30 days in the central clean room’s default storage location and shared to the collaborator’s metastore using Delta Sharing. If you want to keep an output table for more than 30 days, you must copy it to local storage.
Each notebook run creates a new schema in the output catalog. New runs cannot append an existing output table.
Important
Output tables are supported only when the central clean room is hosted on AWS. However, collaborators in Databricks on all three clouds—AWS, Azure, and Google Cloud—can share notebooks that create output tables and can read output tables that are generated when they run shared notebooks. Google Cloud collaborators must be participants in the Clean Rooms private preview.
Create an output table
To create an output table, use the parameters cr_output_catalog
and cr_output_schema
in the three-part table namespace. Each run of the notebook produces a new schema.
In the following example, the notebook cell creates an output table called overlapping_users
in the collborator’s output catalog that lists the users whose email address shows up in both the collaborator.advertiser.profiles
and creator.publisher.profiles
tables.
CREATE TABLE identifier(:cr_output_catalog || '.' || :cr_output_schema || '.overlapping_users') AS
SELECT collab_profiles.*
FROM collaborator.advertiser.profiles AS collab_profiles
JOIN creator.publisher.profiles AS creator_profiles
ON collab_profiles.email = creator_profiles.email
Read an output table
Output tables appear in a shared catalog in the notebook runner’s metastore. In the Catalog Explorer Catalog pane, they appear in the Shared catalogs list.
Reading an output table is like reading any other table in Unity Catalog. You must have SELECT
on the table, USE CATALOG
on the shared output catalog, and USE SCHEMA
on the automatically-generated schema. The user who ran the notebook that created the table has these permissions by default.
Before you begin
This section describes cloud, configuration, and compute requirements for reading output tables.
Cloud requirements
While the central clean room must be on AWS to support output tables, collaborator workspaces can be on any of the three clouds: AWS, Azure, or Google Cloud. Google Cloud collaborators must be participants in the Clean Rooms private preview.
Compute requirements
Queries on output tables require serverless compute. See Connect to serverless compute.
Permissions required to read an output table
The user who ran the notebook that created the output table has permission to read from the output table by default. All other users must have the following permissions granted to them:
SELECT
on the tableUSE CATALOG
on the output catalogUSE SCHEMA
on the output schema
Run the notebook
To generate shared output tables in your output catalog, a user with access to the clean room must run the notebook. See Run notebooks in clean rooms. Each notebook run creates a new output schema and table.
Tip
You can use Databricks jobs to run notebooks and perform tasks on output tables, enabling complex workflows. See Use Databricks Workflows to run clean room notebooks.
Find and view an output table
The user who runs the notebook that creates the output table can find a link to the output table on the notebook run history and run details pages in the Clean Rooms UI. In both cases, the link is in the Output schema field. See Monitor clean room notebook runs.
Run history:
Run details:
You can also find the output catalog in the list of Shared catalogs in your Catalog Explorer Catalog pane.
Limitations
In addition to the requirements listed in Overview of output tables and Before you begin, output tables have the following limitations:
Output tables are supported only when the central clean room is hosted on AWS and when the clean room was created after the output table feature was released.
Only tables are supported. Volumes and views, for example, are not.
You can create up to 100 output tables per notebook.