Key concepts

Preview

Unity Catalog is in Public Preview. To participate in the preview, contact your Databricks representative.

This article explains the key concepts behind how Unity Catalog brings security and governance to your Lakehouse.

Account-level identities

Unity Catalog uses the account-level identity system in Databricks to resolve users and groups and enforce permissions. You configure users and groups either directly in the Databricks account console or by syncing them automatically from your identity provider (IdP). Refer to those account-level users and groups when creating access-control policies in Unity Catalog.

Although Databricks also allows adding local groups to workspaces, those local groups cannot be used in Unity Catalog. Commands that reference local groups return an error that the group was not found.

Unity Catalog users must also be added to workspaces to access Unity Catalog data in a notebook, a Databricks SQL query, the Databricks SQL Data Explorer, or a REST API command, and to join Unity Catalog data with data that is local to a workspace.

Data permissions

In Unity Catalog, data is secure by default. Initially, users have no access to data in a metastore. Metastore admins and object owners can manage object permissions using the Databricks SQL Data Explorer or SQL commands.

To learn more, see Data permissions.
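As a minimal sketch of managing permissions with SQL commands, the following statements grant, inspect, and revoke read access for an account-level group. The table main.default.department and the group data-consumers are hypothetical names used only for illustration, and the statements assume you run them as a metastore admin or as the owner of the table.

```sql
-- Hypothetical names: table main.default.department and the account-level
-- group `data-consumers`. Backticks are used because the group name
-- contains a hyphen.

-- Give the group read access to one table.
GRANT SELECT ON TABLE main.default.department TO `data-consumers`;

-- Review the privileges currently granted on the table.
SHOW GRANTS ON TABLE main.default.department;

-- Take the read access away again.
REVOKE SELECT ON TABLE main.default.department FROM `data-consumers`;
```

Because users start with no access, the SELECT grant alone is not enough to query the table. The group also needs the USAGE permission on the parent schema and catalog.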

Object model

The following diagram illustrates the main securable objects in Unity Catalog:

Unity Catalog object model diagram

Note

Some objects, such as external locations and storage credentials, are not shown in the diagram. These objects reside in the metastore at the same level as catalogs.

Metastore

A metastore is the top-level container of objects in Unity Catalog. It stores data assets (tables and views) and the permissions that govern access to them. Databricks account admins can create metastores and assign them to Databricks workspaces to control which workloads use each metastore.

Note

Unity Catalog offers a new metastore with built-in security and auditing. This is distinct from the metastore used in previous versions of Databricks, which was based on the Hive Metastore.

Metastore admin

A metastore admin can manage the privileges for all securable objects within a metastore, such as who can create catalogs or query a table.

Note

For more details about Unity Catalog’s data governance model, see Data permissions.

The account admin who creates a metastore is its initial owner and metastore admin. The owner or owners (if the object is owned by a group) of an object can grant privileges on that object and its descendants to others. A metastore admin can delegate management of permissions for a metastore object to a different user or group by changing the object’s ownership.

An account admin can reassign the metastore admin role to another user or group. Account admins can always manage a metastore, regardless of who holds the metastore admin role. See (Recommended) Sync account-level identities from your IdP.
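Delegating permission management by changing ownership can be sketched as follows. The object names and the group data-stewards are hypothetical, and the statements assume they are run by the current owner or a metastore admin.

```sql
-- Hypothetical names: table main.default.department, schema main.default,
-- and the account-level group `data-stewards`.

-- Transfer ownership of a single table. Members of the group can then
-- grant privileges on the table to others.
ALTER TABLE main.default.department OWNER TO `data-stewards`;

-- Transfer ownership of a whole schema in the same way.
ALTER SCHEMA main.default OWNER TO `data-stewards`;
```

Assigning ownership to a group rather than to an individual user keeps permission management from depending on a single person.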

Workspace admin

A workspace admin can manage workspace objects such as users, jobs, and notebooks, whether or not the workspace is configured to use Unity Catalog. Workspace admins cannot manage access to data stored in Unity Catalog in the way a metastore admin can. However, they can add users and service principals to the workspace, and they can view and modify workspace objects such as jobs and notebooks. These abilities may indirectly give a workspace admin access to data registered in Unity Catalog. The workspace admin therefore remains a privileged role that should be assigned carefully.

Default storage location

Each metastore is configured with a default storage location in Amazon S3. This is the default storage location for data in managed tables. External tables store data in other S3 paths.

Each metastore accesses the default storage location using a single storage credential that is configured during metastore creation. You can configure additional storage credentials to access other storage locations, such as to access an external table or read from a data file stored in your cloud tenant. User code never receives full access to a storage credential. Instead, Unity Catalog generates scoped access tokens that allow each user or application to access the requested data.

Catalog

A catalog is the first layer of Unity Catalog’s three-level namespace and is used to organize your data assets. Users can see all catalogs on which they have been assigned the USAGE data permission.
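For illustration, a catalog might be created and made visible to a group as shown below. The catalog name sales and the group analysts are hypothetical, and the sketch assumes the statements are run by a user who is allowed to create catalogs, such as a metastore admin.

```sql
-- Hypothetical names: catalog `sales` and account-level group `analysts`.

-- Create the first level of the three-level namespace.
CREATE CATALOG IF NOT EXISTS sales COMMENT 'Sales data assets';

-- Without USAGE, members of the group do not see the catalog.
GRANT USAGE ON CATALOG sales TO `analysts`;

-- Run as a member of `analysts`: lists the catalogs visible to that user.
SHOW CATALOGS;
```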

Schema

A schema (also called a database) is the second layer of Unity Catalog’s three-level namespace and organizes tables and views. To access or list a table or view in a schema, a user must have the USAGE data permission on the schema and its parent catalog and the SELECT permission on the table or view.
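A short sketch of the second namespace level, assuming the catalog main already exists. The schema name reporting and the group analysts are hypothetical.

```sql
-- Hypothetical names: schema main.reporting and account-level group `analysts`.

-- Create a schema inside an existing catalog.
CREATE SCHEMA IF NOT EXISTS main.reporting COMMENT 'Curated reporting tables';

-- USAGE on the schema is required in addition to USAGE on the catalog.
GRANT USAGE ON SCHEMA main.reporting TO `analysts`;

-- List the tables and views in the schema.
SHOW TABLES IN main.reporting;
```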

Table

A table resides in the third layer of Unity Catalog’s three-level namespace and contains rows of data. To create a table, a user must have CREATE and USAGE permissions on the schema and the USAGE permission on its parent catalog. To query a table, the user must have the SELECT permission on the table and the USAGE permission on its parent schema and catalog.
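The permission requirements above can be sketched as the following set of grants. All object and group names are hypothetical, and the privilege names (USAGE, CREATE, SELECT) are the ones used in this article, so they may differ in other releases.

```sql
-- Hypothetical names: catalog main, schema main.default, table
-- main.default.orders, and the groups `data-engineers` and `analysts`.

-- Creating tables: CREATE and USAGE on the schema, USAGE on the catalog.
GRANT USAGE ON CATALOG main TO `data-engineers`;
GRANT USAGE, CREATE ON SCHEMA main.default TO `data-engineers`;

-- Querying a table: SELECT on the table, USAGE on the schema and catalog.
GRANT USAGE ON CATALOG main TO `analysts`;
GRANT USAGE ON SCHEMA main.default TO `analysts`;
GRANT SELECT ON TABLE main.default.orders TO `analysts`;

-- A member of `analysts` can now query the table by its three-level name.
SELECT * FROM main.default.orders LIMIT 10;
```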

A table can be managed or external.

Managed table

Managed tables are the default way to create tables in Unity Catalog. These tables are stored in the managed storage location you configured when you created the metastore.

  • To create a managed table, run a CREATE TABLE command without a LOCATION clause.

  • To delete a managed table, use the DROP TABLE statement.

When a managed table is dropped, its underlying data is deleted from your cloud tenant. The only supported format for managed tables is Delta.

Example syntax:

CREATE TABLE <example-table> (id STRING, value STRING);

External table

External tables are tables whose data is stored outside of the managed storage location and that are not fully managed by Unity Catalog. When you run DROP TABLE on an external table, Unity Catalog does not delete the underlying data. You can manage privileges on external tables and use them in queries in the same way as managed tables. To create an external table, specify a LOCATION path in your CREATE TABLE statement. External tables can use the following file formats:

  • DELTA

  • CSV

  • JSON

  • AVRO

  • PARQUET

  • ORC

  • TEXT

To manage access to the underlying cloud storage for an external table, Unity Catalog introduces two new object types: storage credentials and external locations.

  • A storage credential represents an authentication and authorization mechanism for accessing data stored on your cloud tenant, such as an IAM role. Each storage credential is subject to Unity Catalog access-control policies that control which users and groups can access the credential.

    If a user attempts to use a storage credential on which they haven’t been granted the USAGE permission, the request fails and Unity Catalog does not attempt to authenticate to the cloud tenant on behalf of the user.

  • An external location is an object that contains a reference to a storage credential and a cloud storage path. The external location grants access only to that path and its child directories and files. Each external location is subject to Unity Catalog access-control policies that control which users and groups can access the external location.

    If a user attempts to use an external location on which they haven’t been granted the USAGE permission, the request fails and Unity Catalog does not attempt to authenticate to the cloud tenant on behalf of the user.

Only metastore admins can create and grant permissions on storage credentials and external locations.

Example syntax:

CREATE TABLE <example-table>
(id STRING, value STRING)
USING delta
LOCATION "s3://<your-storage-path>";

Note

Before a user can create an external table, the user must have the CREATE TABLE privilege on an external location or storage credential that grants access to the LOCATION specified in the CREATE TABLE statement.
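As a rough sketch of how a metastore admin might prepare this privilege, the statements below define an external location and let a group create external tables under it. The names finance_landing, my_iam_role_credential, and data-engineers are hypothetical, the storage credential is assumed to exist already, and the privilege name CREATE TABLE is the one used in this article, so it may differ in other releases.

```sql
-- Hypothetical names throughout. The storage credential
-- my_iam_role_credential is assumed to have been created beforehand.

-- Bind a cloud storage path to the credential that can access it.
CREATE EXTERNAL LOCATION IF NOT EXISTS finance_landing
  URL 's3://<your-storage-path>'
  WITH (STORAGE CREDENTIAL my_iam_role_credential)
  COMMENT 'Landing zone for finance files';

-- Let the group create external tables at or below that path.
GRANT CREATE TABLE ON EXTERNAL LOCATION finance_landing TO `data-engineers`;
```

Granting on the external location rather than on the storage credential limits the group to that path and its child directories and files.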

View

A view resides in the third layer of Unity Catalog’s three-level namespace and is a read-only object composed from one or more tables and views in a metastore. A view can be composed from tables and views in multiple schemas or catalogs.

Example syntax:

CREATE VIEW main.default.experienced_employee
  (id COMMENT 'Unique identification number', Name)
  COMMENT 'View for experienced employees'
AS SELECT id, name
   FROM all_employee
   WHERE working_years > 5;
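To illustrate a view composed from tables in more than one catalog, the following sketch joins two tables by their three-level names. The catalogs hr and finance, their schemas, tables, and columns are all hypothetical.

```sql
-- Hypothetical source tables in two different catalogs:
--   hr.core.employee          (id, name)
--   finance.payroll.salary    (employee_id, amount)
CREATE VIEW main.default.employee_salary
  COMMENT 'Employee names joined with salary amounts'
AS SELECT e.id, e.name, s.amount
   FROM hr.core.employee AS e
   JOIN finance.payroll.salary AS s
     ON e.id = s.employee_id;
```

Using fully qualified names in the view definition avoids depending on whichever catalog and schema happen to be current when the view is created.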

Cluster security mode

To ensure the integrity of access controls and enforce strong isolation guarantees, Unity Catalog imposes some security requirements on compute resources. For this reason, Unity Catalog introduces the concept of a cluster’s security mode. Unity Catalog is secure by default; if a cluster is not configured with an appropriate security mode, the cluster can’t access data in Unity Catalog.

Note

If your workspace is assigned to a Unity Catalog metastore, use a cluster security mode, rather than a High Concurrency cluster, to ensure the integrity of access controls and enforce strong isolation guarantees. High Concurrency cluster mode is not available with Unity Catalog.

When you create a Data Science & Engineering or Databricks Machine Learning cluster, you can select from the following cluster security modes:

  • None: No isolation. Does not enforce workspace-local table access control or credential passthrough. Cannot access Unity Catalog data.

  • Single User: Can be used only by a single user (by default, the user who created the cluster). Other users cannot attach to the cluster. When accessing a view from a cluster with Single User security mode, the view is executed with the user’s permissions. Single-user clusters support workloads using Python, Scala, and R. Init scripts, library installation, and DBFS FUSE mounts are supported on single-user clusters. Automated jobs should use single-user clusters.

  • User Isolation: Can be shared by multiple users. Only SQL workloads are supported. Library installation, init scripts, and DBFS FUSE mounts are disabled to enforce strict isolation among the cluster users.

  • Table ACL only (Legacy): Enforces workspace-local table access control, but cannot access Unity Catalog data.

  • Passthrough only (Legacy): Enforces workspace-local credential passthrough, but cannot access Unity Catalog data.

The only security modes supported for Unity Catalog workloads are Single User and User Isolation.

Databricks SQL endpoints automatically use User Isolation, with no configuration required.

You can upgrade an existing cluster to meet the requirements of Unity Catalog by setting its cluster security mode to Single User or User Isolation.

The following table describes the features that are enabled and disabled for each cluster security mode.


The table compares the following features across security modes: Unity Catalog, legacy table access control, credential passthrough, multiple users, RDD API, DBFS FUSE mounts, init scripts and library installation, Unity Catalog dynamic views, and Databricks Runtime for Machine Learning.

Supported languages for each security mode:

  • None: All

  • Single User: All

  • User Isolation: SQL

  • Legacy table ACL: SQL, Python

  • Legacy passthrough: SQL, Python