Data permissions

This article explains how data permissions control access to data and objects in Unity Catalog.

You can grant and revoke access to Unity Catalog data and objects using the Databricks SQL Data Explorer, SQL statements in notebooks or Databricks SQL queries, or the Unity Catalog REST API.

Initially, users have no access to data in a metastore. Only metastore admins can create schemas, tables, views, and other Unity Catalog objects and grant or revoke access on them to account-level users or groups. Access control policies are not inherited. The account-level admin who creates a metastore is its owner and metastore admin.

Unity Catalog applies access control policies before any data is read from or written to your cloud tenant.

Security notes

SECURITY NOTE #1

Databricks Runtime uses temporary credentials to access Unity Catalog External Locations and Tables. Credentials can be obtained only if the user has been granted access to the table or location. However, once obtained, the credentials have a lifetime of up to 1 hour.

As a result, permission changes made after credentials are generated are not reflected immediately: the credentials remain valid with their original permission scope until they expire.

The locations of recently created external tables remain accessible to users who have access to the external location that contains the table and have already acquired credentials for that location.

SECURITY NOTE #2

When you run a CREATE TABLE AS SELECT (CTAS) command, the catalog entry in Unity Catalog that governs access to the new table is created at the end of the transaction, after all data has been written to the target location. For the duration of the CTAS command, the data that has already been written is accessible to any user with appropriate access to the external location, for at least the lifetime of their temporary access token, regardless of their privileges on the newly created table. This behavior applies only to tables created on external locations. Temporary data created by a canceled CTAS command remains accessible until a user cleans it up.
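For illustration, here is a minimal sketch of the kind of command this note applies to. The catalog, schema, table names, and storage path are hypothetical:

-- A CTAS command that writes an external table to a path governed by an external location.
-- Until the transaction completes, the files already written under this path are visible to
-- any user who has READ FILES on the external location and a valid temporary credential.
CREATE TABLE main.sales.new_orders
LOCATION 's3://my-bucket/sales/new_orders'
AS SELECT * FROM main.sales.orders WHERE order_date >= '2022-01-01';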

RECOMMENDATION

Databricks advises administrators to use managed Unity Catalog tables whenever possible to avoid the issues described above. Administrators should limit access to external locations to only the users who strictly require it. To isolate sensitive data, admins can create multiple external locations and grant access to each one to a different set of users.

Ownership

Each securable object in Unity Catalog has an owner. The owner can be any account-level user or group, called a principal. The principal that creates an object becomes its initial owner. An object’s owner has all privileges on the object, such as SELECT and MODIFY on a table, as well as the permission to grant privileges to other principals.

The object’s owner can transfer ownership to another user or group. A metastore admin can transfer ownership of any object in the metastore to another user or group.

To see the owner of a securable object, use the following syntax. Replace the placeholder values:

  • <SECURABLE_TYPE>: The type of securable, such as CATALOG or TABLE.

  • <catalog>: The parent catalog for a table or view.

  • <schema>: The parent schema for a table or view.

  • <securable_name>: The name of the securable, such as a table or view.

-- SQL
DESCRIBE <SECURABLE_TYPE> EXTENDED <catalog>.<schema>.<securable_name>;

# Python
display(spark.sql("DESCRIBE <SECURABLE_TYPE> EXTENDED <catalog>.<schema>.<securable_name>"))

# R
library(SparkR)

display(sql("DESCRIBE <SECURABLE_TYPE> EXTENDED <catalog>.<schema>.<securable_name>"))

// Scala
display(spark.sql("DESCRIBE <SECURABLE_TYPE> EXTENDED <catalog>.<schema>.<securable_name>"))

To transfer ownership of an object, use a SQL command with the following syntax. Replace the placeholder values:

  • <SECURABLE_TYPE>: The type of securable, such as CATALOG or TABLE.

  • <SECURABLE_NAME>: The name of the securable.

  • <PRINCIPAL>: The email address of an account-level user or the name of an account-level group.

-- SQL
ALTER <SECURABLE_TYPE> <SECURABLE_NAME> OWNER TO <PRINCIPAL>;

# Python
spark.sql("ALTER <SECURABLE_TYPE> <SECURABLE_NAME> OWNER TO <PRINCIPAL>")

# R
library(SparkR)

sql("ALTER <SECURABLE_TYPE> <SECURABLE_NAME> OWNER TO <PRINCIPAL>")

// Scala
spark.sql("ALTER <SECURABLE_TYPE> <SECURABLE_NAME> OWNER TO <PRINCIPAL>")

For example, to transfer ownership of a table to the accounting group:

-- SQL
ALTER TABLE orders OWNER TO `accounting`;

# Python
spark.sql("ALTER TABLE orders OWNER TO `accounting`")

# R
library(SparkR)

sql("ALTER TABLE orders OWNER TO `accounting`")

// Scala
spark.sql("ALTER TABLE orders OWNER TO `accounting`")

Ownership of a metastore

The account-level admin who creates a metastore is its initial owner and metastore admin. To transfer ownership of a metastore to a different user or group, see (Recommended) Sync account-level identities from your IdP. For added security, metastore ownership cannot be transferred using SQL syntax.

Privileges

In Unity Catalog, you can grant the following privileges on a securable object:

  • USAGE: This privilege does not grant access to the securable itself, but allows the grantee to traverse the securable in order to access its child objects. For example, to select data from a table, users need to have the SELECT privilege on that table and USAGE privileges on its parent schema and parent catalog. Thus, you can use this privilege to restrict access to sections of your data namespace to specific groups.

  • SELECT: Allows a user to select from a table or view, if the user also has USAGE on its parent catalog and schema.

  • MODIFY: Allows the grantee to add, update, and delete data in the securable if the user also has USAGE on its parent catalog and schema.

  • CREATE: Allows a user to create a schema if the user also has the USAGE and CREATE privileges on its parent catalog. Allows a user to create a table or view if the user also has USAGE on its parent catalog and schema and the CREATE privilege on the schema.

In addition, you can grant the following privileges on storage credentials and external locations.

  • CREATE TABLE: Allows a user to create external tables directly in your cloud tenant using a storage credential.

  • READ FILES: When granted on an external location, allows a user to read files directly from your cloud tenant using the storage credential associated with the external location.

    When granted directly on a storage credential, allows a user to read files directly from your cloud tenant using the storage credential.

  • WRITE FILES: When granted on an external location, allows a user to write files directly to your cloud tenant using the storage credential associated with the external location.

    When granted directly on a storage credential, allows a user to write files directly to your cloud tenant using the storage credential.

Note

Although you can grant READ FILES and WRITE FILES privileges on a storage credential, Databricks recommends that you instead grant these privileges on an external location. This allows you to manage permissions at a more granular level and provides a simpler experience for end users.
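For example, here is a minimal sketch of granting read access on an external location rather than on the underlying storage credential; the external location name and group name are hypothetical:

-- Grant read-only access to files under this external location to the finance group.
GRANT READ FILES ON EXTERNAL LOCATION `finance_data` TO `finance`;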

In Unity Catalog, privileges are not inherited on child securable objects. For example, if you grant the CREATE privilege on a catalog to a user, the user does not automatically have the CREATE privilege on all schemas in the catalog.
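Because privileges are not inherited, each level of the namespace must be granted explicitly. For example, here is a minimal sketch of the grants a group needs before it can select from a table; the catalog, schema, table, and group names are hypothetical:

-- USAGE on the parent catalog and schema, plus SELECT on the table itself.
GRANT USAGE ON CATALOG main TO `finance`;
GRANT USAGE ON SCHEMA main.sales TO `finance`;
GRANT SELECT ON TABLE main.sales.orders TO `finance`;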

The following table summarizes the privileges that can be granted on each securable object:

Securable

Privileges

Catalog

CREATE, USAGE

Schema

CREATE, USAGE

Table

SELECT, MODIFY

View

SELECT

External location

CREATE TABLE, READ FILES, WRITE FILES

Storage credential

CREATE TABLE, READ FILES, WRITE FILES

Manage privileges

You can manage privileges for metastore objects in the Databricks SQL data explorer or by using SQL commands in the Databricks SQL editor or a Data Science & Engineering notebook.

To manage privileges, you use GRANT and REVOKE statements. Only an object’s owner, an account admin, or a metastore admin can grant privileges on the object and its descendant objects. A built-in group called account users includes all users in the account.
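For example, here is a minimal sketch of granting a privilege to the built-in account users group; the catalog name is hypothetical:

-- Allow every user in the account to traverse this catalog.
GRANT USAGE ON CATALOG main TO `account users`;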

This section contains examples of using SQL commands to manage privileges. To manage privileges using the Databricks SQL Data Explorer, see Manage access to data.

You can also manage privileges by using the Databricks Terraform provider and databricks_grants.

Show grants on a securable object

To show grants on an object using SQL, use a command like the following. To use the Databricks SQL Data Explorer, see Use the Databricks SQL Data Explorer.

Use the following syntax. Replace the placeholder values:

  • <securable_type>: The type of the securable object, such as catalog or table.

  • <securable_name>: The name of the securable object.

-- SQL
SHOW GRANTS ON <securable_type> <securable_name>;

# Python
display(spark.sql("SHOW GRANTS ON <securable_type> <securable_name>"))

# R
library(SparkR)

display(sql("SHOW GRANTS ON <securable_type> <securable_name>"))

// Scala
display(spark.sql("SHOW GRANTS ON <securable_type> <securable_name>"))

To show all grants for a given principal on an object, use the following syntax. Replace the placeholder values:

  • <principal>: The email address of an account-level user or the name of an account-level group.

  • <securable_type>: The type of the securable object, such as catalog or table.

  • <securable_name>: The name of the securable object.

-- SQL
SHOW GRANTS <principal> ON <securable_type> <securable_name>;

# Python
display(spark.sql("SHOW GRANTS <principal> ON <securable_type> <securable_name>"))

# R
library(SparkR)

display(sql("SHOW GRANTS <principal> ON <securable_type> <securable_name>"))

// Scala
display(spark.sql("SHOW GRANTS <principal> ON <securable_type> <securable_name>"))

Grant a privilege

To grant a privilege using SQL, use a command like the following. To use the Databricks SQL Data Explorer, see Use the Databricks SQL Data Explorer.

Use the following syntax. Replace the placeholder values:

  • <privilege>: The privilege to grant, such as SELECT or USAGE.

  • <securable_type>: The type of the securable object, such as catalog or table.

  • <securable_name>: The name of the securable object.

  • <principal>: The email address of an account-level user or the name of an account-level group.

-- SQL
GRANT <privilege> ON <securable_type> <securable_name> TO <principal>;

# Python
spark.sql("GRANT <privilege> ON <securable_type> <securable_name> TO <principal>")

# R
library(SparkR)

sql("GRANT <privilege> ON <securable_type> <securable_name> TO <principal>")

// Scala
spark.sql("GRANT <privilege> ON <securable_type> <securable_name> TO <principal>")

You can also grant privileges by using the Databricks Terraform provider and databricks_grants.

Revoke a privilege

To revoke a privilege using SQL, use a command like the following. To use the Databricks SQL Data Explorer, see Use the Databricks SQL Data Explorer.

Use the following syntax. Replace the placeholder values:

  • <privilege>: The privilege to revoke, such as SELECT or USAGE.

  • <securable_type>: The type of the securable object, such as catalog or table.

  • <securable_name>: The name of the securable object.

  • <principal>: The email address of an account-level user or the name of an account-level group.

-- SQL
REVOKE <privilege> ON <securable_type> <securable_name> FROM <principal>;

# Python
spark.sql("REVOKE <privilege> ON <securable_type> <securable_name> FROM <principal>")

# R
library(SparkR)

sql("REVOKE <privilege> ON <securable_type> <securable_name> FROM <principal>")

// Scala
spark.sql("REVOKE <privilege> ON <securable_type> <securable_name> FROM <principal>")

Dynamic views

In Unity Catalog, you can use dynamic views to configure fine-grained access control, including:

  • Security at the level of columns or rows.

  • Data masking.

Note

Fine-grained access control using dynamic views is not available on clusters with Single User security mode.

Unity Catalog introduces the following functions, which allow you to dynamically limit which users can access a row, column, or record in a view:

  • current_user(): Returns the current user’s email address.

  • is_account_group_member(): Returns TRUE if the current user is a member of a specific account-level group. Recommended for use in dynamic views against Unity Catalog data.

  • is_member(): Returns TRUE if the current user is a member of a specific workspace-level group. This function is provided for compatibility with the existing Hive metastore. Avoid using it with views against Unity Catalog data, because it does not evaluate account-level group membership.
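For example, the current_user() function listed above can be used to restrict a view to rows that belong to the querying user. A minimal sketch, assuming a sales_raw table with an email column as in the examples that follow; the view name is hypothetical:

-- Each user sees only the rows whose email matches their own account email address.
CREATE VIEW sales_mine AS
SELECT * FROM sales_raw
WHERE email = current_user();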

The following examples illustrate how to create dynamic views in Unity Catalog.

Column-level permissions

With a dynamic view, you can limit the columns a specific user or group can access. In the following example, only members of the auditors group can access email addresses from the sales_raw table. During query analysis, Apache Spark replaces the CASE expression with either the literal string REDACTED or the actual contents of the email column. Other columns are returned as normal. This strategy has no negative impact on query performance.

-- SQL
-- Alias the field 'email' to itself (as 'email') to prevent the
-- permission logic from showing up directly in the column name results.
CREATE VIEW sales_redacted AS
SELECT
  user_id,
  CASE WHEN
    is_account_group_member('auditors') THEN email
    ELSE 'REDACTED'
  END AS email,
  country,
  product,
  total
FROM sales_raw

# Python
# Alias the field 'email' to itself (as 'email') to prevent the
# permission logic from showing up directly in the column name results.
spark.sql("CREATE VIEW sales_redacted AS "
  "SELECT "
  "  user_id, "
  "  CASE WHEN "
  "    is_account_group_member('auditors') THEN email "
  "  ELSE 'REDACTED' "
  "  END AS email, "
  "  country, "
  "  product, "
  "  total "
  "FROM sales_raw")

# R
library(SparkR)

# Alias the field 'email' to itself (as 'email') to prevent the
# permission logic from showing up directly in the column name results.
sql(paste("CREATE VIEW sales_redacted AS ",
  "SELECT ",
  "  user_id, ",
  "  CASE WHEN ",
  "    is_account_group_member('auditors') THEN email ",
  "  ELSE 'REDACTED' ",
  "  END AS email, ",
  "  country, ",
  "  product, ",
  "  total ",
  "FROM sales_raw",
  sep = ""))

// Scala
// Alias the field 'email' to itself (as 'email') to prevent the
// permission logic from showing up directly in the column name results.
spark.sql("CREATE VIEW sales_redacted AS " +
  "SELECT " +
  "  user_id, " +
  "  CASE WHEN " +
  "    is_account_group_member('auditors') THEN email " +
  "  ELSE 'REDACTED' " +
  "  END AS email, " +
  "  country, " +
  "  product, " +
  "  total " +
  "FROM sales_raw")

Row-level permissions

With a dynamic view, you can specify permissions down to the row or field level. In the following example, only members of the managers group can view transaction amounts that exceed $1,000,000. Rows exceeding that amount are filtered out for all other users.

-- SQL
CREATE VIEW sales_redacted AS
SELECT
  user_id,
  country,
  product,
  total
FROM sales_raw
WHERE
  CASE
    WHEN is_account_group_member('managers') THEN TRUE
    ELSE total <= 1000000
  END;

# Python
spark.sql("CREATE VIEW sales_redacted AS "
  "SELECT "
  "  user_id, "
  "  country, "
  "  product, "
  "  total "
  "FROM sales_raw "
  "WHERE "
  "CASE "
  "  WHEN is_account_group_member('managers') THEN TRUE "
  "  ELSE total <= 1000000 "
  "END")

# R
library(SparkR)

sql(paste("CREATE VIEW sales_redacted AS ",
  "SELECT ",
  "  user_id, ",
  "  country, ",
  "  product, ",
  "  total ",
  "FROM sales_raw ",
  "WHERE ",
  "CASE ",
  "  WHEN is_account_group_member('managers') THEN TRUE ",
  "  ELSE total <= 1000000 ",
  "END",
  sep = ""))

// Scala
spark.sql("CREATE VIEW sales_redacted AS " +
  "SELECT " +
  "  user_id, " +
  "  country, " +
  "  product, " +
  "  total " +
  "FROM sales_raw " +
  "WHERE " +
  "CASE " +
  "  WHEN is_account_group_member('managers') THEN TRUE " +
  "  ELSE total <= 1000000 " +
  "END")

Data masking

Because views in Unity Catalog use Spark SQL, you can implement advanced data masking by using more complex SQL expressions and regular expressions. In the following example, all users can analyze email domains, but only members of the auditors group can view a user’s entire email address.

-- SQL
-- The regexp_extract function takes an email address such as
-- user.x.lastname@example.com and extracts 'example.com', allowing
-- analysts to query the domain name.

CREATE VIEW sales_redacted AS
SELECT
  user_id,
  region,
  CASE
    WHEN is_account_group_member('auditors') THEN email
    ELSE regexp_extract(email, '^.*@(.*)$', 1)
  END
FROM sales_raw

# Python
# The regexp_extract function takes an email address such as
# user.x.lastname@example.com and extracts 'example.com', allowing
# analysts to query the domain name.

spark.sql("CREATE VIEW sales_redacted AS "
  "SELECT "
  "  user_id, "
  "  region, "
  "  CASE "
  "    WHEN is_account_group_member('auditors') THEN email "
  "    ELSE regexp_extract(email, '^.*@(.*)$', 1) "
  "  END "
  "  FROM sales_raw")

# R
library(SparkR)

# The regexp_extract function takes an email address such as
# user.x.lastname@example.com and extracts 'example.com', allowing
# analysts to query the domain name.

sql(paste("CREATE VIEW sales_redacted AS ",
  "SELECT ",
  "  user_id, ",
  "  region, ",
  "  CASE ",
  "    WHEN is_account_group_member('auditors') THEN email ",
  "    ELSE regexp_extract(email, '^.*@(.*)$', 1) ",
  "  END ",
  "  FROM sales_raw",
  sep = ""))

// Scala
// The regexp_extract function takes an email address such as
// user.x.lastname@example.com and extracts 'example.com', allowing
// analysts to query the domain name.

spark.sql("CREATE VIEW sales_redacted AS " +
  "SELECT " +
  "  user_id, " +
  "  region, " +
  "  CASE " +
  "    WHEN is_account_group_member('auditors') THEN email " +
  "    ELSE regexp_extract(email, '^.*@(.*)$', 1) " +
  "  END " +
  "  FROM sales_raw")

Working with Unity Catalog and the legacy Hive metastore

The Unity Catalog metastore is additive, meaning it can be used with the per-workspace Hive metastore in Databricks. The Hive metastore appears as a top-level catalog called hive_metastore in the three-level namespace. For example, you can refer to a table called sales_raw in the sales schema in the legacy Hive metastore by using the following notation:

-- SQL
SELECT * FROM hive_metastore.sales.sales_raw;

# Python
display(spark.table("hive_metastore.sales.sales_raw"))

# R
library(SparkR)

display(tableToDF("hive_metastore.sales.sales_raw"))

// Scala
display(spark.table("hive_metastore.sales.sales_raw"))

You can also specify the catalog and schema with a USE statement:

-- SQL
USE hive_metastore.sales;
SELECT * FROM sales_raw;

# Python
spark.sql("USE hive_metastore.sales")
display(spark.table("sales_raw"))

# R
library(SparkR)

sql("USE hive_metastore.sales")
display(tableToDF("sales_raw"))

// Scala
spark.sql("USE hive_metastore.sales")
display(spark.table("sales_raw"))

Access control in Unity Catalog and the Hive metastore

If you configured table access control on the Hive metastore, Databricks continues to enforce those access controls for data in the hive_metastore catalog for clusters running in the User Isolation or Table ACL only (Legacy) security mode. The Unity Catalog access model differs slightly from legacy access controls; for example, there are no DENY statements. The Hive metastore is a workspace-level object, so permissions defined within the hive_metastore catalog always refer to the local users and groups in the workspace. See Differences from table access control.

If table access control is enabled on the Hive metastore, workloads must use clusters with User Isolation enabled. SQL warehouses always use this security mode.

Differences from table access control

The access control model in Unity Catalog has the following key differences from table access control in the legacy per-workspace Hive metastore:

  • Account-level identities: Access control policies in Unity Catalog are applied to account-level users and groups, while access control policies for the Hive metastore are applied to workspace-level users and groups.

  • Simplified privileges: Unity Catalog has only four privileges (SELECT, MODIFY, CREATE, USAGE) for catalogs, schemas, tables, and views, and three privileges specific to external locations and storage credentials (READ FILES, WRITE FILES, CREATE TABLE). Unity Catalog has no DENY statements; instead, access is denied until it is explicitly granted. Legacy privileges like READ_METADATA and MODIFY_CLASSPATH are not supported.

  • No inheritance of privileges: In Unity Catalog, privileges on a parent object are not inherited by its child objects.

  • USAGE permission is required on the catalog for all operations on objects inside the catalog: Regardless of a principal’s privileges on a table or database, the principal must also have the USAGE privilege on its parent catalog to access the table or database. With workspace-level table access controls, granting USAGE on the root catalog automatically grants USAGE on all databases, but USAGE on the root catalog is not required.

  • Views: In Unity Catalog, the owner of a view does not need to be an owner of the view’s referenced tables and views. Having the SELECT privilege is sufficient, along with USAGE on the view’s parent schema and catalog. With workspace-level table access controls, a view’s owner must own all referenced tables and views.

  • No support for ANY FILE or ANONYMOUS FUNCTION: In Unity Catalog, there is no concept of an ANY FILE or ANONYMOUS FUNCTION permission. These permissions could be used to circumvent access control restrictions by allowing an unprivileged user to run privileged code.

Joins

By using three-level namespace notation, you can join data in a Unity Catalog metastore with data in the legacy Hive metastore.

Note

A join with data in the legacy Hive metastore only works in the workspace where that data resides. Attempting to run such a join in another workspace results in an error. Databricks recommends that you upgrade legacy tables and views to Unity Catalog.

The following example joins results from the sales_current table in the legacy Hive metastore with the sales_historical table in the Unity Catalog metastore when the order_id fields are equal.

-- SQL
SELECT * FROM hive_metastore.sales.sales_current
JOIN main.shared_sales.sales_historical
ON hive_metastore.sales.sales_current.order_id = main.shared_sales.sales_historical.order_id;

# Python
dfCurrent = spark.table("hive_metastore.sales.sales_current")
dfHistorical = spark.table("main.shared_sales.sales_historical")

display(dfCurrent.join(
  other = dfHistorical,
  on = dfCurrent.order_id == dfHistorical.order_id
))

# R
library(SparkR)

dfCurrent = tableToDF("hive_metastore.sales.sales_current")
dfHistorical = tableToDF("main.shared_sales.sales_historical")

display(join(
  x = dfCurrent,
  y = dfHistorical,
  joinExpr = dfCurrent$order_id == dfHistorical$order_id))

// Scala
val dfCurrent = spark.table("hive_metastore.sales.sales_current")
val dfHistorical = spark.table("main.shared_sales.sales_historical")

display(dfCurrent.join(
  right = dfHistorical,
  joinExprs = dfCurrent("order_id") === dfHistorical("order_id")
))

The following example expresses a more complex join. It returns the sum of current and historical sales by customer, assuming that each table contains at least one row for each customer.

-- SQL
SELECT current.customer_id, current.customer_name, COALESCE(current.total, 0) + COALESCE(historical.total, 0) AS total
  FROM hive_metastore.sales.sales_current AS current
  FULL OUTER JOIN main.shared_sales.sales_historical AS historical
  ON current.customer_id = historical.customer_id;

# Python
from pyspark.sql.functions import coalesce

dfCurrent = spark.table("hive_metastore.sales.sales_current")
dfHistorical = spark.table("main.shared_sales.sales_historical")

dfJoin = dfCurrent.join(
  other = dfHistorical,
  on = dfCurrent.customer_id == dfHistorical.customer_id,
  how = "full_outer"
)

display(dfJoin.select(
  dfCurrent.customer_id,
  dfCurrent.customer_name,
  (coalesce(dfCurrent.total) + coalesce(dfHistorical.total)).alias("total")))

# R
library(SparkR)

dfCurrent = tableToDF("hive_metastore.sales.sales_current")
dfHistorical = tableToDF("main.shared_sales.sales_historical")

dfJoin = join(
  x = dfCurrent,
  y = dfHistorical,
  joinExpr = dfCurrent$customer_id == dfHistorical$customer_id,
  joinType = "fullouter")

display(
  select(
    x = dfJoin,
    col = list(
      dfCurrent$customer_id,
      dfCurrent$customer_name,
      alias(coalesce(dfCurrent$total) + coalesce(dfHistorical$total), "total")
    )
  )
)
// Scala
import org.apache.spark.sql.functions.coalesce

val dfCurrent = spark.table("hive_metastore.sales.sales_current")
val dfHistorical = spark.table("main.shared_sales.sales_historical")

val dfJoin = dfCurrent.join(
  right = dfHistorical,
  joinExprs = dfCurrent("customer_id") === dfHistorical("customer_id"),
  joinType = "full_outer"
)

display(dfJoin.select(
  dfCurrent("customer_id"),
  dfCurrent("customer_name"),
  (coalesce(dfCurrent("total")) + coalesce(dfHistorical("total"))).alias("total")
))

Default catalog

If you omit the top-level catalog name and there is no USE CATALOG statement, the default catalog is assumed. To configure the default catalog for a workspace, set the spark.databricks.sql.initial.catalog.name value.

Databricks recommends setting the default catalog value to hive_metastore so that your existing code can operate on current Hive metastore data without any change.
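For illustration, here is a sketch of the effect of this setting, using the sales_raw example table from earlier:

-- Without hive_metastore as the default catalog, legacy tables must be fully qualified:
SELECT * FROM hive_metastore.sales.sales_raw;

-- With spark.databricks.sql.initial.catalog.name set to hive_metastore,
-- existing two-level references continue to work unchanged:
SELECT * FROM sales.sales_raw;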

Cluster instance profile

When you use the Hive metastore alongside Unity Catalog, the instance profile configured on the cluster is used to access Hive metastore data, but not data in Unity Catalog. Unity Catalog does not rely on the instance profile configured for a cluster.

Upgrade legacy tables to Unity Catalog

Tables in the Hive metastore do not benefit from the full set of security and governance features that Unity Catalog introduces, such as built-in auditing and access control. Databricks recommends that you upgrade your legacy tables by adding them to Unity Catalog.

Next steps