Data object privileges

Preview

This feature is in Public Preview.

The Databricks data governance model lets you programmatically grant, deny, and revoke access to your data from Spark SQL. This model lets you control access to securable objects like tables, databases, views, and functions. It also allows for fine-grained access control (to a particular subset of a table, for example) by setting privileges on derived views created from arbitrary queries. The Databricks SQL query analyzer enforces these access control policies at runtime on clusters and SQL endpoints with table access control enabled.

This article describes the privileges, objects, and ownership rules that make up the Databricks data governance model. It also describes how to grant, deny, and revoke object privileges.

Requirements

The requirements for managing object privileges depends on your environment:

Databricks Workspace

Data governance model

This section describes the Databricks data governance model. Access to data objects is governed by privileges.

Privileges

  • SELECT: gives read access to an object.
  • CREATE: gives ability to create an object (for example, a table in a database).
  • MODIFY: gives ability to add, delete, and modify data to or from an object.
  • USAGE: does not give any abilities, but is an additional requirement to perform any action on a database object.
  • READ_METADATA: gives ability to view an object and its metadata.
  • CREATE_NAMED_FUNCTION: gives ability to create a named UDF in an existing catalog or database.
  • MODIFY_CLASSPATH: gives ability to add files to the Spark class path.
  • ALL PRIVILEGES: gives all privileges (is translated into all the above privileges).

USAGE privilege

To perform an action on a database object, a user must have the USAGE privilege on that database in addition to the privilege to perform that action. Any one of the following satisfy the USAGE requirement:

  • Be an admin
  • Have the USAGE privilege on the database or be in a group that has the USAGE privilege on the database
  • Have the USAGE privilege on the CATALOG or be in a group that has the the USAGE privilege
  • Be the owner of the database or be in a group that owns the database

Even the owner of an object inside a database must have the USAGE privilege in order to use it.

As an example, an administrator could define a finance group and an accounting database for them to use. To set up a database that only the finance team can use and share, an admin would do the following:

CREATE DATABASE accounting;
GRANT USAGE ON DATABASE accounting TO finance;
GRANT CREATE ON DATABASE accounting TO finance;

With these privileges, members of the finance group can create tables and views in the accounting database, but can’t share those tables or views with any principal that does not have USAGE on the accounting database.

Databricks Workspace and Databricks Runtime version behavior
  • Clusters running Databricks Runtime 7.3 LTS and above enforce the USAGE privilege.
  • Clusters running Databricks Runtime 7.2 and below do not enforce the USAGE privilege.
  • To ensure that existing workloads function unchanged, in workspaces that used table access control before USAGE was introduced have had the USAGE privilege on CATALOG granted to the users group. If you want to take advantage of the USAGE privilege, you must run REVOKE USAGE ON CATALOG FROM users and then GRANT USAGE ... as needed.

Privilege hierarchy

SQL objects in Databricks are hierarchical and privileges are inherited. This means that granting or denying a privilege on the CATALOG automatically grants or denies the privilege to all databases in the catalog. Similarly, privileges granted on a DATABASE object are inherited by all objects in that database.

Objects

Privileges apply to the following types of objects:

  • CATALOG: controls access to the entire data catalog.

    • DATABASE: controls access to a database.
      • TABLE: controls access to a managed or external table.
      • VIEW: controls access to SQL views.
      • FUNCTION: controls access to a named function.
  • ANONYMOUS FUNCTION: controls access to anonymous or temporary functions.

  • ANY FILE: controls access to the underlying filesystem.

    Note

    Users with this privilege could bypass the restrictions put on tables by reading from the filesystem directly.

Object ownership

When table access control is enabled on a cluster or SQL endpoint, a user who creates a database, table, view, or function becomes its owner. The owner is granted all privileges and can grant privileges to other users.

Groups may own objects, in which case all members of that group are considered owners.

Ownership determines whether or not you can grant privileges on derived objects to other users. For example, suppose user A owns table T and grants user B SELECT privilege on table T. Even though user B can select from table T, user B cannot grant SELECT privilege on table T to user C, because user A is still the owner of the underlying table T. Furthermore, user B cannot circumvent this restriction simply by creating a view V on table T and granting privileges on that view to user C. When Databricks checks for privileges for user C to access view V, it also checks that the owner of V and underlying table T are the same. If the owners are not the same, user C must also have SELECT privileges on underlying table T.

When table access control is disabled on a cluster, no owner is registered when a database, table, view, or function is created. To test if an object has an owner, run SHOW GRANT ON <object-name>. If you do not see an entry with ActionType OWN, the object does not have an owner.

Assign owner to object

An administrator can assign an owner to an object using the ALTER <object> OWNER TO `<user-name>@<user-domain>.com` command:

ALTER DATABASE <database-name> OWNER TO `<user-name>@<user-domain>.com`
ALTER TABLE <table-name> OWNER TO `group_name`
ALTER VIEW <view-name> OWNER TO `<user-name>@<user-domain>.com`

Users and groups

Administrators and owners can grant privileges to users and groups. Each user is uniquely identified by their username in Databricks (which typically maps to their email address). All users are implicitly a part of the “All Users” group, represented as users in SQL.

Note

You must enclose user specifications in backticks (``), not single quotes (‘’).

Operations and privileges

In Databricks, admin users are allowed to do everything. They can grant, revoke, or deny privileges on any object, regardless of the object’s owner. They can also change the owner of any object, which can be necessary for objects created on non-table access controlled clusters. Owners of an object can perform any action and can also grant privileges on that object to other principals. The only exception is if the object is within a database; to do anything with that object the user must also have USAGE on that database.

The following table maps SQL operations to the privileges required to perform that operation.

Note

  • Any place where a privilege on a table, view, or function is required, USAGE is also required on the database it’s in.
  • In any place where a table is referenced in a command, a path could also be referenced. In those instances SELECT or MODIFY is required on ANY_FILE instead of USAGE on the database and another privilege on the table.
  • Object ownership is represented here as the OWN privilege.
Operation Required Privileges
CLONE Ability to SELECT from the table being cloned, CREATE on the database, and MODIFY if the a table is being replaced.
COPY INTO SELECT on ANY_FILE if copying from a path, MODIFY on the table being copied into.
CREATE BLOOMFILTER INDEX OWN on the table being indexed.
CREATE DATABASE CREATE on the CATALOG.
CREATE TABLE Either OWN or both USAGE and CREATE on the database.
CREATE VIEW Either OWN or both USAGE and CREATE on the database.
CREATE FUNCTION Either OWN or USAGE and CREATE_NAMED_FUNCTION on the database. If a resource is specified then MODIFY_CLASSPATH on CATALOG is also required.
ALTER DATABASE OWN on the database.
ALTER TABLE Usually OWN on the table. MODIFY if only adding or removing partitions.
ALTER VIEW OWN on the view.
DROP BLOOMFILTER INDEX OWN on the table.
DROP DATABASE OWN on the database.
DROP TABLE OWN on the table.
DROP VIEW OWN on the view.
DROP FUNCTION OWN on the function.
EXPLAIN READ_METADATA on the tables and views.
DESCRIBE TABLE READ_METADATA on the table.
DESCRIBE HISTORY OWN on the table.
SELECT SELECT on the table.
INSERT MODIFY on the table.
RESTORE TABLE MODIFY on the table.
UPDATE MODIFY on the table.
MERGE INTO MODIFY on the table.
DELETE FROM MODIFY on the table.
TRUNCATE TABLE MODIFY on the table.
OPTIMIZE MODIFY on the table.
VACUUM MODIFY on the table.
FSCK REPAIR TABLE MODIFY on the table.
MSCK OWN on the table.
GRANT OWN on the object.
SHOW GRANT OWN on the object.
DENY OWN on the object.
REVOKE OWN on the object.

Important

When you use table access control, DROP TABLE statements are case sensitive. If a table name is lower case and the DROP TABLE references the table name using mixed or upper case, the DROP TABLE statement will fail.

Manage object privileges

You use the GRANT, DENY, REVOKE, MSCK, and SHOW GRANT operations to manage object privileges.

Note

  • An owner or an administrator of an object can perform GRANT, DENY, REVOKE, and SHOW GRANT operations. However, an administrator cannot deny privileges to or revoke privileges from an owner.

  • A principal that’s not an owner or administrator can perform an operation only if the required privilege has been granted.

  • To grant, deny, or revoke a privilege for all users, specify the keyword users after TO. For example,

    GRANT SELECT ON ANY FILE TO users
    

Examples

GRANT SELECT ON DATABASE <database-name> TO `<user>@<domain-name>`
GRANT SELECT ON ANONYMOUS FUNCTION TO `<user>@<domain-name>`
GRANT SELECT ON ANY FILE TO `<user>@<domain-name>`

SHOW GRANT `<user>@<domain-name>` ON DATABASE <database-name>

DENY SELECT ON <table-name> TO `<user>@<domain-name>`

REVOKE ALL PRIVILEGES ON DATABASE default FROM `<user>@<domain-name>`
REVOKE SELECT ON <table-name> FROM `<user>@<domain-name>`

GRANT SELECT ON ANY FILE TO users

Dynamic view functions

Databricks includes two user functions that allow you to express column- and row-level permissions dynamically in the body of a view definition.

  • current_user(): return the current user name.
  • is_member(): determine if the current user is a member of a specific Databricks group.

Note

Available in Databricks Runtime 7.3 LTS and above. However, to use these functions in Databricks Runtime 7.3 LTS, you must set the Spark config spark.databricks.userInfoFunctions.enabled true.

Consider the following example, which combines both functions to determine if a user has the appropriate group membership:

-- Return: true if the user is a member and false if they are not
SELECT
  current_user as user,
-- Check to see if the current user is a member of the "Managers" group.
  is_member("Managers") as admin

Allowing administrators to set fine granularity privileges for multiple users and groups within a single view is both expressive and powerful, while saving on administration overhead.

Column-level permissions

Through dynamic views it’s easy to limit what columns a specific group or user can see. Consider the following example where only users who belong to the auditors group are able to see email addresses from the sales_raw table. At analysis time Spark replaces the CASE statement with either the literal 'REDACTED' or the column email. This behavior allows for all the usual performance optimizations provided by Spark.

-- Alias the field 'email' to itself (as 'email') to prevent the
-- permission logic from showing up directly in the column name results.
CREATE VIEW sales_redacted AS
SELECT
  user_id,
  CASE WHEN
    is_member('auditors') THEN email
    ELSE 'REDACTED'
  END AS email,
  country,
  product,
  total
FROM sales_raw

Row-level permissions

Using dynamic views you can specify permissions down to the row or field level. Consider the following example, where only users who belong to the managers group are able to see transaction amounts (total column) greater than $1,000,000.00:

CREATE VIEW sales_redacted AS
SELECT
  user_id,
  country,
  product,
  total
FROM sales_raw
WHERE
  CASE
    WHEN is_member('managers') THEN TRUE
    ELSE total <= 1000000
  END;

Data masking

As shown in the preceding examples, you can implement column-level masking to prevent users from seeing specific column data unless they are in the correct group. Because these views are standard Spark SQL, you can do more advanced types of masking with more complex SQL expressions. The following example lets all users perform analysis on email domains, but lets members of the auditors group see users’ full email addresses.

-- The regexp_extract function takes an email address such as
-- user.x.lastname@example.com and extracts 'example', allowing
-- analysts to query the domain name

CREATE VIEW sales_redacted AS
SELECT
  user_id,
  region,
  CASE
    WHEN is_member('auditors') THEN email
    ELSE regexp_extract(email, '^.*@(.*)$', 1)
  END
  FROM sales_raw

Frequently asked questions (FAQ)

How do I grant, deny, or revoke a privilege for all users

Specify the keyword users after TO or FROM. For example:

GRANT SELECT ON TABLE database.table TO users

I created an object but now I can’t query, drop, or modify it.

This error can occur because you created that object on a cluster or SQL endpoint without table access control enabled. When table access control is disabled on a cluster or SQL endpoint, owners are not registered when a database, table, or view is created. An admin must assign an owner to the object using the following command:

ALTER [DATABASE | TABLE | VIEW] <object-name> OWNER TO `<user-name>@<user-domain>.com`;

How do I grant privileges on global and local temporary views?

Privileges on global and local temporary views are not supported. Local temporary views are visible only within the same session, and views created in the global_temp database are visible to all users sharing a cluster or SQL endpoint. However, privileges on the underlying tables and views referenced by any temporary views are enforced.

How do I grant a user or group privileges on multiple tables at once?

A grant, deny, or revoke statement can be applied to only one object at a time. The recommended way to organize and grant privileges on multiple tables to a principal is via databases. Granting a principal SELECT privilege on a database implicitly grants that principal SELECT privileges on all tables and views in that database. For example, if a database D has tables t1 and t2, and an admin issues the following GRANT command:

GRANT USAGE, SELECT ON DATABASE D TO `<user>@<domain-name>`

The principal <user>@<domain-name> can select from tables t1 and t2, as well as any tables and views created in database D in the future.

How do I grant a user privileges on all tables except one?

You grant SELECT privilege to the database and then deny SELECT privilege for the specific table you want to restrict access to.

GRANT USAGE, SELECT ON DATABASE D TO `<user>@<domain-name>`
DENY SELECT ON TABLE D.T TO `<user>@<domain-name>`

The principal <user>@<domain-name> can select from all tables in D except D.T.

A user has SELECT privileges on a view of table T, but when that user tries to SELECT from that view, they get the error User does not have privilege SELECT on table.

This common error can occur for one of the following reasons:

  • Table T has no registered owner because it was created using a cluster or SQL endpoint for which table access control is disabled.
  • The grantor of the SELECT privilege on a view of table T is not the owner of table T or the user does not also have select SELECT privilege on table T.

Suppose there is a table T owned by A. A owns view V1 on T and B owns view V2 on T.

  • A user can select on V1 when A has granted SELECT privileges on view V1.
  • A user can select on V2 when A has granted SELECT privileges on table T and B has granted SELECT privileges on V2.

As described in the Object ownership section, these conditions ensure that only the owner of an object can grant other users access to that object.

I tried to run sc.parallelize on a cluster with table access control enabled and it fails.

On clusters with table access control enabled you can use only the Spark SQL and Python DataFrame APIs. The RDD API is disallowed for security reasons, since Databricks does not have the ability to inspect and authorize code within an RDD.