Data object privileges
Preview
This feature is in Public Preview.
The Databricks data governance model lets you programmatically grant, deny, and revoke access to your data from Spark SQL. This model lets you control access to securable objects like tables, databases, views, and functions. It also allows for fine-grained access control (to a particular subset of a table, for example) by setting privileges on derived views created from arbitrary queries. The Databricks SQL query analyzer enforces these access control policies at runtime on clusters and SQL endpoints with table access control enabled.
This article describes the privileges, objects, and ownership rules that make up the Databricks data governance model. It also describes how to grant, deny, and revoke object privileges.
Requirements
The requirements for managing object privileges depends on your environment:
Databricks Workspace
- An administrator must enable and enforce table access control for the workspace.
- The cluster must be enabled for table access control.
Data governance model
This section describes the Databricks data governance model. Access to data objects is governed by privileges.
Privileges
SELECT
: gives read access to an object.CREATE
: gives ability to create an object (for example, a table in a database).MODIFY
: gives ability to add, delete, and modify data to or from an object.USAGE
: does not give any abilities, but is an additional requirement to perform any action on a database object.READ_METADATA
: gives ability to view an object and its metadata.CREATE_NAMED_FUNCTION
: gives ability to create a named UDF in an existing catalog or database.MODIFY_CLASSPATH
: gives ability to add files to the Spark class path.ALL PRIVILEGES
: gives all privileges (is translated into all the above privileges).
USAGE
privilege
To perform an action on a database object, a user must have the USAGE
privilege on that database in addition to the privilege to perform that action. Any one of the following satisfy the USAGE
requirement:
- Be an admin
- Have the
USAGE
privilege on the database or be in a group that has theUSAGE
privilege on the database - Have the
USAGE
privilege on theCATALOG
or be in a group that has the theUSAGE
privilege - Be the owner of the database or be in a group that owns the database
Even the owner of an object inside a database must have the USAGE
privilege in order to use it.
As an example, an administrator could define a finance
group and an accounting
database for them to use.
To set up a database that only the finance team can use and share, an admin would do the following:
CREATE DATABASE accounting;
GRANT USAGE ON DATABASE accounting TO finance;
GRANT CREATE ON DATABASE accounting TO finance;
With these privileges, members of the finance
group can create tables and views in the accounting
database,
but can’t share those tables or views with any principal that does not have USAGE
on the accounting
database.
Databricks Workspace and Databricks Runtime version behavior
- Clusters running Databricks Runtime 7.3 LTS and above enforce the
USAGE
privilege. - Clusters running Databricks Runtime 7.2 and below do not enforce the
USAGE
privilege. - To ensure that existing workloads function unchanged, in workspaces that used table access control before
USAGE
was introduced have had theUSAGE
privilege onCATALOG
granted to theusers
group. If you want to take advantage of theUSAGE
privilege, you must runREVOKE USAGE ON CATALOG FROM users
and thenGRANT USAGE ...
as needed.
Privilege hierarchy
SQL objects in Databricks are hierarchical and privileges are inherited. This means that granting
or denying a privilege on the CATALOG
automatically grants or denies the privilege to all databases
in the catalog. Similarly, privileges granted on a DATABASE
object are inherited by all objects in
that database.
Objects
Privileges apply to the following types of objects:
CATALOG
: controls access to the entire data catalog.DATABASE
: controls access to a database.TABLE
: controls access to a managed or external table.VIEW
: controls access to SQL views.FUNCTION
: controls access to a named function.
ANONYMOUS FUNCTION
: controls access to anonymous or temporary functions.ANY FILE
: controls access to the underlying filesystem.Note
Users with this privilege could bypass the restrictions put on tables by reading from the filesystem directly.
Object ownership
When table access control is enabled on a cluster or SQL endpoint, a user who creates a database, table, view, or function becomes its owner. The owner is granted all privileges and can grant privileges to other users.
Groups may own objects, in which case all members of that group are considered owners.
Ownership determines whether or not you can grant privileges on derived objects to other users.
For example, suppose user A owns table T and grants user B SELECT
privilege on table T. Even
though user B can select from table T, user B cannot grant SELECT
privilege on table T to user C,
because user A is still the owner of the underlying table T. Furthermore, user B cannot circumvent
this restriction simply by creating a view V on table T and granting privileges on that view to
user C. When Databricks checks for privileges for user C to access view V, it also checks that
the owner of V and underlying table T are the same. If the owners are not the same, user C must
also have SELECT
privileges on underlying table T.
When table access control is disabled on a cluster, no owner is registered when a database, table, view,
or function is created. To test if an object has an owner, run SHOW GRANT ON <object-name>
.
If you do not see an entry with ActionType OWN
, the object does not have an owner.
Assign owner to object
An administrator can assign an owner to an object using the ALTER <object> OWNER TO `<user-name>@<user-domain>.com`
command:
ALTER DATABASE <database-name> OWNER TO `<user-name>@<user-domain>.com`
ALTER TABLE <table-name> OWNER TO `group_name`
ALTER VIEW <view-name> OWNER TO `<user-name>@<user-domain>.com`
Users and groups
Administrators and owners can grant privileges to users and groups.
Each user is uniquely identified by their username in Databricks (which typically maps to their email address).
All users are implicitly a part of the “All Users” group, represented as users
in SQL.
Note
You must enclose user specifications in backticks (``), not single quotes (‘’).
Operations and privileges
In Databricks, admin users are allowed to do everything. They can grant, revoke, or deny privileges on any object, regardless of the object’s owner. They can also change the owner of any object, which can be necessary for objects created on non-table access controlled clusters. Owners of an object can perform any action and can also grant privileges on that object to other principals. The only exception is if the object is within a database; to do anything with that object the user must also have USAGE
on that database.
The following table maps SQL operations to the privileges required to perform that operation.
Note
- Any place where a privilege on a table, view, or function is required,
USAGE
is also required on the database it’s in. - In any place where a table is referenced in a command, a path could also be referenced. In those instances
SELECT
orMODIFY
is required onANY_FILE
instead ofUSAGE
on the database and another privilege on the table. - Object ownership is represented here as the
OWN
privilege.
Operation | Required Privileges |
---|---|
CLONE | Ability to SELECT from the table being cloned, CREATE on the database, and MODIFY if the a table is being replaced. |
COPY INTO | SELECT on ANY_FILE if copying from a path, MODIFY on the table being copied into. |
CREATE BLOOMFILTER INDEX | OWN on the table being indexed. |
CREATE DATABASE | CREATE on the CATALOG . |
CREATE TABLE | Either OWN or both USAGE and CREATE on the database. |
CREATE VIEW | Either OWN or both USAGE and CREATE on the database. |
CREATE FUNCTION | Either OWN or USAGE and CREATE_NAMED_FUNCTION on the database. If a resource is specified then MODIFY_CLASSPATH on CATALOG is also required. |
ALTER DATABASE | OWN on the database. |
ALTER TABLE | Usually OWN on the table. MODIFY if only adding or removing partitions. |
ALTER VIEW | OWN on the view. |
DROP BLOOMFILTER INDEX | OWN on the table. |
DROP DATABASE | OWN on the database. |
DROP TABLE | OWN on the table. |
DROP VIEW | OWN on the view. |
DROP FUNCTION | OWN on the function. |
EXPLAIN | READ_METADATA on the tables and views. |
DESCRIBE TABLE | READ_METADATA on the table. |
DESCRIBE HISTORY | OWN on the table. |
SELECT | SELECT on the table. |
INSERT | MODIFY on the table. |
RESTORE TABLE | MODIFY on the table. |
UPDATE | MODIFY on the table. |
MERGE INTO | MODIFY on the table. |
DELETE FROM | MODIFY on the table. |
TRUNCATE TABLE | MODIFY on the table. |
OPTIMIZE | MODIFY on the table. |
VACUUM | MODIFY on the table. |
FSCK REPAIR TABLE | MODIFY on the table. |
MSCK | OWN on the table. |
GRANT | OWN on the object. |
SHOW GRANT | OWN on the object. |
DENY | OWN on the object. |
REVOKE | OWN on the object. |
Important
When you use table access control, DROP TABLE
statements are case sensitive. If a table name is lower case and the DROP TABLE
references the table name using mixed or upper case, the DROP TABLE
statement will fail.
Manage object privileges
You use the GRANT
, DENY
, REVOKE
, MSCK
, and SHOW GRANT
operations to manage object privileges.
Note
An owner or an administrator of an object can perform
GRANT
,DENY
,REVOKE
, andSHOW GRANT
operations. However, an administrator cannot deny privileges to or revoke privileges from an owner.A principal that’s not an owner or administrator can perform an operation only if the required privilege has been granted.
To grant, deny, or revoke a privilege for all users, specify the keyword
users
afterTO
. For example,GRANT SELECT ON ANY FILE TO users
Examples
GRANT SELECT ON DATABASE <database-name> TO `<user>@<domain-name>`
GRANT SELECT ON ANONYMOUS FUNCTION TO `<user>@<domain-name>`
GRANT SELECT ON ANY FILE TO `<user>@<domain-name>`
SHOW GRANT `<user>@<domain-name>` ON DATABASE <database-name>
DENY SELECT ON <table-name> TO `<user>@<domain-name>`
REVOKE ALL PRIVILEGES ON DATABASE default FROM `<user>@<domain-name>`
REVOKE SELECT ON <table-name> FROM `<user>@<domain-name>`
GRANT SELECT ON ANY FILE TO users
Dynamic view functions
Databricks includes two user functions that allow you to express column- and row-level permissions dynamically in the body of a view definition.
current_user()
: return the current user name.is_member()
: determine if the current user is a member of a specific Databricks group.
Note
Available in Databricks Runtime 7.3 LTS and above. However, to use these functions in Databricks Runtime 7.3 LTS, you must set the Spark config spark.databricks.userInfoFunctions.enabled true
.
Consider the following example, which combines both functions to determine if a user has the appropriate group membership:
-- Return: true if the user is a member and false if they are not
SELECT
current_user as user,
-- Check to see if the current user is a member of the "Managers" group.
is_member("Managers") as admin
Allowing administrators to set fine granularity privileges for multiple users and groups within a single view is both expressive and powerful, while saving on administration overhead.
Column-level permissions
Through dynamic views it’s easy to limit what columns a specific group or user can see. Consider the following example where only users who belong to the auditors
group are able to see email addresses from the sales_raw
table. At analysis time Spark replaces the CASE
statement with either the literal 'REDACTED'
or the column email
. This behavior allows for all the usual performance optimizations provided by Spark.
-- Alias the field 'email' to itself (as 'email') to prevent the
-- permission logic from showing up directly in the column name results.
CREATE VIEW sales_redacted AS
SELECT
user_id,
CASE WHEN
is_member('auditors') THEN email
ELSE 'REDACTED'
END AS email,
country,
product,
total
FROM sales_raw
Row-level permissions
Using dynamic views you can specify permissions down to the row or field level. Consider the following example, where only users who belong to the managers
group are able to see transaction amounts (total
column) greater than $1,000,000.00:
CREATE VIEW sales_redacted AS
SELECT
user_id,
country,
product,
total
FROM sales_raw
WHERE
CASE
WHEN is_member('managers') THEN TRUE
ELSE total <= 1000000
END;
Data masking
As shown in the preceding examples, you can implement column-level masking to prevent users from seeing specific column data unless they are in the correct group. Because these views are standard Spark SQL, you can do more advanced types of masking with more complex SQL expressions. The following example lets all users perform analysis on email domains, but lets members of the auditors
group see users’ full email addresses.
-- The regexp_extract function takes an email address such as
-- user.x.lastname@example.com and extracts 'example', allowing
-- analysts to query the domain name
CREATE VIEW sales_redacted AS
SELECT
user_id,
region,
CASE
WHEN is_member('auditors') THEN email
ELSE regexp_extract(email, '^.*@(.*)$', 1)
END
FROM sales_raw
Frequently asked questions (FAQ)
How do I grant, deny, or revoke a privilege for all users
Specify the keyword users
after TO
or FROM
. For example:
GRANT SELECT ON TABLE database.table TO users
I created an object but now I can’t query, drop, or modify it.
This error can occur because you created that object on a cluster or SQL endpoint without table access control enabled. When table access control is disabled on a cluster or SQL endpoint, owners are not registered when a database, table, or view is created. An admin must assign an owner to the object using the following command:
ALTER [DATABASE | TABLE | VIEW] <object-name> OWNER TO `<user-name>@<user-domain>.com`;
How do I grant privileges on global and local temporary views?
Privileges on global and local temporary views are not supported. Local temporary
views are visible only within the same session, and views created in the global_temp
database are
visible to all users sharing a cluster or SQL endpoint. However, privileges on the underlying tables and views
referenced by any temporary views are enforced.
How do I grant a user or group privileges on multiple tables at once?
A grant, deny, or revoke statement can be applied to only one object at a time. The recommended way
to organize and grant privileges on multiple tables to a principal is via databases. Granting a
principal SELECT
privilege on a database implicitly grants that principal SELECT
privileges on
all tables and views in that database. For example, if a database D has tables t1 and t2, and an
admin issues the following GRANT
command:
GRANT USAGE, SELECT ON DATABASE D TO `<user>@<domain-name>`
The principal <user>@<domain-name>
can select from tables t1 and t2, as well as any tables and views created in database D in the future.
How do I grant a user privileges on all tables except one?
You grant SELECT
privilege to the database and then deny SELECT
privilege for the specific table you want to restrict access to.
GRANT USAGE, SELECT ON DATABASE D TO `<user>@<domain-name>`
DENY SELECT ON TABLE D.T TO `<user>@<domain-name>`
The principal <user>@<domain-name>
can select from all tables in D except D.T.
A user has SELECT
privileges on a view of table T, but when that user tries to SELECT
from that view, they get the error User does not have privilege SELECT on table
.
This common error can occur for one of the following reasons:
- Table T has no registered owner because it was created using a cluster or SQL endpoint for which table access control is disabled.
- The grantor of the
SELECT
privilege on a view of table T is not the owner of table T or the user does not also have selectSELECT
privilege on table T.
Suppose there is a table T owned by A. A owns view V1 on T and B owns view V2 on T.
- A user can select on V1 when A has granted
SELECT
privileges on view V1. - A user can select on V2 when A has granted
SELECT
privileges on table T and B has grantedSELECT
privileges on V2.
As described in the Object ownership section, these conditions ensure that only the owner of an object can grant other users access to that object.
I tried to run sc.parallelize
on a cluster with table access control enabled and it fails.
On clusters with table access control enabled you can use only the Spark SQL and Python DataFrame APIs. The RDD API is disallowed for security reasons, since Databricks does not have the ability to inspect and authorize code within an RDD.