Access control requires the Premium plan (or, for customers who subscribed to Databricks before March 3, 2020, the Operational Security package).
The Databricks data governance model lets you programmatically grant, deny, and revoke access to your data from the Spark SQL API. This model lets you control access to securable objects like tables, databases, views, and functions. It also allows for fine-grained access control (to a particular subset of a table, for example) by setting privileges on derived views created from arbitrary queries. The Databricks SQL query analyzer enforces these access control policies at runtime on clusters with table access control enabled.
This article describes the privileges, objects, and ownership rules that make up the Databricks data governance model. It also describes how to grant, deny, and revoke object privileges.
Before an administrator or object owner can grant, deny, or revoke privileges on data objects, an administrator must enable table access control for the cluster. For information about how to create a cluster that follows this governance model, see Enable table access control for your workspace.
This section describes the Databricks data governance model. Access to data objects is governed by privileges.
SELECT– gives read access to an object
CREATE– gives ability to create an object (for example, a table in a database)
MODIFY– gives ability to add, delete, and modify data to or from an object
READ_METADATA– gives ability to view an object and its metadata
CREATE_NAMED_FUNCTION– gives ability to create a named UDF in an existing catalog or database
ALL PRIVILEGES– gives all privileges (is translated into all the above privileges)
Privileges on objects are hierarchical. This means that granting or denying a privilege on the
object automatically grants or denies the privilege to all of its contained databases (as well as
all tables and views). Similarly, granting or denying a privilege to a
automatically grants or denies the privilege to all tables and views in that database.
Privileges can apply to the following classes of objects:
CATALOG- controls access to the entire data catalog.
DATABASE- controls access to a database.
TABLE- controls access to a managed or external table.
VIEW- controls access to SQL views.
FUNCTION- controls access to a named function.
ANONYMOUS FUNCTION- controls access to anonymous or temporary functions.
ANY FILE- controls access to the underlying filesystem.
When table ACLs are enabled on a cluster, a user who creates a database, table, view, or function becomes its owner. The owner is granted all privileges and can grant privileges to other users.
Ownership determines whether or not you can grant privileges on derived objects to other users.
For example, suppose user A owns table T and grants user B
SELECT privilege on table T. Even
though user B can select from table T, user B cannot grant
SELECT privilege on table T to user C,
because user A is still the owner of the underlying table T. Furthermore, user B cannot circumvent
this restriction simply by creating a view V on table T and granting privileges on that view to
user C. When Databricks checks for privileges for user C to access view V, it also checks that
the owner of V and underlying table T are the same. If the owners are not the same, user C must
SELECT privileges on underlying table T.
When table ACLs are disabled on a cluster, an owner is not registered when a database, table, view,
or function is created. To test if an object has an owner, run
SHOW GRANT ON <object-name>.
If you do not see an entry with
ActionType OWN, the object does not have an owner.
An administrator can assign an owner to an object using the
ALTER <object> OWNER TO `<user-name>@<user-domain>.com` command:
ALTER DATABASE <database-name> OWNER TO `<user-name>@<user-domain>.com` ALTER TABLE <table-name> OWNER TO `<user-name>@<user-domain>.com` ALTER VIEW <view-name> OWNER TO `<user-name>@<user-domain>.com`
Administrators and owners can grant privileges to users and groups created using the Groups API. Each user is uniquely identified by their username in Databricks (which typically maps to their email address).
You must enclose user specifications in backticks (``), not single quotes (‘’).
The following table maps SQL operations to the privilege or role required to perform that operation:
|Operation / Privilege or role||
|FSCK REPAIR TABLE||x||x||x|
When using table ACL,
DROP TABLE statements are case sensitive. If a table name is lower case and the drop table references the table name using mixed or upper case, the
DROP TABLE statement will fail.
You use the
REVOKE operations to manage object privileges.
- An owner or an administrator of an object can perform
REVOKEoperations. However, an administrator cannot deny privileges to or revoke privileges from an owner.
- A principal that’s not an owner or administrator can perform an operation only if the required privilege has been granted.
- To grant, deny, or revoke a privilege for all users, specify the keyword
GRANT SELECT ON DATABASE <database-name> TO `<user>@<domain-name>` GRANT SELECT ON ANONYMOUS FUNCTION TO `<user>@<domain-name>` GRANT SELECT ON ANY FILE TO `<user>@<domain-name>` SHOW GRANT `<user>@<domain-name>` ON DATABASE <database-name> DENY SELECT ON <table-name> TO `<user>@<domain-name>` REVOKE ALL PRIVILEGES ON DATABASE default FROM `<user>@<domain-name>` REVOKE SELECT ON <table-name> FROM `<user>@<domain-name>` GRANT SELECT ON ANY FILE TO users
How do I grant, deny, or revoke a privilege for all users
Specify the keyword
FROM. For example:
GRANT SELECT ON ANY FILE TO users
I created an object but now I can’t query, drop, or modify it.
This error can occur because you created that object on a cluster without table ACLs enabled. When table ACLs are disabled on a cluster, owners are not registered when a database, table, or view is created. An admin must assign an owner to the object using the following command:
ALTER [DATABASE|TABLE|VIEW] <object-name> OWNER TO `<user-name>@<user-domain>.com`;
How do I grant privileges on global and local temporary views?
Unfortunately privileges on global and local temporary views are not supported. Local temporary
views are visible only within the same session, and views created in the
global_temp database are
visible to all users sharing a cluster. However, privileges on the underlying tables and views
referenced by any temporary views are enforced.
How do I grant a user or group privileges on multiple tables at once?
A grant, deny, or revoke statement can be applied to only one object at a time. The recommended way
to organize and grant privileges on multiple tables to a principal is via databases. Granting a
SELECT privilege on a database implicitly grants that principal
SELECT privileges on
all tables and views in that database. For example, if a database D has tables t1 and t2, and an
admin issues the following
GRANT SELECT ON DATABASE D TO `<user>@<domain-name>`
<user>@<domain-name> can select from tables t1 and t2, as well as any tables and views created in database D in the future.
How do I grant a user privileges on all tables except one?
SELECT privilege to the database and then deny
SELECT privilege for the specific table you want to restrict access to.
GRANT SELECT ON DATABASE D TO `<user>@<domain-name>` DENY SELECT ON TABLE D.T TO `<user>@<domain-name>`
<user>@<domain-name> can select from all tables in D except D.T.
A user has
SELECT privileges on a view of table T, but when that user tries to
SELECT from that view, they get the error
User does not have privilege SELECT on table.
This common error can occur for one of the following reasons:
- Table T has no registered owner because it was created using a cluster for which table ACLs are disabled.
- The grantor of the
SELECTprivilege on a view of table T is not the owner of table T or the user does not also have select
SELECTprivilege on table T.
Suppose there is a table T owned by A. A owns view V1 on T and B owns view V2 on T.
- A user can select on V1 when A has granted
SELECTprivileges on view V1.
- A user can select on V2 when A has granted
SELECTprivileges on table T and B has granted
SELECTprivileges on V2.
As described in the Object ownership section, these conditions ensure that only the owner of an object can grant other users access to that object.
I tried to run
sc.parallelize on a cluster with table ACLs enabled and it fails.
On clusters with table ACLs enabled you can use only the Spark SQL and Python DataFrame APIs. The RDD API is disallowed for security reasons, since Databricks does not have the ability to inspect and authorize code within an RDD.