Hive metastore privileges and securable objects (legacy)

This article describes the privilege model for the legacy Databricks Hive metastore, which is built in to each Databricks workspace. It also describes how to grant, deny, and revoke privileges for objects in the built-in Hive metastore. Unity Catalog uses a different model for granting privileges. See Unity Catalog privileges and securable objects.

Note

Table access control for data managed by the Hive metastore is a legacy data governance model. Databricks recommends that you upgrade the tables managed by the Hive metastore to the Unity Catalog metastore. Unity Catalog simplifies security and governance of your data by providing a central place to administer and audit data access across multiple workspaces in your account. To learn more about how the legacy privilege model differs from the Unity Catalog privilege model, see Work with Unity Catalog and the legacy Hive metastore.

Requirements

Note

  • Data access control is always enabled in Databricks SQL even if table access control is not enabled for the workspace.

  • If table access control is enabled for the workspace and you have already specified ACLs (granted and denied privileges) in the workspace, those ACLs are respected in Databricks SQL.

Manage privileges on objects in the Hive metastore

Privileges on data objects managed by the Hive metastore can be granted by either a workspace admin or the owner of an object. You can manage privileges for Hive metastore objects by using SQL commands.

To manage privileges in SQL, you use GRANT, REVOKE, DENY, MSCK, and SHOW GRANTS statements in a notebook or the Databricks SQL query editor, using the syntax:

GRANT privilege_type ON securable_object TO principal

Where:

To grant a privilege to all users in your workspace, grant the privilege to the users group. For example:

GRANT SELECT ON TABLE <schema-name>.<table-name> TO users

For more information about managing privileges for objects in the Hive metastore using SQL commands, see Privileges and securable objects in the Hive metastore.

You can also manage table access control in a fully automated setup using the Databricks Terraform provider and databricks_sql_permissions.

Object ownership

When table access control is enabled on a cluster or SQL warehouse, a user who creates a schema, table, view, or function becomes its owner. The owner is granted all privileges and can grant privileges to other users.

Groups may own objects, in which case all members of that group are considered owners.

Either the owner of an object or a workspace admin can transfer ownership of an object using the following command:

ALTER <object> OWNER TO `<user-name>@<user-domain>.com`

Note

When table access control is disabled on a cluster or SQL warehouse, owners are not registered when a schema, table, or view is created. A workspace admin must assign an owner to the object using the ALTER <object> OWNER TO command.

Securable objects in the Hive metastore

The securable objects are:

  • CATALOG: controls access to the entire data catalog.

    • SCHEMA: controls access to a schema.

      • TABLE: controls access to a managed or external table.

      • VIEW: controls access to SQL views.

      • FUNCTION: controls access to a named function.

  • ANONYMOUS FUNCTION: controls access to anonymous or temporary functions.

    Note

    ANONYMOUS FUNCTION objects are not supported in Databricks SQL.

  • ANY FILE: controls access to the underlying filesystem.

    Warning

    Users granted access to ANY FILE can bypass the restrictions put on the catalog, schemas, tables, and views by reading from the filesystem directly.

Note

Privileges on global and local temporary views are not supported. Local temporary views are visible only within the same session, and views created in the global_temp schema are visible to all users sharing a cluster or SQL warehouse. However, privileges on the underlying tables and views referenced by any temporary views are enforced.

Privileges you can grant on Hive metastore objects

  • SELECT: gives read access to an object.

  • CREATE: gives ability to create an object (for example, a table in a schema).

  • MODIFY: gives ability to add, delete, and modify data to or from an object.

  • USAGE: does not give any abilities, but is an additional requirement to perform any action on a schema object.

  • READ_METADATA: gives ability to view an object and its metadata.

  • CREATE_NAMED_FUNCTION: gives ability to create a named UDF in an existing catalog or schema.

  • MODIFY_CLASSPATH: gives ability to add files to the Spark class path.

  • ALL PRIVILEGES: gives all privileges (is translated into all the above privileges).

Note

The MODIFY_CLASSPATH privilege is not supported in Databricks SQL.

USAGE privilege

To perform an action on a schema object in the Hive metastore, a user must have the USAGE privilege on that schema in addition to the privilege to perform that action. Any one of the following satisfies the USAGE requirement:

  • Be a workspace admin

  • Have the USAGE privilege on the schema or be in a group that has the USAGE privilege on the schema

  • Have the USAGE privilege on the CATALOG or be in a group that has the USAGE privilege

  • Be the owner of the schema or be in a group that owns the schema

Even the owner of an object inside a schema must have the USAGE privilege in order to use it.

Privilege hierarchy

When table access control is enabled on the workspace and on all clusters, SQL objects in Databricks are hierarchical and privileges are inherited downward. This means that granting or denying a privilege on the CATALOG automatically grants or denies the privilege to all schemas in the catalog. Similarly, privileges granted on a schema object are inherited by all objects in that schema. This pattern is true for all securable objects.

If you deny a user privileges on a table, the user can’t see the table by attempting to list all tables in the schema. If you deny a user privileges on a schema, the user can’t see that the schema exists by attempting to list all schemas in the catalog.

Dynamic view functions

Databricks includes two user functions that allow you to express column- and row-level permissions dynamically in the body of a view definition that is managed by the Hive metastore.

  • current_user(): return the current user name.

  • is_member(): determine if the current user is a member of a specific Databricks group at the workspace level.

Note

To use these functions in Databricks Runtime 7.3 LTS, you must set the Spark config spark.databricks.userInfoFunctions.enabled true. They are enabled by default in all supported Databricks Runtime versions above 7.3.

The following example combines both functions to determine if a user has the appropriate group membership:

-- Return: true if the user is a member and false if they are not
SELECT
  current_user as user,
-- Check to see if the current user is a member of the "Managers" group.
  is_member("Managers") as admin

Column-level permissions

You can use dynamic views to limit the columns a specific group or user can see. Consider the following example where only users who belong to the auditors group are able to see email addresses from the sales_raw table. At analysis time Spark replaces the CASE statement with either the literal 'REDACTED' or the column email. This behavior allows for all the usual performance optimizations provided by Spark.

-- Alias the field 'email' to itself (as 'email') to prevent the
-- permission logic from showing up directly in the column name results.
CREATE VIEW sales_redacted AS
SELECT
  user_id,
  CASE WHEN
    is_group_member('auditors') THEN email
    ELSE 'REDACTED'
  END AS email,
  country,
  product,
  total
FROM sales_raw

Row-level permissions

Using dynamic views you can specify permissions down to the row or field level. Consider the following example, where only users who belong to the managers group are able to see transaction amounts (total column) greater than $1,000,000.00:

CREATE VIEW sales_redacted AS
SELECT
  user_id,
  country,
  product,
  total
FROM sales_raw
WHERE
  CASE
    WHEN is_group_member('managers') THEN TRUE
    ELSE total <= 1000000
  END;

Data masking

As shown in the preceding examples, you can implement column-level masking to prevent users from seeing specific column data unless they are in the correct group. Because these views are standard Spark SQL, you can do more advanced types of masking with more complex SQL expressions. The following example lets all users perform analysis on email domains, but lets members of the auditors group see users’ full email addresses.

-- The regexp_extract function takes an email address such as
-- user.x.lastname@example.com and extracts 'example', allowing
-- analysts to query the domain name

CREATE VIEW sales_redacted AS
SELECT
  user_id,
  region,
  CASE
    WHEN is_group_member('auditors') THEN email
    ELSE regexp_extract(email, '^.*@(.*)$', 1)
  END
  FROM sales_raw