Work with Unity Catalog and the legacy Hive metastore

This article explains how to use a Unity Catalog metastore with the per-workspace Hive metastore in Databricks.

Query the Hive metastore in Unity Catalog

The Unity Catalog metastore is additive, meaning it can be used with the per-workspace Hive metastore in Databricks. The Hive metastore appears as a top-level catalog called hive_metastore in the three-level namespace.

For example, you can refer to a table called sales_raw in the sales schema in the legacy Hive metastore by using the following notation:

SELECT * from hive_metastore.sales.sales_raw;
display(spark.table("hive_metastore.sales.sales_raw"))
library(SparkR)

display(tableToDF("hive_metastore.sales.sales_raw"))
display(spark.table("hive_metastore.sales.sales_raw"))

You can also specify the catalog and schema with a USE statement:

USE hive_metastore.sales;
SELECT * from sales_raw;
spark.sql("USE hive_metastore.sales")
display(spark.table("sales_raw"))
library(SparkR)

sql("USE hive_metastore.sales")
display(tableToDF("sales_raw"))
spark.sql("USE hive_metastore.sales")
display(spark.table("sales_raw"))

Access control in Unity Catalog and the Hive metastore

If you configured table access control on the Hive metastore, Databricks will continue enforcing those access controls for data in the hive_metastore catalog for clusters running in the shared access mode. The Unity Catalog access model differs slightly from legacy access controls, like no DENY statements. The Hive metastore is a workspace-level object. Permissions defined within the hive_metastore catalog always refer to the local users and groups in the workspace. See Differences from table access control.

Differences from table access control

Unity Catalog has the following key differences from using table access controls in the legacy Hive metastore in each workspace.

The access control model in Unity Catalog has the following differences from table access control:

  • Account-level identities: Access control policies in Unity Catalog are applied to account-level users and groups, while access control policies for the Hive metastore are applied to workspace-level users and groups.

  • No inheritance of privileges: In Unity Catalog, privileges on a parent object are not inherited by its child objects.

  • USAGE permission is required on the catalog for all operations on objects inside the catalog: Regardless of a principal’s privileges on a table or database, the principal must also have the USAGE privilege on its parent catalog to access the table or database. With workspace-level table access controls, granting USAGE on the root catalog automatically grants USAGE on all databases, but USAGE on the root catalog is not required.

  • Views: In Unity Catalog, the owner of a view does not need to be an owner of the view’s referenced tables and views. Having the SELECT privilege is sufficient, along with USAGE on the views’ parent schema and catalog. With workspace-level table access controls, a view’s owner needs to be an owner of all referenced tables and views.

  • No support for ALL FILES or ANONYMOUS FUNCTIONs: In Unity Catalog, there is no concept of an ALL FILES or ANONYMOUS FUNCTION permission. These permissions could be used to circumvent access control restrictions by allowing an unprivileged user to run privileged code.

Joins between Unity Catalog and Hive metastore objects

By using three-level namespace notation, you can join data in a Unity Catalog metastore with data in the legacy Hive metastore.

Note

A join with data in the legacy Hive metastore will only work on the workspace where that data resides. Trying to run such a join in another workspace results in an error. Databricks recommends that you upgrade legacy tables and views to Unity Catalog.

The following example joins results from the sales_current table in the legacy Hive metastore with the sales_historical table in the Unity Catalog metastore when the order_id fields are equal.

SELECT * FROM hive_metastore.sales.sales_current
JOIN main.shared_sales.sales_historical
ON hive_metastore.sales.sales_current.order_id = main.shared_sales.sales_historical.order_id;
dfCurrent = spark.table("hive_metastore.sales.sales_current")
dfHistorical = spark.table("main.shared_sales.sales_historical")

display(dfCurrent.join(
  other = dfHistorical,
  on = dfCurrent.order_id == dfHistorical.order_id
))
library(SparkR)

dfCurrent = tableToDF("hive_metastore.sales.sales_current")
dfHistorical = tableToDF("main.shared_sales.sales_historical")

display(join(
  x = dfCurrent,
  y = dfHistorical,
  joinExpr = dfCurrent$order_id == dfHistorical$order_id))
val dfCurrent = spark.table("hive_metastore.sales.sales_current")
val dfHistorical = spark.table("main.shared_sales.sales_historical")

display(dfCurrent.join(
  right = dfHistorical,
  joinExprs = dfCurrent("order_id") === dfHistorical("order_id")
))

Default catalog

If you omit the top-level catalog name and there is no USE CATALOG statement, the default catalog is assumed. To configure the default catalog for a workspace, set the spark.databricks.sql.initial.catalog.name value.

Databricks recommends setting the default catalog value to hive_metastore so that your existing code can operate on current Hive metastore data without any change.

Cluster instance profile

When using the Hive metastore alongside Unity Catalog, the instance profile on the cluster is used to access Hive metastore data but not data registered in Unity Catalog.

If users access paths that are outside Unity Catalog (such as a path not registered as a table or external location) then the instance profile on the cluster is used.

Upgrade legacy tables to Unity Catalog

Tables in the Hive metastore do not benefit from the full set of security and governance features that Unity Catalog introduces, such as built-in auditing and access control. Databricks recommends that you upgrade your legacy tables by adding them to Unity Catalog.