Work with Unity Catalog and the legacy Hive metastore
This article explains how to use a Unity Catalog metastore with the per-workspace Hive metastore in Databricks.
If your workspace was in service before it was enabled for Unity Catalog, it likely has a Hive metastore that contains data that you want to continue to use. Databricks recommends that you migrate the tables managed by the Hive metastore to the Unity Catalog metastore, but if you choose not to, this article explains how to work with data managed by both metastores.
Query the Hive metastore in Unity Catalog
The Unity Catalog metastore is additive, meaning it can be used with the per-workspace Hive metastore in Databricks. The Hive metastore appears as a top-level catalog called hive_metastore in the three-level namespace.
For example, you can refer to a table called sales_raw in the sales schema in the legacy Hive metastore by using the following notation:
SQL:
SELECT * FROM hive_metastore.sales.sales_raw;

Python:
display(spark.table("hive_metastore.sales.sales_raw"))

R:
library(SparkR)
display(tableToDF("hive_metastore.sales.sales_raw"))

Scala:
display(spark.table("hive_metastore.sales.sales_raw"))
You can also specify the catalog and schema with a USE statement:
SQL:
USE hive_metastore.sales;
SELECT * FROM sales_raw;

Python:
spark.sql("USE hive_metastore.sales")
display(spark.table("sales_raw"))

R:
library(SparkR)
sql("USE hive_metastore.sales")
display(tableToDF("sales_raw"))

Scala:
spark.sql("USE hive_metastore.sales")
display(spark.table("sales_raw"))
Access control in Unity Catalog and the Hive metastore
If you configured table access control on the Hive metastore, Databricks continues to enforce those access controls for data in the hive_metastore catalog for clusters running in shared access mode. The Unity Catalog access model differs slightly from legacy access controls; for example, Unity Catalog has no DENY statements. The Hive metastore is a workspace-level object, so permissions defined within the hive_metastore catalog always refer to the local users and groups in the workspace. See Differences from table access control.
Differences from table access control
The access control model in Unity Catalog has the following key differences from table access control in the legacy per-workspace Hive metastore:
- Account groups: Access control policies in Unity Catalog are applied to account groups, while access control policies for the Hive metastore are applied to workspace-local groups. See Special considerations for groups.
- USE CATALOG and USE SCHEMA permissions are required on the catalog and schema for all operations on objects inside the catalog or schema: Regardless of a principal's privileges on a table, the principal must also have the USE CATALOG privilege on its parent catalog to access the schema and the USE SCHEMA privilege to access objects within the schema. With workspace-level table access controls, on the other hand, granting USAGE on the root catalog automatically grants USAGE on all databases, but USAGE on the root catalog is not required. A SQL sketch of the required grants follows this list.
- Views: In Unity Catalog, the owner of a view does not need to be an owner of the view's referenced tables and views. Having the SELECT privilege is sufficient, along with USE SCHEMA on the view's parent schema and USE CATALOG on the parent catalog. With workspace-level table access controls, a view's owner needs to be an owner of all referenced tables and views.
- No support for ALL FILES or ANONYMOUS FUNCTION permissions: In Unity Catalog, there is no concept of an ALL FILES or ANONYMOUS FUNCTION permission. These legacy permissions could be used to circumvent access control restrictions by allowing an unprivileged user to run privileged code.
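For example, the following SQL sketch shows the grants a user typically needs before querying a table in Unity Catalog. The analysts group name is a hypothetical example; the main catalog and shared_sales schema reuse names from the join example later in this article.

-- Grant access to a Unity Catalog table.
-- All three privileges are required; SELECT alone is not enough.
-- The group name `analysts` is hypothetical.
GRANT USE CATALOG ON CATALOG main TO `analysts`;
GRANT USE SCHEMA ON SCHEMA main.shared_sales TO `analysts`;
GRANT SELECT ON TABLE main.shared_sales.sales_historical TO `analysts`;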
Joins between Unity Catalog and Hive metastore objects
By using three-level namespace notation, you can join data in a Unity Catalog metastore with data in the legacy Hive metastore.
Note
A join with data in the legacy Hive metastore works only in the workspace where that data resides. Trying to run such a join in another workspace results in an error. Databricks recommends that you upgrade legacy tables and views to Unity Catalog.
The following example joins results from the sales_current table in the legacy Hive metastore with the sales_historical table in the Unity Catalog metastore when the order_id fields are equal.
SQL:
SELECT * FROM hive_metastore.sales.sales_current
JOIN main.shared_sales.sales_historical
ON hive_metastore.sales.sales_current.order_id = main.shared_sales.sales_historical.order_id;

Python:
dfCurrent = spark.table("hive_metastore.sales.sales_current")
dfHistorical = spark.table("main.shared_sales.sales_historical")
display(dfCurrent.join(
  other = dfHistorical,
  on = dfCurrent.order_id == dfHistorical.order_id
))

R:
library(SparkR)
dfCurrent <- tableToDF("hive_metastore.sales.sales_current")
dfHistorical <- tableToDF("main.shared_sales.sales_historical")
display(join(
  x = dfCurrent,
  y = dfHistorical,
  joinExpr = dfCurrent$order_id == dfHistorical$order_id))

Scala:
val dfCurrent = spark.table("hive_metastore.sales.sales_current")
val dfHistorical = spark.table("main.shared_sales.sales_historical")
display(dfCurrent.join(
  right = dfHistorical,
  joinExprs = dfCurrent("order_id") === dfHistorical("order_id")
))
Default catalog
If you omit the top-level catalog name when you perform data operations, and there is no USE CATALOG statement, the default catalog is assumed. If you supply the catalog name another way, such as by setting it in a JDBC connection string, that catalog is assumed instead of the default catalog when you omit the catalog name in data operations.
By default, the default catalog is hive_metastore. You can configure a different default catalog.
If your Databricks workspace used the per-workspace Hive metastore before the workspace was enabled for Unity Catalog, Databricks recommends that you keep the default catalog value as hive_metastore so that your existing code can operate on current Hive metastore data without any change.
Warning
Changing the default catalog can break existing data operations that depend on it.
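To see how the default catalog affects name resolution, consider the following SQL sketch. It assumes the default catalog is still hive_metastore and reuses the sales schema from the earlier examples; current_catalog() returns the catalog that unqualified names resolve to.

-- Returns the catalog that unqualified object names resolve to,
-- for example hive_metastore.
SELECT current_catalog();

-- With hive_metastore as the default catalog, these two queries
-- are equivalent:
SELECT * FROM sales.sales_raw;
SELECT * FROM hive_metastore.sales.sales_raw;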
Switch default catalog
To configure a different default catalog for a workspace, you can use either of these approaches:
- Use the Account API to specify a different default_catalog_name when you assign a metastore to a workspace (or update the metastore assignment). Use POST /accounts/{account_id}/workspaces/{workspace_id}/metastores/{metastore_id} or PUT /accounts/{account_id}/workspaces/{workspace_id}/metastores/{metastore_id}.
- Run this Databricks CLI command:
databricks unity-catalog metastores assign --workspace-id 1234567890123456 \
  --metastore-id 12a345b6-7890-1cd2-3456-e789f0a12b34 \
  --default-catalog-name my_catalog
All SQL warehouses and clusters will use this catalog as the default.
You can also override the default catalog for a specific cluster by setting the following Spark configuration on the cluster, replacing my_catalog with your catalog name. This approach is not available for SQL warehouses:
spark.databricks.sql.initial.catalog.name my_catalog
For instructions, see Spark configuration.
Get the current default catalog
To get the current default catalog, run this Databricks CLI command, replacing the metastore ID with yours:
databricks unity-catalog metastores get --id 12a345b6-7890-1cd2-3456-e789f0a12b34
To get the metastore ID, run databricks unity-catalog metastores get-summary. For an example, see Get summarized information about the current metastore.
Cluster-scoped data access permissions
When you use the Hive metastore alongside Unity Catalog, the data access credentials associated with the cluster are used to access Hive metastore data, but not data registered in Unity Catalog.
If users access paths that are outside Unity Catalog (such as a path not registered as a table or external location), the access credentials assigned to the cluster are used.
See Connect to Amazon S3.
Upgrade legacy tables to Unity Catalog
Tables in the Hive metastore do not benefit from the full set of security and governance features that Unity Catalog introduces, such as built-in auditing and access control. Databricks recommends that you upgrade your legacy tables by adding them to Unity Catalog.
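As a minimal sketch of one upgrade path: for external tables, the SYNC command can register a Hive metastore table in Unity Catalog. The catalog, schema, and table names below are hypothetical, and DRY RUN previews the operation without applying it.

-- Preview the upgrade of an external table without making changes.
-- Names here are hypothetical examples.
SYNC TABLE main.sales.sales_raw FROM hive_metastore.sales.sales_raw DRY RUN;

-- Register the external Hive metastore table in Unity Catalog.
SYNC TABLE main.sales.sales_raw FROM hive_metastore.sales.sales_raw;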