Enable Hive metastore federation for an external Hive metastore
Preview
This feature is in Public Preview.
This article shows how to federate an external Hive metastore so that your organization can work with your Hive metastore tables using Unity Catalog.
For an overview of Hive metastore federation, see Hive metastore federation: enable Unity Catalog to govern tables registered in a Hive metastore.
Before you begin
Review the list of services and features supported by Hive metastore federation: Requirements, supported features, and limitations.
Specific requirements are listed for each step below.
Step 1: Connect Unity Catalog to your external Hive metastore
In this step, you create a connection, a Unity Catalog securable object that specifies a path and credentials for accessing a database system, in this case your Hive metastore.
Requirements
You must have the following:
A username and password that grant access to the database system that hosts the Hive metastore.
The URL of the database (host and port).
The database name.
The CREATE CONNECTION privilege on the Unity Catalog metastore. Metastore admins have this privilege by default. (A GRANT example follows this list.)
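If you need to grant this privilege, a metastore admin can run a SQL command like the following. This is a minimal sketch; the principal name is illustrative.
GRANT CREATE CONNECTION ON METASTORE TO `someone@example.com`;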
Create the connection
To create a connection, you can use Catalog Explorer or the CREATE CONNECTION SQL command in a Databricks notebook or the Databricks SQL query editor.
Note
You can also use the Databricks REST API or the Databricks CLI to create a connection. See POST /api/2.1/unity-catalog/connections and Unity Catalog commands.
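For example, the following REST call sketches creating the same connection. This is a minimal sketch, assuming the HIVE_METASTORE connection type value and the same option keys as the SQL command shown later in this section; check the REST reference for the authoritative request schema.
curl -X POST -H 'Authorization: Bearer <token>' \
https://<workspace-URL>/api/2.1/unity-catalog/connections \
--data '{
  "name": "<connection-name>",
  "connection_type": "HIVE_METASTORE",
  "options": {
    "host": "<hostname>",
    "port": "<port>",
    "user": "<user>",
    "password": "<password>",
    "database": "<database-name>",
    "db_type": "MYSQL",
    "version": "2.3"
  }
}'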
In your Databricks workspace, click Catalog.
On the Quick access page, click Add data > Add a connection.
Enter a user-friendly Connection name.
Select a Connection type of Hive Metastore and Metastore type of External.
Enter the following Authentication properties for your host database.
Host: For example, mysql-demo.lb123.us-west-2.rds.amazonaws.com
Port: For example, 3306
User: For example, mysql_user
Password: For example, password123
Enter the following Connection details for your host database.
Version: Supported Hive metastore versions include 0.13 and 2.3.
Database: The name of the database you are connecting to.
Database type: For example, MySQL.
(Optional) Enter a Warehouse directory. You need this only if you don't want to use the default Hive warehouse directory location.
(Optional) Add a comment.
Click Create.
Run the following command in a notebook or the Databricks SQL query editor.
CREATE CONNECTION <connection-name> TYPE hive_metastore
OPTIONS (
  host '<hostname>',
  port '<port>',
  user '<user>',
  password '<password>',
  database '<database-name>',
  db_type 'MYSQL',
  version '2.3',
  warehouse_directory '<warehouse-directory-path>'
);
Include warehouse_directory only if you don't want to use the default Hive warehouse directory location for your Hive metastore.
We recommend that you use Databricks secrets instead of plaintext strings for sensitive values like credentials. For example:
CREATE CONNECTION <connection-name> TYPE hive_metastore
OPTIONS (
  host '<hostname>',
  port '<port>',
  user secret ('<secret-scope>','<secret-key-user>'),
  password secret ('<secret-scope>','<secret-key-password>'),
  database '<database-name>',
  db_type 'MYSQL',
  version '2.3'
);
If you must use plaintext strings in notebook SQL commands, avoid truncating the string by escaping special characters like $ with \. For example: \$.
For information about setting up secrets, see Secret management.
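For example, you could store the credentials with the Databricks CLI before creating the connection. This is a minimal sketch; the scope and key names are illustrative, and the exact syntax depends on your CLI version.
databricks secrets create-scope hms-credentials
databricks secrets put-secret hms-credentials db-user --string-value '<user>'
databricks secrets put-secret hms-credentials db-password --string-value '<password>'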
Step 2: Create external locations for data in your Hive metastore
In this step, you configure an external location in Unity Catalog to govern access to the cloud storage locations that hold the data registered in your external Hive metastore.
External locations are Unity Catalog securable objects that associate storage credentials with cloud storage container paths. See External locations and storage credentials.
Requirements
For cloud storage and Databricks permission requirements, see “Before you begin” in Create an external location to connect cloud storage to Databricks.
Options for creating the external location
The process that Databricks recommends for creating an external location in Unity Catalog depends on your situation:
If you do not have a storage credential created in Unity Catalog, you can create the external location using an AWS CloudFormation template that creates an IAM role and storage credential for you. Follow the instructions in Create an external location for an S3 bucket using an AWS CloudFormation template.
If you already have a storage credential created in Unity Catalog, create the external location manually. Follow the instructions in Create an external location manually using Catalog Explorer or Create an external location using SQL.
If you are federating a Hive metastore that stores data in DBFS mounts, regardless of whether an applicable storage credential exists, create the external location manually and use the Copy from DBFS mount option. See Create an external location manually using Catalog Explorer.
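If you take the SQL route described above, the following is a minimal sketch, assuming you already have a storage credential named my_storage_credential; the location name and bucket path are illustrative.
CREATE EXTERNAL LOCATION IF NOT EXISTS finance_data
URL 's3://depts/finance'
WITH (STORAGE CREDENTIAL my_storage_credential)
COMMENT 'Data registered in the federated Hive metastore';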
Enable fallback mode on external locations
As soon as you create an external location in Unity Catalog, access to the path represented by that external location is enforced by Unity Catalog permissions when you run queries on Unity Catalog-enabled compute (single user, shared, or SQL warehouse). This can interrupt existing workloads that don’t have the correct Unity Catalog permissions to access the path.
When an external location is in fallback mode, the system first checks the querying principal’s Unity Catalog permissions on the location, and if that doesn’t succeed, falls back to using existing cluster- or notebook-scoped credentials, such as instance profiles or Apache Spark configuration properties, so that your existing workloads continue to run uninterrupted.
Fallback mode is convenient when you are in the process of migrating your legacy workload. Once you’ve updated your workloads to run successfully using Unity Catalog permissions, you should disable fallback mode to prevent legacy cluster-scoped credentials from being used to bypass Unity Catalog data governance.
You can enable fallback mode using Catalog Explorer or the Unity Catalog external locations REST API.
Permissions required: Owner of the external location.
In your Databricks workspace, click Catalog.
On the Quick access page, click External data >.
Select the external location you want to update.
Turn on the Fallback mode toggle and click Enable to confirm.
The following curl examples show how to enable fallback mode when you create an external location and when you update an existing external location.
Creating a new external location:
curl -X POST -H 'Authorization: Bearer <token>' \
https://<workspace-URL>/api/2.1/unity-catalog/external-locations \
--data '{
  "name": "fallback_mode_enabled_external_location",
  "url": "s3://external_location_bucket/url",
  "credential_name": "external_location_credential",
  "fallback": true,
  "skip_validation": true
}'
Updating an external location:
curl -X PATCH \
-H 'Authorization: Bearer <token>' \
-H 'Content-Type: application/json' \
https://<workspace-URL>/api/2.1/unity-catalog/external-locations/<external-location-name> \
--data '{
  "comment": "fallback mode enabled",
  "fallback": true
}'
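Once your workloads run successfully with Unity Catalog permissions, you can disable fallback mode by sending the same PATCH request with "fallback": false. To verify the current setting, retrieve the external location. A minimal sketch:
curl -X GET -H 'Authorization: Bearer <token>' \
https://<workspace-URL>/api/2.1/unity-catalog/external-locations/<external-location-name>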
Step 3: Create a federated catalog
In this step, you use the connection that you created in Step 1 to create a federated catalog in Unity Catalog that points to the external locations you created in Step 2. A federated catalog is a type of foreign catalog: a securable object in Unity Catalog that mirrors a database or catalog in an external data system, enabling you to query that data in your Databricks workspace with access managed by Unity Catalog. In this case, the mirrored catalog is your data registered in a Hive metastore.
Any time a user or workflow interacts with the federated catalog, metadata is synced from the Hive metastore.
Requirements
Permission requirements:
To create the foreign catalog:
The CREATE CATALOG privilege on your Unity Catalog metastore.
Either ownership of the connection or the CREATE FOREIGN CATALOG privilege on the connection.
To enter authorized paths for the foreign catalog, the CREATE FOREIGN SECURABLE privilege on an external location that covers those paths. The owner of the external location has this privilege by default. (GRANT examples follow these requirements.)
To work with the foreign catalog:
Ownership of the catalog or the USE CATALOG privilege.
Compute requirements:
To create the catalog using Catalog Explorer: no compute required.
To create the catalog using SQL: Databricks Runtime 13.3 LTS or above.
To work with the catalog: a shared cluster on Databricks Runtime 13.3 LTS, 14.3 LTS, 15.1 or above.
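If you need to grant the privileges listed above, a minimal SQL sketch follows; the principal name is illustrative, and the connection and external location names are the ones you created earlier.
GRANT CREATE CATALOG ON METASTORE TO `someone@example.com`;
GRANT CREATE FOREIGN CATALOG ON CONNECTION <connection-name> TO `someone@example.com`;
GRANT CREATE FOREIGN SECURABLE ON EXTERNAL LOCATION <external-location-name> TO `someone@example.com`;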
Create the foreign catalog
To create a foreign catalog, you can use Catalog Explorer or the CREATE FOREIGN CATALOG SQL command in a Databricks notebook or the SQL query editor.
Note
You can also use the Unity Catalog API. See Create a catalog in the Databricks REST API reference.
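For example, the following REST call sketches creating a foreign catalog from the connection. This is a minimal sketch; the catalog name and paths are illustrative, and you should check the REST reference for the authoritative request schema.
curl -X POST -H 'Authorization: Bearer <token>' \
https://<workspace-URL>/api/2.1/unity-catalog/catalogs \
--data '{
  "name": "<catalog-name>",
  "connection_name": "<connection-name>",
  "options": {
    "authorized_paths": "s3://demo,s3://depts/finance"
  }
}'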
In your Databricks workspace, click Catalog to open Catalog Explorer.
On the Quick access page, click the Add data button and select Add a catalog.
Enter a Catalog name and select a catalog Type of Foreign.
Select the Connection that you created in Step 1 from the drop-down.
In the Authorized paths field, enter paths to the cloud storage locations that you defined as external locations in Step 2. For example, s3://demo,s3://depts/finance. Authorized paths are an added layer of security for federated catalogs only. See What are authorized paths?.
Click Create.
(Optional) Click Configure to open a wizard that walks you through granting permissions on the catalog and adding tags. You can also perform these steps later.
See Manage privileges in Unity Catalog and Apply tags to Unity Catalog securable objects.
(Optional) Bind the catalog to specific workspaces.
By default, catalogs can be accessed from any workspace attached to the Unity Catalog metastore (restricted by user privileges). If you want to allow access only from specific workspaces, go to the Workspaces tab and assign workspaces. See Limit catalog access to specific workspaces.
Populate the federated catalog with the Hive metastore metadata.
Any time a user or workflow interacts with the federated catalog, metadata is synced from the Hive metastore. The first interaction populates the catalog in Unity Catalog and makes its contents visible in the Catalog Explorer UI. You can populate the catalog by selecting and starting a supported compute resource in Catalog Explorer. You must be the catalog owner (which you are by virtue of creating the catalog) or a user with the USE CATALOG privilege.
Alternatively, run the following SQL command in a notebook or the SQL query editor. Items in brackets are optional. Replace the placeholder values:
<catalog-name>: Name for the catalog in Databricks.
<connection-name>: The name of the connection object that you created in Step 1.
<path1>,<path2>: Paths to the cloud storage locations that you defined as external locations in Step 2. For example, s3://demo,s3://depts/finance. Authorized paths are an added layer of security for federated catalogs only. See What are authorized paths?.
CREATE FOREIGN CATALOG [IF NOT EXISTS] <catalog-name> USING CONNECTION <connection-name> OPTIONS (authorized_paths '<path1>,<path2>');
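After you create and populate the catalog, a quick way to confirm that metadata is syncing is to browse it in SQL; the schema and table names below are illustrative.
USE CATALOG <catalog-name>;
SHOW SCHEMAS;
SELECT * FROM <schema-name>.<table-name> LIMIT 10;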
See also Manage and work with foreign catalogs.