Create a Unity Catalog metastore
This article shows how to create a Unity Catalog metastore and link it to workspaces.
Important
For workspaces that were enabled for Unity Catalog automatically, the instructions in this article are unnecessary. Databricks began to enable new workspaces for Unity Catalog automatically on November 8, 2023, with a rollout proceeding gradually across accounts. You need to follow the instructions in this article only if your workspace was not enabled automatically and no metastore already exists in your workspace region. To determine whether a metastore already exists in your region, see Automatic enablement of Unity Catalog.
A metastore is the top-level container for data in Unity Catalog. Unity Catalog metastores register metadata about securable objects (such as tables, volumes, external locations, and shares) and the permissions that govern access to them. Each metastore exposes a three-level namespace (`catalog.schema.table`) by which data can be organized. You must have one metastore for each region in which your organization operates. To work with Unity Catalog, users must be on a workspace that is attached to a metastore in their region.
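To make the three-level namespace concrete, here is a minimal, hypothetical Python helper that composes a fully qualified table identifier from its three components (this is an illustration only; real Unity Catalog identifiers can also contain special characters when quoted with backticks, which this sketch ignores):

```python
def qualified_name(catalog: str, schema: str, table: str) -> str:
    """Compose a three-level Unity Catalog identifier: catalog.schema.table."""
    parts = (catalog, schema, table)
    if not all(parts) or any("." in p for p in parts):
        raise ValueError("each namespace component must be non-empty and dot-free")
    return ".".join(parts)

# For example, the "orders" table in the "sales" schema of the "main" catalog:
print(qualified_name("main", "sales", "orders"))  # main.sales.orders
```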
To create a metastore, you do the following:
In your AWS account, optionally create a storage location for metastore-level storage of managed tables and volumes.
For information to help you decide whether you need metastore-level storage, see (Optional) Create metastore-level storage and Data is physically separated in storage.
In your AWS account, create an IAM role that gives access to that storage location.
In Databricks, create the metastore, attach the storage location, and assign workspaces to the metastore.
Note
In addition to the approaches described in this article, you can also create a metastore by using the Databricks Terraform provider, specifically the databricks_metastore resource. To enable Unity Catalog to access the metastore, use databricks_metastore_data_access. To link workspaces to a metastore, use databricks_metastore_assignment.
Before you begin
Before you begin, you should familiarize yourself with the basic Unity Catalog concepts, including metastores and managed storage. See What is Unity Catalog?.
You should also confirm that you meet the following requirements for all setup steps:
You must be a Databricks account admin.
Your Databricks account must be on the Premium plan or above.
If you want to set up metastore-level root storage, you must have the ability to create S3 buckets, IAM roles, IAM policies, and cross-account trust relationships in your AWS account.
Step 1 (Optional): Create an S3 bucket for metastore-level managed storage in AWS
In this step, which is optional, you create the S3 bucket required by Unity Catalog to store managed table and volume data at the metastore level. You create the S3 bucket in your own AWS account. To determine whether you need metastore-level storage, see (Optional) Create metastore-level storage.
In AWS, create an S3 bucket.
This S3 bucket will be the metastore-level storage location for managed tables and managed volumes in Unity Catalog. This storage location can be overridden at the catalog and schema levels. See Specify a managed storage location in Unity Catalog.
Requirements:
If you have more than one metastore, you should use a dedicated S3 bucket for each one.
Locate the bucket in the same region as the workspaces you want to access the data from.
The bucket name cannot include dot notation (for example, `incorrect.bucket.name.notation`). For more bucket naming guidance, see the AWS bucket naming rules.
Make a note of the S3 bucket path, which starts with `s3://`.
If you enable KMS encryption on the S3 bucket, make a note of the name of the KMS encryption key.
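The dot restriction above lends itself to a quick programmatic check. A rough sketch that covers only a subset of the AWS bucket naming rules (length, allowed characters, and the no-dots requirement that Unity Catalog imposes):

```python
import re

def is_valid_metastore_bucket_name(name: str) -> bool:
    """Rough check: 3-63 characters, lowercase letters, digits, and hyphens,
    starting and ending with an alphanumeric character, and no dots.
    This does not cover every AWS rule (e.g. reserved prefixes)."""
    if "." in name:
        return False
    return re.fullmatch(r"[a-z0-9][a-z0-9-]{1,61}[a-z0-9]", name) is not None

print(is_valid_metastore_bucket_name("incorrect.bucket.name.notation"))  # False
print(is_valid_metastore_bucket_name("my-metastore-bucket"))             # True
```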
Step 2 (Optional): Create an IAM role to access the storage location
In this step, which is required only if you completed step 1, you create the IAM role required by Unity Catalog to access the S3 bucket that you created in the previous step.
Role creation is a two-step process. First you create the role with a temporary trust relationship policy, which you modify in a later step. You must modify the trust policy after you create the role because the role must be self-assuming; that is, it must be configured to trust itself. The role must therefore exist before you add the self-assumption statement. For information about self-assuming roles, see this Amazon blog article.
Find your Databricks account ID.
Log in to the Databricks account console.
Click your username.
From the menu, copy the Account ID value.
In AWS, create an IAM role with a Custom Trust Policy.
In the Custom Trust Policy field, paste the following policy JSON, replacing `<DATABRICKS-ACCOUNT-ID>` with the Databricks account ID you found in step 1 (not your AWS account ID).

This policy establishes a cross-account trust relationship so that Unity Catalog can assume the role to access the data in the bucket on behalf of Databricks users. This is specified by the ARN in the `Principal` section. It is a static value that references a role created by Databricks. Do not modify it.

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": [
          "arn:aws:iam::414351767826:role/unity-catalog-prod-UCMasterRole-14S5ZJVKOTYTL"
        ]
      },
      "Action": "sts:AssumeRole",
      "Condition": {
        "StringEquals": {
          "sts:ExternalId": "<DATABRICKS-ACCOUNT-ID>"
        }
      }
    }
  ]
}
```
If you are using AWS GovCloud, use the following policy instead:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": [
          "arn:aws-us-gov:iam::044793339203:role/unity-catalog-prod-UCMasterRole-1QRFA8SGY15OJ"
        ]
      },
      "Action": "sts:AssumeRole",
      "Condition": {
        "StringEquals": {
          "sts:ExternalId": "<DATABRICKS-ACCOUNT-ID>"
        }
      }
    }
  ]
}
```
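If you generate these policies with a script, the only value you substitute is your Databricks account ID, which becomes the `sts:ExternalId` condition. A sketch in Python (the account ID passed in the example is a placeholder):

```python
import json

# Static Databricks-managed role ARN from the policy above; do not modify it.
UC_MASTER_ROLE = (
    "arn:aws:iam::414351767826:role/unity-catalog-prod-UCMasterRole-14S5ZJVKOTYTL"
)

def trust_policy(databricks_account_id: str) -> str:
    """Render the cross-account trust policy with the Databricks account ID
    as the sts:ExternalId condition."""
    policy = {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"AWS": [UC_MASTER_ROLE]},
            "Action": "sts:AssumeRole",
            "Condition": {
                "StringEquals": {"sts:ExternalId": databricks_account_id}
            },
        }],
    }
    return json.dumps(policy, indent=2)

# Placeholder account ID, for illustration only:
print(trust_policy("11111111-2222-3333-4444-555555555555"))
```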
Skip the permissions policy configuration. You’ll go back to add that in a later step.
Save the IAM role.
Modify the trust relationship policy to make it “self-assuming.”
Return to your saved IAM role and go to the Trust Relationships tab.
Edit the trust relationship policy, adding the following ARN to the “Allow” statement.
Replace `<YOUR-AWS-ACCOUNT-ID>` and `<THIS-ROLE-NAME>` with your actual IAM role values:

```json
"arn:aws:iam::<YOUR-AWS-ACCOUNT-ID>:role/<THIS-ROLE-NAME>"
```
Your policy should now look like this (with replacement text updated to use your Databricks account ID and IAM role values):
```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": [
          "arn:aws:iam::414351767826:role/unity-catalog-prod-UCMasterRole-14S5ZJVKOTYTL",
          "arn:aws:iam::<YOUR-AWS-ACCOUNT-ID>:role/<THIS-ROLE-NAME>"
        ]
      },
      "Action": "sts:AssumeRole",
      "Condition": {
        "StringEquals": {
          "sts:ExternalId": "<DATABRICKS-ACCOUNT-ID>"
        }
      }
    }
  ]
}
```
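The self-assumption edit amounts to appending the role's own ARN to the `Principal.AWS` list of the trust policy. A sketch of that transformation (the account ID and role name in the example are placeholders):

```python
import json

def make_self_assuming(trust_policy: dict, aws_account_id: str, role_name: str) -> dict:
    """Append the role's own ARN to the Principal.AWS list of the first
    statement, making the role self-assuming. Idempotent."""
    own_arn = f"arn:aws:iam::{aws_account_id}:role/{role_name}"
    principals = trust_policy["Statement"][0]["Principal"]["AWS"]
    if own_arn not in principals:
        principals.append(own_arn)
    return trust_policy

policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"AWS": [
            "arn:aws:iam::414351767826:role/unity-catalog-prod-UCMasterRole-14S5ZJVKOTYTL"
        ]},
        "Action": "sts:AssumeRole",
        "Condition": {"StringEquals": {"sts:ExternalId": "<DATABRICKS-ACCOUNT-ID>"}},
    }],
}
# Placeholder AWS account ID and role name:
print(json.dumps(make_self_assuming(policy, "123456789012", "my-uc-role"), indent=2))
```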
In AWS, create an IAM policy in the same AWS account as the S3 bucket.
To avoid unexpected issues, you must use the following sample policy, replacing the following values:

`<BUCKET>`: The name of the S3 bucket you created in the previous step.
`<KMS-KEY>`: Optional. If encryption is enabled, provide the name of the KMS key that encrypts the S3 bucket contents. If encryption is disabled, remove the entire KMS section of the IAM policy.
`<AWS-ACCOUNT-ID>`: The account ID of the current AWS account (not your Databricks account).
`<AWS-IAM-ROLE-NAME>`: The name of the AWS IAM role that you created in the previous step.
```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Action": [
        "s3:GetObject",
        "s3:PutObject",
        "s3:DeleteObject",
        "s3:ListBucket",
        "s3:GetBucketLocation"
      ],
      "Resource": [
        "arn:aws:s3:::<BUCKET>/*",
        "arn:aws:s3:::<BUCKET>"
      ],
      "Effect": "Allow"
    },
    {
      "Action": [
        "kms:Decrypt",
        "kms:Encrypt",
        "kms:GenerateDataKey*"
      ],
      "Resource": [
        "arn:aws:kms:<KMS-KEY>"
      ],
      "Effect": "Allow"
    },
    {
      "Action": [
        "sts:AssumeRole"
      ],
      "Resource": [
        "arn:aws:iam::<AWS-ACCOUNT-ID>:role/<AWS-IAM-ROLE-NAME>"
      ],
      "Effect": "Allow"
    }
  ]
}
```
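The instruction to remove the entire KMS section when encryption is disabled can be expressed as a conditional statement list if you render the policy from a script. A sketch that builds the policy from the placeholder values (all names in the example are illustrative):

```python
import json
from typing import Optional

def storage_policy(bucket: str, aws_account_id: str, role_name: str,
                   kms_key: Optional[str] = None) -> dict:
    """Build the Unity Catalog storage policy. The KMS statement is included
    only when a KMS key is given, i.e. when bucket encryption is enabled."""
    statements = [
        {
            "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject",
                       "s3:ListBucket", "s3:GetBucketLocation"],
            "Resource": [f"arn:aws:s3:::{bucket}/*", f"arn:aws:s3:::{bucket}"],
            "Effect": "Allow",
        },
        {
            "Action": ["sts:AssumeRole"],
            "Resource": [f"arn:aws:iam::{aws_account_id}:role/{role_name}"],
            "Effect": "Allow",
        },
    ]
    if kms_key is not None:
        statements.insert(1, {
            "Action": ["kms:Decrypt", "kms:Encrypt", "kms:GenerateDataKey*"],
            "Resource": [f"arn:aws:kms:{kms_key}"],
            "Effect": "Allow",
        })
    return {"Version": "2012-10-17", "Statement": statements}

# Without a KMS key, the KMS statement is omitted entirely (illustrative names):
print(json.dumps(storage_policy("my-metastore-bucket", "123456789012", "my-uc-role"),
                 indent=2))
```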
Note
If you need a more restrictive IAM policy for Unity Catalog, contact your Databricks representative for assistance.
Attach the IAM policy to the IAM role.
On the IAM role’s Permissions tab, attach the IAM policy that you just created.
Step 3: Create the metastore and attach a workspace
Each Databricks region requires its own Unity Catalog metastore.
You create a metastore for each region in which your organization operates. You can link each of these regional metastores to any number of workspaces in that region. Each linked workspace has the same view of the data in the metastore, and data access control can be managed across workspaces. You can access data in other metastores using Delta Sharing.
If you chose to create metastore-level storage, the metastore will use the S3 bucket and IAM role that you created in the previous steps.
To create a metastore:
Log in to the Databricks account console.
Click Catalog.
Click Create metastore.
Enter the following:
A name for the metastore.
The region where you want to deploy the metastore.
This must be in the same region as the workspaces you want to use to access the data. Make sure that this matches the region of the storage bucket you created earlier.
(Optional) The S3 bucket path (you can omit `s3://`) and IAM role name for the bucket and role you created in Step 1 (Optional): Create an S3 bucket for metastore-level managed storage in AWS.
Click Create.
When prompted, select workspaces to link to the metastore.
For details, see Enable a workspace for Unity Catalog.
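The console steps above can also be scripted. A sketch using the Databricks SDK for Python; the method names follow the SDK's account-level `metastores` and `metastore_assignments` APIs, but treat the exact signatures as an assumption and check the SDK reference before relying on them. All names and IDs below are illustrative:

```python
# Requires: pip install databricks-sdk, plus account-level authentication
# (e.g. DATABRICKS_ACCOUNT_ID, DATABRICKS_CLIENT_ID, DATABRICKS_CLIENT_SECRET).
try:
    from databricks.sdk import AccountClient
except ImportError:
    AccountClient = None  # SDK not installed; the function below still loads

def create_and_assign_metastore(account_client, workspace_id):
    """Create a metastore and link one workspace to it.

    account_client is an AccountClient (or any object exposing equivalent
    metastores / metastore_assignments APIs)."""
    metastore = account_client.metastores.create(
        name="us-east-1-metastore",  # illustrative metastore name
        region="us-east-1",          # must match the workspace region
        # Optional metastore-level storage; omit if you skipped steps 1-2.
        storage_root="s3://my-metastore-bucket/metastore",
    )
    account_client.metastore_assignments.create(
        workspace_id=workspace_id,
        metastore_id=metastore.metastore_id,
    )
    return metastore
```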
Transfer the metastore admin role to a group.
The user who creates a metastore is its owner, also called the metastore admin. The metastore admin can create top-level objects in the metastore such as catalogs and can manage access to tables and other objects. Databricks recommends that you reassign the metastore admin role to a group. See Assign a metastore admin.
Enable Databricks management of uploads to managed volumes.
Databricks uses cross-origin resource sharing (CORS) to upload data to managed volumes in Unity Catalog. See Configure Unity Catalog storage account for CORS.