Requirements
In Databricks, find your account ID.
a. Log in to the Databricks account console.
b. Click User Profile.
c. From the pop-up, copy the value next to Account ID.
In AWS, create an IAM policy that allows reading and writing to the S3 bucket path. In the following policy, replace these values:
- `<BUCKET>`: The name of the S3 bucket from the previous step.
- `<KMS_KEY>`: The name of the KMS key that encrypts the S3 bucket contents, if encryption is enabled. If encryption is disabled, remove the final KMS statement from the IAM policy.
{ "Version": "2012-10-17", "Statement": [ { "Action": [ "s3:GetObject", "s3:PutObject", "s3:DeleteObject", "s3:ListBucket", "s3:GetBucketLocation" ], "Resource": [ "arn:aws:s3:::<BUCKET>/*", "arn:aws:s3:::<BUCKET>" ], "Effect": "Allow" }, { "Action": [ "s3:GetObject", "s3:GetObjectVersion", "s3:ListBucket", "s3:GetBucketLocation" ], "Resource": [ "arn:aws:s3:::databricks-datasets-oregon/*", "arn:aws:s3:::databricks-datasets-oregon" ], "Effect": "Allow" }, { "Action": [ "kms:Decrypt", "kms:Encrypt", "kms:GenerateDataKey*" ], "Resource": [ "arn:aws:kms:<KMS_KEY" ], "Effect": "Allow" } ] }
Create an IAM role that uses the IAM policy you created in the previous step.
a. Set EC2 as the trusted entity.
b. On the role's Permissions tab, attach the IAM policy you just created.
c. Set up a cross-account trust relationship so that Unity Catalog can assume the role and access the data in the bucket on behalf of Databricks users. To do this, paste the following policy JSON into the Trust Relationship tab.
Do not modify the role ARN in the `Principal` section, which is a static value that references a role created by Databricks. In the `sts:ExternalId` section, replace `<DATABRICKS_ACCOUNT_ID>` with your Databricks account ID from the first step (not your AWS account ID).

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::414351767826:role/unity-catalog-prod-UCMasterRole-14S5ZJVKOTYTL"
      },
      "Action": "sts:AssumeRole",
      "Condition": {
        "StringEquals": {
          "sts:ExternalId": "<DATABRICKS_ACCOUNT_ID>"
        }
      }
    }
  ]
}
```
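This step can also be scripted. The sketch below uses boto3 to create the role directly with the cross-account trust policy above (instead of starting from an EC2 trusted entity) and then attaches the IAM policy from the previous step; the role name, file name, and policy ARN are hypothetical placeholders.

```python
import json

import boto3

iam = boto3.client("iam")

# Load the trust policy shown above, with <DATABRICKS_ACCOUNT_ID> already substituted.
with open("unity-catalog-trust-policy.json") as f:
    trust_policy = json.load(f)

# "unity-catalog-example-role" is a hypothetical name; choose your own.
role = iam.create_role(
    RoleName="unity-catalog-example-role",
    AssumeRolePolicyDocument=json.dumps(trust_policy),
)

# Attach the IAM policy created in the previous step (replace the ARN with yours).
iam.attach_role_policy(
    RoleName="unity-catalog-example-role",
    PolicyArn="arn:aws:iam::<AWS_ACCOUNT_ID>:policy/unity-catalog-example-policy",
)

# Keep the role ARN; you need it when you create the storage credential.
print(role["Role"]["Arn"])
```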
Create a storage credential
You can create a storage credential using Catalog Explorer or the Unity Catalog CLI. Follow these steps to create a storage credential using Catalog Explorer.
- In a new browser tab, log in to Databricks.
- Click Catalog.
- Click Storage Credentials.
- Click Create Credential.
- Enter example_credential for the name of the storage credential.
- Set IAM role (ARN) to the IAM role you just created.
- Optionally enter a comment for the storage credential.
- Click Save.
Leave this browser tab open for the next steps.
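If you would rather create the storage credential programmatically, here is a minimal sketch using the Databricks SDK for Python (assuming the SDK is installed and authenticated to your workspace; depending on the SDK version, the request class may be named AwsIamRole instead of AwsIamRoleRequest, and the role ARN below is a placeholder).

```python
from databricks.sdk import WorkspaceClient
from databricks.sdk.service import catalog

# Assumes workspace authentication is already configured (for example via environment variables).
w = WorkspaceClient()

# Replace the ARN with the IAM role you created in the Requirements section.
credential = w.storage_credentials.create(
    name="example_credential",
    aws_iam_role=catalog.AwsIamRoleRequest(
        role_arn="arn:aws:iam::<AWS_ACCOUNT_ID>:role/unity-catalog-example-role"
    ),
    comment="Credential for the external location tutorial",
)

print(credential.name)
```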
Create an external location
An external location references a storage credential and also contains a storage path on your cloud tenant. The external location allows reading from and writing to only that path and its child directories. You can create an external location using Catalog Explorer, a SQL command, or the Unity Catalog CLI. Follow these steps to create an external location using Catalog Explorer.
- Go to the browser tab where you just created a storage credential.
- Click Catalog.
- Click External Locations.
- Click Create location.
- Enter example_location for the name of the external location.
- Enter the S3 bucket path that the external location allows reading from and writing to.
- Set Storage Credential to example_credential, the storage credential you just created.
- Optionally enter a comment for the external location.
- Click Save.
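As with the storage credential, the external location can also be created with a short Databricks SDK for Python sketch (same assumptions as above; the bucket path is a placeholder).

```python
from databricks.sdk import WorkspaceClient

# Assumes workspace authentication is already configured.
w = WorkspaceClient()

# Replace the URL with the S3 bucket path this external location should cover.
location = w.external_locations.create(
    name="example_location",
    url="s3://<BUCKET>/<PATH>",
    credential_name="example_credential",
    comment="External location for the tutorial",
)

print(location.url)
```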
Create an external table in Unity Catalog
This notebook shows how to create an external table in Unity Catalog from a Delta table in the `samples` catalog. The data for an external table is stored in a path in your cloud tenant's storage, outside the default storage location for a metastore. When you drop an external table, its underlying data is not deleted. Before you can create an external table, you must create a storage credential that allows Unity Catalog to read from and write to the path on your cloud tenant, and an external location that references it.
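The notebook itself is not reproduced here, but the following sketch illustrates the idea and could be run in a Databricks notebook once the external location exists. The catalog, schema, and table names (main.default.example_external_table), the source table (samples.nyctaxi.trips), and the bucket path are assumptions for illustration, not values defined in this tutorial.

```python
# Run in a Databricks notebook attached to Unity Catalog-enabled compute.
# All names and the S3 path below are hypothetical; replace them with your own
# catalog, schema, table, source table, and a path covered by example_location.
spark.sql("""
  CREATE TABLE IF NOT EXISTS main.default.example_external_table
  LOCATION 's3://<BUCKET>/<PATH>/example_external_table'
  AS SELECT * FROM samples.nyctaxi.trips LIMIT 1000
""")

# Dropping the table removes only its metadata; the files at LOCATION are kept.
# spark.sql("DROP TABLE main.default.example_external_table")
```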