Manage external locations and storage credentials
This article introduces external locations and storage credentials and explains how to create and use them to manage access to your data.
What are external locations and storage credentials?
External locations and storage credentials allow Unity Catalog to read and write data on your cloud tenant on behalf of users. These objects are used for:
Creating, reading from, and writing to external tables.
Overriding the metastore’s default managed table storage location at the catalog or schema level.
Creating a managed or external table from files stored on your cloud tenant.
Inserting records into tables from files stored on your cloud tenant.
Directly exploring data files stored on your cloud tenant.
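For example, once you have created an external location (described below) that covers a bucket path and you hold the appropriate privileges on it, commands like the following sketch become possible. The catalog, schema, table, and bucket names are placeholders, not values from this article.
-- Create an external table over files stored on your cloud tenant.
CREATE TABLE main.default.sales_external (id INT, amount DOUBLE)
LOCATION 's3://my-bucket/sales/tables/sales_external';

-- Insert records into the table from files stored on your cloud tenant.
COPY INTO main.default.sales_external
FROM 's3://my-bucket/sales/raw'
FILEFORMAT = PARQUET;

-- Directly explore data files stored on your cloud tenant.
LIST 's3://my-bucket/sales/raw';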
A storage credential represents an authentication and authorization mechanism for accessing data stored on your cloud tenant, using an IAM role. Each storage credential is subject to Unity Catalog access-control policies that control which users and groups can access the credential. If a user does not have access to a storage credential in Unity Catalog, the request fails and Unity Catalog does not attempt to authenticate to your cloud tenant on the user’s behalf.
An external location is an object that combines a cloud storage path with a storage credential that authorizes access to the cloud storage path. Each external location is subject to Unity Catalog access-control policies that control which users and groups can access the external location. If a user does not have access to an external location in Unity Catalog, the request fails and Unity Catalog does not attempt to authenticate to your cloud tenant on the user’s behalf.
Note
Despite the term “external” in the name, external locations can be used not just to define storage locations for external tables, but also for managed tables. Specifically, they can be used to define storage locations for managed tables at the catalog and schema levels, overriding the metastore root storage location. See CREATE CATALOG and CREATE SCHEMA.
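For example, the MANAGED LOCATION clause can point a catalog or schema at managed storage under a path governed by an external location. This is a sketch only; the catalog, schema, and bucket path below are placeholders:
CREATE CATALOG IF NOT EXISTS my_catalog
MANAGED LOCATION 's3://my-bucket/managed/my_catalog';

CREATE SCHEMA IF NOT EXISTS my_catalog.my_schema
MANAGED LOCATION 's3://my-bucket/managed/my_catalog/my_schema';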
Databricks recommends using external locations rather than using storage credentials directly.
Requirements
To create storage credentials, you must be a Databricks account admin. The account admin who creates the storage credential can delegate ownership to another user or group to manage permissions on it.
To create external locations, you must be a metastore admin or a user with the CREATE EXTERNAL LOCATION privilege.
The name of the S3 bucket that you want users to read from and write to cannot use dot notation (for example, incorrect.bucket.name.notation). For more bucket naming guidance, see the AWS bucket naming rules.
Manage storage credentials
The following sections show how to create and manage storage credentials.
Create a storage credential
To create a storage credential, you need an IAM role that authorizes reading from and writing to an S3 bucket path. You reference that IAM role when you create the storage credential.
Step 1: Create or update an IAM role
In AWS, create or update an IAM role that gives access to the S3 bucket that you want your users to access. This IAM role must be defined in the same account as the S3 bucket.
Tip
If you have already created an IAM role to provide this access, you can skip this step and go straight to Step 2: Give Databricks the IAM role details.
Create an IAM role or update an existing role.
Set up a cross-account trust relationship so that Unity Catalog can assume the role to access the data in the bucket on behalf of Databricks users. Your role must also be configured to be self-assuming, that is, to trust itself. Paste the following policy JSON into the Trust Relationship tab.
Do not modify the first role ARN in the Principal section. This is a static value that references a role created by Databricks.
The second role ARN is a self-reference to the role you are creating, because the role must be self-assuming. For information about self-assuming roles, see this Amazon blog article. Replace <YOUR_AWS_ACCOUNT_ID> and <THIS_ROLE_NAME> with your actual IAM role values.
In the sts:ExternalId section, replace <DATABRICKS_ACCOUNT_ID> with your Databricks account ID (not your AWS account ID). To get the Databricks account ID, see step 1 of Configure a storage bucket and IAM role in AWS.
{ "Version": "2012-10-17", "Statement": [ { "Effect": "Allow", "Principal": { "AWS": [ "arn:aws:iam::414351767826:role/unity-catalog-prod-UCMasterRole-14S5ZJVKOTYTL", "arn:aws:iam::<YOUR_AWS_ACCOUNT_ID>:role/<THIS_ROLE_NAME>" ] }, "Action": "sts:AssumeRole", "Condition": { "StringEquals": { "sts:ExternalId": "<DATABRICKS_ACCOUNT_ID>" } } } ] }
Create the following IAM policy in the same account as the S3 bucket, replacing the following values:
<BUCKET>: The name of the S3 bucket.
<KMS_KEY>: Optional. If encryption is enabled, provide the name of the KMS key that encrypts the S3 bucket contents. If encryption is disabled, remove the entire KMS section of the IAM policy.
<AWS_ACCOUNT_ID>: The account ID of your AWS account (not your Databricks account).
<AWS_IAM_ROLE_NAME>: The name of the AWS IAM role that you created in the previous step.
{ "Version": "2012-10-17", "Statement": [ { "Action": [ "s3:GetObject", "s3:PutObject", "s3:DeleteObject", "s3:ListBucket", "s3:GetBucketLocation", "s3:GetLifecycleConfiguration", "s3:PutLifecycleConfiguration" ], "Resource": [ "arn:aws:s3:::<BUCKET>/*", "arn:aws:s3:::<BUCKET>" ], "Effect": "Allow" }, { "Action": [ "kms:Decrypt", "kms:Encrypt", "kms:GenerateDataKey*" ], "Resource": [ "arn:aws:kms:<KMS_KEY>" ], "Effect": "Allow" }, { "Action": [ "sts:AssumeRole" ], "Resource": [ "arn:aws:iam::<AWS_ACCOUNT_ID>:role/<AWS_IAM_ROLE_NAME>" ], "Effect": "Allow" } ] }
Note
If you need a more restrictive IAM policy for Unity Catalog, contact your Databricks representative for assistance.
Databricks uses GetLifecycleConfiguration and PutLifecycleConfiguration to manage lifecycle policies for the personal staging locations used by Partner Connect and the upload data UI.
Attach the IAM policy to the IAM role.
In the role’s Permissions tab, attach the IAM policy you just created.
Step 2: Give Databricks the IAM role details
In Databricks, log in to a workspace that is linked to the metastore.
Click Data.
At the bottom of the screen, click Storage Credentials.
Click +Add > Add a storage credential.
Enter a name for the credential, the IAM Role ARN that authorizes Unity Catalog to access the storage location on your cloud tenant, and an optional comment.
Tip
If you have already defined an instance profile in Databricks, you can click Copy instance profile to copy over the IAM role ARN for that instance profile. The instance profile’s IAM role must have a cross-account trust relationship that enables Databricks to assume the role in order to access the bucket on behalf of Databricks users. For more information about the IAM role policy and trust relationship requirements, see Step 1: Create or update an IAM role.
Click Save.
Create an external location that references this storage credential.
You can also create a storage credential by using the Databricks Terraform provider and databricks_storage_credential.
List storage credentials
To view the list of all storage credentials in a metastore, you can use Data Explorer or a SQL command.
Log in to a workspace that is linked to the metastore.
Click Data.
At the bottom of the screen, click Storage Credentials.
Run the following command in a notebook or the Databricks SQL editor.
SHOW STORAGE CREDENTIALS;
Run the following command in a notebook.
display(spark.sql("SHOW STORAGE CREDENTIALS"))
Run the following command in a notebook.
library(SparkR)
display(sql("SHOW STORAGE CREDENTIALS"))
Run the following command in a notebook.
display(spark.sql("SHOW STORAGE CREDENTIALS"))
View a storage credential
To view the properties of a storage credential, you can use Data Explorer or a SQL command.
Log in to a workspace that is linked to the metastore.
Click Data.
At the bottom of the screen, click Storage Credentials.
Click the name of a storage credential to see its properties.
Run the following command in a notebook or the Databricks SQL editor. Replace <credential_name>
with the name of the credential.
DESCRIBE STORAGE CREDENTIAL <credential_name>;
Run the following command in a notebook. Replace <credential_name>
with the name of the credential.
display(spark.sql("DESCRIBE STORAGE CREDENTIAL <credential_name>"))
Run the following command in a notebook. Replace <credential_name>
with the name of the credential.
library(SparkR)
display(sql("DESCRIBE STORAGE CREDENTIAL <credential_name>"))
Run the following command in a notebook. Replace <credential_name>
with the name of the credential.
display(spark.sql("DESCRIBE STORAGE CREDENTIAL <credential_name>"))
Rename a storage credential
To rename a storage credential, you can use Data Explorer or a SQL command.
Log in to a workspace that is linked to the metastore.
Click Data.
At the bottom of the screen, click Storage Credentials.
Click the name of a storage credential to open the edit dialog.
Rename the storage credential and save it.
Run the following command in a notebook or the Databricks SQL editor. Replace the placeholder values:
<credential_name>: The name of the credential.
<new_credential_name>: A new name for the credential.
ALTER STORAGE CREDENTIAL <credential_name> RENAME TO <new_credential_name>;
Run the following command in a notebook. Replace the placeholder values:
<credential_name>: The name of the credential.
<new_credential_name>: A new name for the credential.
spark.sql("ALTER STORAGE CREDENTIAL <credential_name> RENAME TO <new_credential_name>")
Run the following command in a notebook. Replace the placeholder values:
<credential_name>: The name of the credential.
<new_credential_name>: A new name for the credential.
library(SparkR)
sql("ALTER STORAGE CREDENTIAL <credential_name> RENAME TO <new_credential_name>")
Run the following command in a notebook. Replace the placeholder values:
<credential_name>: The name of the credential.
<new_credential_name>: A new name for the credential.
spark.sql("ALTER STORAGE CREDENTIAL <credential_name> RENAME TO <new_credential_name>")
Manage permissions for a storage credential
You can grant permissions directly on the storage credential, but Databricks recommends that you reference it in an external location and grant permissions to that instead. An external location combines a storage credential with a specific path, and authorizes access only to that path and its contents.
You can manage permissions for a storage credential using Data Explorer, the Databricks CLI, SQL commands in a notebook or Databricks SQL query, or Terraform. You can grant and revoke the following permissions on a storage credential:
CREATE EXTERNAL TABLE
READ FILES
WRITE FILES
In the following examples, replace the placeholder values:
<principal>: The email address of the account-level user or the name of the account-level group to grant the permission to.
<storage_credential_name>: The name of a storage credential.
To show grants on a storage credential, use a command like the following. You can optionally filter the results to show only the grants for the specified principal.
SHOW GRANTS [<principal>] ON STORAGE CREDENTIAL <storage_credential_name>;
display(spark.sql("SHOW GRANTS [<principal>] ON STORAGE CREDENTIAL <storage_credential_name>"))
library(SparkR)
display(sql("SHOW GRANTS [<principal>] ON STORAGE CREDENTIAL <storage_credential_name>"))
display(spark.sql("SHOW GRANTS [<principal>] ON STORAGE CREDENTIAL <storage_credential_name>"))
To grant permission to create an external table using a storage credential directly:
GRANT CREATE EXTERNAL TABLE ON STORAGE CREDENTIAL <storage_credential_name> TO <principal>;
spark.sql("GRANT CREATE EXTERNAL TABLE ON STORAGE CREDENTIAL <storage_credential_name> TO <principal>")
library(SparkR)
sql("GRANT CREATE EXTERNAL TABLE ON STORAGE CREDENTIAL <storage_credential_name> TO <principal>")
spark.sql("GRANT CREATE EXTERNAL TABLE ON STORAGE CREDENTIAL <storage_credential_name> TO <principal>")
To grant permission to read files using a storage credential directly:
GRANT READ FILES ON STORAGE CREDENTIAL <storage_credential_name> TO <principal>;
spark.sql("GRANT READ FILES ON STORAGE CREDENTIAL <storage_credential_name> TO <principal>")
library(SparkR)
sql("GRANT READ FILES ON STORAGE CREDENTIAL <storage_credential_name> TO <principal>")
spark.sql("GRANT READ FILES ON STORAGE CREDENTIAL <storage_credential_name> TO <principal>")
Note
If a group name contains a space, use back-ticks around it (not apostrophes).
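For example, to grant READ FILES to a hypothetical group named data engineers:
GRANT READ FILES ON STORAGE CREDENTIAL <storage_credential_name> TO `data engineers`;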
Change the owner of a storage credential
A storage credential’s creator is its initial owner. To change the owner to a different account-level user or group, do the following:
Run the following command in a notebook or the Databricks SQL editor. Replace the placeholder values:
<credential_name>: The name of the credential.
<principal>: The email address of an account-level user or the name of an account-level group.
ALTER STORAGE CREDENTIAL <credential_name> OWNER TO <principal>;
Run the following command in a notebook. Replace the placeholder values:
<credential_name>: The name of the credential.
<principal>: The email address of an account-level user or the name of an account-level group.
spark.sql("ALTER STORAGE CREDENTIAL <credential_name> OWNER TO <principal>")
Run the following command in a notebook. Replace the placeholder values:
<credential_name>: The name of the credential.
<principal>: The email address of an account-level user or the name of an account-level group.
library(SparkR)
sql("ALTER STORAGE CREDENTIAL <credential_name> OWNER TO <principal>")
Run the following command in a notebook. Replace the placeholder values:
<credential_name>: The name of the credential.
<principal>: The email address of an account-level user or the name of an account-level group.
spark.sql("ALTER STORAGE CREDENTIAL <credential_name> OWNER TO <principal>")
Delete a storage credential
To delete (drop) a storage credential you must be its owner. To delete a storage credential, you can use Data Explorer or a SQL command.
Log in to a workspace that is linked to the metastore.
Click Data.
At the bottom of the screen, click Storage Credentials.
Click the name of a storage credential to open the edit dialog.
Click the Delete button.
Run the following command in a notebook or the Databricks SQL editor. Replace <credential_name> with the name of the credential. Portions of the command that are in brackets are optional. By default, if the credential is used by an external location, it is not deleted. IF EXISTS does not return an error if the credential does not exist.
DROP STORAGE CREDENTIAL [IF EXISTS] <credential_name>;
Run the following command in a notebook. Replace <credential_name> with the name of the credential. Portions of the command that are in brackets are optional. By default, if the credential is used by an external location, it is not deleted. IF EXISTS does not return an error if the credential does not exist.
spark.sql("DROP STORAGE CREDENTIAL [IF EXISTS] <credential_name>")
Run the following command in a notebook. Replace <credential_name> with the name of the credential. Portions of the command that are in brackets are optional. By default, if the credential is used by an external location, it is not deleted. IF EXISTS does not return an error if the credential does not exist.
library(SparkR)
sql("DROP STORAGE CREDENTIAL [IF EXISTS] <credential_name>")
Run the following command in a notebook. Replace <credential_name> with the name of the credential. Portions of the command that are in brackets are optional. By default, if the credential is used by an external location, it is not deleted. IF EXISTS does not return an error if the credential does not exist.
spark.sql("DROP STORAGE CREDENTIAL [IF EXISTS] <credential_name>")
Manage external locations
The following sections illustrate how to create and manage external locations.
Create an external location
You can create an external location using Data Explorer, the Databricks CLI, SQL commands in a notebook or Databricks SQL query, or Terraform.
Run the following SQL command in a notebook or the Databricks SQL editor. Replace the placeholder values:
<location_name>: A name for the external location.
<bucket_path>: The path in your cloud tenant that this external location grants access to.
<storage_credential_name>: The name of the storage credential that contains the IAM role ARN that authorizes reading from and writing to the S3 bucket.
Note
Each cloud storage path can be associated with only one external location. If you attempt to create a second external location that references the same path, the command fails.
CREATE EXTERNAL LOCATION [IF NOT EXISTS] <location_name>
URL 's3://<bucket_path>'
WITH ([STORAGE] CREDENTIAL <storage_credential_name>)
[COMMENT <comment_string>];
spark.sql("CREATE EXTERNAL LOCATION [IF NOT EXISTS] <location_name> "
"URL 's3://<bucket_path>' "
"WITH ([STORAGE] CREDENTIAL <storage_credential_name>) "
"[COMMENT <comment_string>]")
library(SparkR)
sql(paste("CREATE EXTERNAL LOCATION [IF NOT EXISTS] <location_name> ",
"URL 's3://<bucket_path>' ",
"WITH ([STORAGE] CREDENTIAL <storage_credential_name>) ",
"[COMMENT <comment_string>]",
sep = ""))
spark.sql("CREATE EXTERNAL LOCATION [IF NOT EXISTS] <location_name> " +
"URL 's3://<bucket_path>' " +
"WITH ([STORAGE] CREDENTIAL <storage_credential_name>) " +
"[COMMENT <comment_string>]")
Describe an external location
To see the properties of an external location, you can use Data Explorer or a SQL command.
Log in to a workspace that is linked to the metastore.
Click Data.
At the bottom of the screen, click External Locations.
Click the name of an external location to see its properties.
Run the following command in a notebook or the Databricks SQL editor. Replace <location_name> with the name of the external location.
DESCRIBE EXTERNAL LOCATION <location_name>;
Run the following command in a notebook. Replace <location_name> with the name of the external location.
display(spark.sql("DESCRIBE EXTERNAL LOCATION <location_name>"))
Run the following command in a notebook. Replace <location_name> with the name of the external location.
library(SparkR)
display(sql("DESCRIBE EXTERNAL LOCATION <location_name>"))
Run the following command in a notebook. Replace <location_name> with the name of the external location.
display(spark.sql("DESCRIBE EXTERNAL LOCATION <location_name>"))
Modify an external location
An external location’s owner can rename, change the URI, and change the storage credential of the external location.
To rename an external location, do the following:
Run the following command in a notebook or the Databricks SQL editor. Replace the placeholder values:
<location_name>: The name of the location.
<new_location_name>: A new name for the location.
ALTER EXTERNAL LOCATION <location_name> RENAME TO <new_location_name>;
Run the following command in a notebook. Replace the placeholder values:
<location_name>: The name of the location.
<new_location_name>: A new name for the location.
spark.sql("ALTER EXTERNAL LOCATION <location_name> RENAME TO <new_location_name>")
Run the following command in a notebook. Replace the placeholder values:
<location_name>: The name of the location.
<new_location_name>: A new name for the location.
library(SparkR)
sql("ALTER EXTERNAL LOCATION <location_name> RENAME TO <new_location_name>")
Run the following command in a notebook. Replace the placeholder values:
<location_name>: The name of the location.
<new_location_name>: A new name for the location.
spark.sql("ALTER EXTERNAL LOCATION <location_name> RENAME TO <new_location_name>")
To change the URI that an external location points to in your cloud tenant, do the following:
Run the following command in a notebook or the Databricks SQL editor. Replace the placeholder values:
<location_name>: The name of the external location.
<url>: The new storage URL that the location should authorize access to in your cloud tenant.
ALTER EXTERNAL LOCATION <location_name> SET URL '<url>' [FORCE];
Run the following command in a notebook. Replace the placeholder values:
<location_name>: The name of the external location.
<url>: The new storage URL that the location should authorize access to in your cloud tenant.
spark.sql("ALTER EXTERNAL LOCATION location_name SET URL `<url>` [FORCE]")
Run the following command in a notebook. Replace the placeholder values:
<location_name>: The name of the external location.
<url>: The new storage URL that the location should authorize access to in your cloud tenant.
library(SparkR)
sql("ALTER EXTERNAL LOCATION location_name SET URL `<url>` [FORCE]")
Run the following command in a notebook. Replace the placeholder values:
<location_name>: The name of the external location.
<url>: The new storage URL that the location should authorize access to in your cloud tenant.
spark.sql("ALTER EXTERNAL LOCATION location_name SET URL `<url>` [FORCE]")
The FORCE option changes the URL even if external tables depend upon the external location.
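For example, assuming an external location named my_external_location and a new bucket path (both placeholders), forcing the change looks like this:
ALTER EXTERNAL LOCATION my_external_location SET URL 's3://my-new-bucket/landing' FORCE;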
To change the storage credential that an external location uses, do the following:
Run the following command in a notebook or the Databricks SQL editor. Replace the placeholder values:
<location_name>: The name of the external location.
<credential_name>: The name of the storage credential that grants access to the location’s URL in your cloud tenant.
ALTER EXTERNAL LOCATION <location_name> SET STORAGE CREDENTIAL <credential_name>;
Run the following command in a notebook. Replace the placeholder values:
<location_name>: The name of the external location.
<credential_name>: The name of the storage credential that grants access to the location’s URL in your cloud tenant.
spark.sql("ALTER EXTERNAL LOCATION <location_name> SET STORAGE CREDENTIAL <credential_name>")
Run the following command in a notebook. Replace the placeholder values:
<location_name>: The name of the external location.
<credential_name>: The name of the storage credential that grants access to the location’s URL in your cloud tenant.
library(SparkR)
sql("ALTER EXTERNAL LOCATION <location_name> SET STORAGE CREDENTIAL <credential_name>")
Run the following command in a notebook. Replace the placeholder values:
<location_name>: The name of the external location.
<credential_name>: The name of the storage credential that grants access to the location’s URL in your cloud tenant.
spark.sql("ALTER EXTERNAL LOCATION <location_name> SET STORAGE CREDENTIAL <credential_name>")
Manage permissions for an external location
You can grant and revoke the following permissions on an external location using Data Explorer, the Databricks CLI, SQL commands in a notebook or Databricks SQL query, or Terraform:
CREATE EXTERNAL TABLE
READ FILES
WRITE FILES
In the following examples, replace the placeholder values:
<location_name>: The name of the external location that authorizes reading from and writing to the S3 bucket in your cloud tenant.
<principal>: The email address of an account-level user or the name of an account-level group.
To show grants on an external location, use a command like the following. You can optionally filter the results to show only the grants for the specified principal.
SHOW GRANTS [<principal>] ON EXTERNAL LOCATION <location_name>;
display(spark.sql("SHOW GRANTS [<principal>] ON EXTERNAL LOCATION <location_name>"))
library(SparkR)
display(sql("SHOW GRANTS [<principal>] ON EXTERNAL LOCATION <location_name>"))
display(spark.sql("SHOW GRANTS [<principal>] ON EXTERNAL LOCATION <location_name>"))
To grant permission to use an external location to create a table:
GRANT CREATE EXTERNAL TABLE ON EXTERNAL LOCATION <location_name> TO <principal>;
spark.sql("GRANT CREATE EXTERNAL TABLE ON EXTERNAL LOCATION <location_name> TO <principal>")
library(SparkR)
sql("GRANT CREATE EXTERNAL TABLE ON EXTERNAL LOCATION <location_name> TO <principal>")
spark.sql("GRANT CREATE EXTERNAL TABLE ON EXTERNAL LOCATION <location_name> TO <principal>")
To grant permission to read files from an external location:
GRANT READ FILES ON EXTERNAL LOCATION <location_name> TO <principal>;
spark.sql("GRANT READ FILES ON EXTERNAL LOCATION <location_name> TO <principal>")
library(SparkR)
sql("GRANT READ FILES ON EXTERNAL LOCATION <location_name> TO <principal>")
spark.sql("GRANT READ FILES ON EXTERNAL LOCATION <location_name> TO <principal>")
Note
If a group name contains a space, use back-ticks around it (not apostrophes).
Change the owner of an external location
An external location’s creator is its initial owner. To change the owner to a different account-level user or group, run the following command in a notebook or the Databricks SQL editor or use Data Explorer. Replace the placeholder values:
<location_name>: The name of the external location.
<principal>: The email address of an account-level user or the name of an account-level group.
ALTER EXTERNAL LOCATION <location_name> OWNER TO <principal>;
Delete an external location
To delete (drop) an external location you must be its owner. To delete an external location, do the following:
Run the following command in a notebook or the Databricks SQL editor. Items in brackets are optional. Replace <location_name>
with the name of the external location.
DROP EXTERNAL LOCATION [IF EXISTS] <location_name>;
Run the following command in a notebook. Items in brackets are optional. Replace <location_name>
with the name of the external location.
spark.sql("DROP EXTERNAL LOCATION [IF EXISTS] <location_name>")
Run the following command in a notebook. Items in brackets are optional. Replace <location_name>
with the name of the external location.
library(SparkR)
sql("DROP EXTERNAL LOCATION [IF EXISTS] <location_name>")
Run the following command in a notebook. Items in brackets are optional. Replace <location_name>
with the name of the external location.
spark.sql("DROP EXTERNAL LOCATION [IF EXISTS] <location_name>")