Manage external locations and storage credentials

This article introduces external locations and storage credentials and explains how to create and use them to manage access to external tables.

What are external locations and storage credentials?

External locations and storage credentials allow Unity Catalog to read and write data on your cloud tenant on behalf of users. The two objects are defined as follows:

A storage credential represents an authentication and authorization mechanism for accessing data stored on your cloud tenant, using an IAM role. Each storage credential is subject to Unity Catalog access-control policies that control which users and groups can access the credential. If a user does not have access to a storage credential in Unity Catalog, the request fails and Unity Catalog does not attempt to authenticate to your cloud tenant on the user’s behalf.

An external location is an object that combines a cloud storage path with a storage credential that authorizes access to that path. Each external location is subject to Unity Catalog access-control policies that control which users and groups can access the external location. If a user does not have access to an external location in Unity Catalog, the request fails and Unity Catalog does not attempt to authenticate to your cloud tenant on the user’s behalf.

Databricks recommends using external locations rather than using storage credentials directly.

Requirements

  • To create storage credentials, you must be a Databricks account admin. The account admin who creates the storage credential can delegate ownership to another user or group to manage permissions on it.

  • To create external locations, you must be a metastore admin or a user with the CREATE EXTERNAL LOCATION privilege.

  • The name of the S3 bucket that you want users to read from and write to cannot use dot notation (for example, incorrect.bucket.name.notation). For more bucket naming guidance, see the AWS bucket naming rules.
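The bucket-name requirement can be checked before any IAM work starts. The following is a minimal sketch, not part of any Databricks or AWS tooling: the function name is_supported_bucket_name is hypothetical, and the pattern encodes only the dot restriction described here plus the basic AWS length and character rules (3 to 63 characters, lowercase letters, digits, and hyphens, starting and ending with a letter or digit).

```python
import re

# Lowercase letters, digits, and hyphens only; must start and end with a
# letter or digit; 3 to 63 characters in total. Dots are deliberately
# excluded because this article disallows dot notation in bucket names.
_BUCKET_NAME = re.compile(r"[a-z0-9][a-z0-9-]{1,61}[a-z0-9]")


def is_supported_bucket_name(name: str) -> bool:
    """Return True if the name avoids dots and fits the basic AWS rules."""
    return _BUCKET_NAME.fullmatch(name) is not None


print(is_supported_bucket_name("my-unity-catalog-bucket"))         # True
print(is_supported_bucket_name("incorrect.bucket.name.notation"))  # False
```

This covers only the rules listed above. The AWS bucket naming rules contain further restrictions that the sketch does not check.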

Manage storage credentials

The following sections show how to create and manage storage credentials.

Create a storage credential

To create a storage credential, you need an IAM role that authorizes reading from and writing to an S3 bucket path. You reference that IAM role when you create the storage credential.

Step 1: Create or update an IAM role

In AWS, create or update an IAM role that gives access to the S3 bucket that you want your users to access. This IAM role must be defined in the same account as the S3 bucket.

Tip

If you have already created an IAM role to provide this access, you can skip this step and go straight to Step 2: Give Databricks the IAM role details.

  1. Create an IAM role or update an existing role.

    Set up a cross-account trust relationship so that Unity Catalog can assume the role to access data in the bucket on behalf of Databricks users. To do so, paste the following policy JSON into the Trust relationships tab.

    • Do not modify the role ARN in the Principal section, which is a static value that references a role created by Databricks.

    • In the sts:ExternalId section, replace <DATABRICKS_ACCOUNT_ID> with your Databricks account ID (not your AWS account ID). To get the Databricks account ID, see step 1 of Configure a storage bucket and IAM role in AWS.

    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Allow",
          "Principal": {
            "AWS": "arn:aws:iam::414351767826:role/unity-catalog-prod-UCMasterRole-14S5ZJVKOTYTL"
          },
          "Action": "sts:AssumeRole",
          "Condition": {
            "StringEquals": {
              "sts:ExternalId": "<DATABRICKS_ACCOUNT_ID>"
            }
          }
        }
      ]
    }
    
  2. Create the following IAM policy in the same account as the S3 bucket, replacing the following values:

    • <BUCKET>: The name of the S3 bucket.

    • <KMS_KEY>: Optional. If encryption is enabled, provide the name of the KMS key that encrypts the S3 bucket contents. If encryption is disabled, remove the entire KMS section of the IAM policy.

    • <AWS_ACCOUNT_ID>: The Account ID of your AWS account (not your Databricks account).

    • <AWS_IAM_ROLE_NAME>: The name of the AWS IAM role that you created in the previous step.

    {
      "Version": "2012-10-17",
      "Statement": [
          {
              "Action": [
                  "s3:GetObject",
                  "s3:PutObject",
                  "s3:DeleteObject",
                  "s3:ListBucket",
                  "s3:GetBucketLocation",
                  "s3:GetLifecycleConfiguration",
                  "s3:PutLifecycleConfiguration"
              ],
              "Resource": [
                  "arn:aws:s3:::<BUCKET>/*",
                  "arn:aws:s3:::<BUCKET>"
              ],
              "Effect": "Allow"
          },
          {
              "Action": [
                  "kms:Decrypt",
                  "kms:Encrypt",
                  "kms:GenerateDataKey*"
              ],
              "Resource": [
                  "arn:aws:kms:<KMS_KEY>"
              ],
              "Effect": "Allow"
          },
          {
              "Action": [
                  "sts:AssumeRole"
              ],
              "Resource": [
                  "arn:aws:iam::<AWS_ACCOUNT_ID>:role/<AWS_IAM_ROLE_NAME>"
              ],
              "Effect": "Allow"
          }
        ]
    }
    

    Note

    If you need a more restrictive IAM policy for Unity Catalog, contact your Databricks representative for assistance.

  3. Attach the IAM policy to the IAM role.

    In the role’s Permissions tab, attach the IAM policy that you just created.
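If you prefer to generate the two JSON documents from Step 1 rather than edit them by hand, they can be assembled programmatically. The following is a minimal sketch under stated assumptions: the helper names build_trust_policy and build_access_policy are hypothetical, the policy contents are copied from the JSON shown in this step, and the KMS statement is omitted when no key is passed, which mirrors the instruction to remove the KMS section when encryption is disabled.

```python
import json
from typing import Optional

# Static role ARN copied from the trust policy in this article. Do not modify it.
UC_MASTER_ROLE_ARN = (
    "arn:aws:iam::414351767826:role/unity-catalog-prod-UCMasterRole-14S5ZJVKOTYTL"
)


def build_trust_policy(databricks_account_id: str) -> dict:
    """Trust policy that lets Unity Catalog assume the IAM role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": UC_MASTER_ROLE_ARN},
                "Action": "sts:AssumeRole",
                # The external ID is the Databricks account ID, not the AWS one.
                "Condition": {
                    "StringEquals": {"sts:ExternalId": databricks_account_id}
                },
            }
        ],
    }


def build_access_policy(
    bucket: str,
    aws_account_id: str,
    role_name: str,
    kms_key: Optional[str] = None,
) -> dict:
    """IAM policy for the bucket; the KMS statement is added only if a key is given."""
    statements = [
        {
            "Action": [
                "s3:GetObject",
                "s3:PutObject",
                "s3:DeleteObject",
                "s3:ListBucket",
                "s3:GetBucketLocation",
                "s3:GetLifecycleConfiguration",
                "s3:PutLifecycleConfiguration",
            ],
            "Resource": [f"arn:aws:s3:::{bucket}/*", f"arn:aws:s3:::{bucket}"],
            "Effect": "Allow",
        }
    ]
    if kms_key is not None:
        statements.append(
            {
                "Action": ["kms:Decrypt", "kms:Encrypt", "kms:GenerateDataKey*"],
                "Resource": [f"arn:aws:kms:{kms_key}"],
                "Effect": "Allow",
            }
        )
    statements.append(
        {
            "Action": ["sts:AssumeRole"],
            "Resource": [f"arn:aws:iam::{aws_account_id}:role/{role_name}"],
            "Effect": "Allow",
        }
    )
    return {"Version": "2012-10-17", "Statement": statements}


# The placeholder strings are left as-is; substitute your own values.
print(json.dumps(build_trust_policy("<DATABRICKS_ACCOUNT_ID>"), indent=2))
print(json.dumps(
    build_access_policy("<BUCKET>", "<AWS_ACCOUNT_ID>", "<AWS_IAM_ROLE_NAME>"),
    indent=2,
))
```

The printed JSON can then be pasted into the AWS console as described in the steps above.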

Step 2: Give Databricks the IAM role details

  1. In Databricks, log in to a workspace that is linked to the metastore.

  2. Click Data.

  3. At the bottom of the screen, click Storage Credentials.

  4. Click the + menu at the upper right and select Add a storage credential.

  5. Enter a name for the credential, the IAM Role ARN that authorizes Unity Catalog to access the storage location on your cloud tenant, and an optional comment.

    Tip

    If you have already defined an instance profile in Databricks, you can click Copy instance profile to copy over the IAM role ARN for that instance profile. The instance profile’s IAM role must have a cross-account trust relationship that enables Databricks to assume the role in order to access the bucket on behalf of Databricks users. For more information about the IAM role policy and trust relationship requirements, see Step 1: Create or update an IAM role.

  6. Click Save.

  7. Create an external location that references this storage credential.

You can also create a storage credential by using the Databricks Terraform provider and databricks_storage_credential.

List storage credentials

To view the list of all storage credentials in a metastore, you can use Data Explorer or a SQL command.

  1. Log in to a workspace that is linked to the metastore.

  2. Click Data.

  3. At the bottom of the screen, click Storage Credentials.

Run the following command in a notebook or the Databricks SQL editor.

SHOW STORAGE CREDENTIALS;

Run the following Python command in a notebook.

display(spark.sql("SHOW STORAGE CREDENTIALS"))

Run the following R command in a notebook.

library(SparkR)

display(sql("SHOW STORAGE CREDENTIALS"))

Run the following Scala command in a notebook.

display(spark.sql("SHOW STORAGE CREDENTIALS"))

View a storage credential

To view the properties of a storage credential, you can use Data Explorer or a SQL command.

  1. Log in to a workspace that is linked to the metastore.

  2. Click Data.

  3. At the bottom of the screen, click Storage Credentials.

  4. Click the name of a storage credential to see its properties.

Run the following command in a notebook or the Databricks SQL editor. Replace <credential_name> with the name of the credential.

DESCRIBE STORAGE CREDENTIAL <credential_name>;

Run the following Python command in a notebook. Replace <credential_name> with the name of the credential.

display(spark.sql("DESCRIBE STORAGE CREDENTIAL <credential_name>"))

Run the following R command in a notebook. Replace <credential_name> with the name of the credential.

library(SparkR)

display(sql("DESCRIBE STORAGE CREDENTIAL <credential_name>"))

Run the following Scala command in a notebook. Replace <credential_name> with the name of the credential.

display(spark.sql("DESCRIBE STORAGE CREDENTIAL <credential_name>"))

Rename a storage credential

To rename a storage credential, you can use Data Explorer or a SQL command.

  1. Log in to a workspace that is linked to the metastore.

  2. Click Data.

  3. At the bottom of the screen, click Storage Credentials.

  4. Click the name of a storage credential to open the edit dialog.

  5. Rename the storage credential and save it.

Run the following command in a notebook or the Databricks SQL editor. Replace the placeholder values:

  • <credential_name>: The name of the credential.

  • <new_credential_name>: A new name for the credential.

ALTER STORAGE CREDENTIAL <credential_name> RENAME TO <new_credential_name>;

Run the following Python command in a notebook. Replace the placeholder values:

  • <credential_name>: The name of the credential.

  • <new_credential_name>: A new name for the credential.

spark.sql("ALTER STORAGE CREDENTIAL <credential_name> RENAME TO <new_credential_name>")

Run the following R command in a notebook. Replace the placeholder values:

  • <credential_name>: The name of the credential.

  • <new_credential_name>: A new name for the credential.

library(SparkR)

sql("ALTER STORAGE CREDENTIAL <credential_name> RENAME TO <new_credential_name>")

Run the following Scala command in a notebook. Replace the placeholder values:

  • <credential_name>: The name of the credential.

  • <new_credential_name>: A new name for the credential.

spark.sql("ALTER STORAGE CREDENTIAL <credential_name> RENAME TO <new_credential_name>")

Manage permissions for a storage credential

You can grant permissions directly on the storage credential, but Databricks recommends that you reference it in an external location and grant permissions to that instead. An external location combines a storage credential with a specific path, and authorizes access only to that path and its contents.
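To make the idea of "that path and its contents" concrete, the following minimal sketch tests whether a target URL falls under an external location's URL. It illustrates the concept only and is not how Unity Catalog implements the check. The function name is_within_location and the example URLs are hypothetical.

```python
from urllib.parse import urlparse


def is_within_location(location_url: str, target_url: str) -> bool:
    """Return True if target_url is the location's path or lies beneath it."""
    loc = urlparse(location_url)
    tgt = urlparse(target_url)
    # The scheme (s3) and the bucket name must match exactly.
    if (loc.scheme, loc.netloc) != (tgt.scheme, tgt.netloc):
        return False
    # Compare whole path segments so that "landing" does not match "landing-archive".
    loc_parts = [part for part in loc.path.split("/") if part]
    tgt_parts = [part for part in tgt.path.split("/") if part]
    return tgt_parts[: len(loc_parts)] == loc_parts


print(is_within_location("s3://my-bucket/landing", "s3://my-bucket/landing/sales/2023.csv"))  # True
print(is_within_location("s3://my-bucket/landing", "s3://my-bucket/landing-archive/x.csv"))   # False
print(is_within_location("s3://my-bucket/landing", "s3://other-bucket/landing/x.csv"))        # False
```

A storage credential granted directly has no such path restriction, which is why granting on the external location is the narrower option.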

You can manage permissions for a storage credential using Data Explorer, SQL commands, or Terraform. You can grant and revoke the following permissions on a storage credential:

  • CREATE EXTERNAL TABLE

  • READ FILES

  • WRITE FILES

In the following examples, replace the placeholder values:

  • <principal>: The email address of the account-level user or the name of the account-level group to whom to grant the permission.

  • <storage_credential_name>: The name of a storage credential.

To show grants on a storage credential, use a command like the following. You can optionally filter the results to show only the grants for the specified principal.

SQL:

SHOW GRANTS [<principal>] ON STORAGE CREDENTIAL <storage_credential_name>;

Python:

display(spark.sql("SHOW GRANTS [<principal>] ON STORAGE CREDENTIAL <storage_credential_name>"))

R:

library(SparkR)

display(sql("SHOW GRANTS [<principal>] ON STORAGE CREDENTIAL <storage_credential_name>"))

Scala:

display(spark.sql("SHOW GRANTS [<principal>] ON STORAGE CREDENTIAL <storage_credential_name>"))

To grant permission to create an external table using a storage credential directly:

SQL:

GRANT CREATE EXTERNAL TABLE ON STORAGE CREDENTIAL <storage_credential_name> TO <principal>;

Python:

spark.sql("GRANT CREATE EXTERNAL TABLE ON STORAGE CREDENTIAL <storage_credential_name> TO <principal>")

R:

library(SparkR)

sql("GRANT CREATE EXTERNAL TABLE ON STORAGE CREDENTIAL <storage_credential_name> TO <principal>")

Scala:

spark.sql("GRANT CREATE EXTERNAL TABLE ON STORAGE CREDENTIAL <storage_credential_name> TO <principal>")

To grant permission to select from an external table using a storage credential directly:

SQL:

GRANT READ FILES ON STORAGE CREDENTIAL <storage_credential_name> TO <principal>;

Python:

spark.sql("GRANT READ FILES ON STORAGE CREDENTIAL <storage_credential_name> TO <principal>")

R:

library(SparkR)

sql("GRANT READ FILES ON STORAGE CREDENTIAL <storage_credential_name> TO <principal>")

Scala:

spark.sql("GRANT READ FILES ON STORAGE CREDENTIAL <storage_credential_name> TO <principal>")

Note

If a group name contains a space, use back-ticks around it (not apostrophes).
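When GRANT statements are generated from code, the back-tick rule is easy to apply consistently. The following is a minimal sketch, assuming that wrapping every principal in back-ticks is acceptable and that principal names never contain a back-tick themselves. The helper names quote_principal and grant_on_storage_credential are hypothetical, and the privilege names are taken from the statements and list in this section.

```python
ALLOWED_PRIVILEGES = {"CREATE EXTERNAL TABLE", "READ FILES", "WRITE FILES"}


def quote_principal(principal: str) -> str:
    """Wrap a user email or group name in back-ticks (not apostrophes)."""
    if "`" in principal:
        # Assumption: escaping is out of scope, so such names are rejected.
        raise ValueError("principal must not contain a back-tick")
    return f"`{principal}`"


def grant_on_storage_credential(privilege: str, credential_name: str, principal: str) -> str:
    """Build a GRANT statement for a storage credential."""
    if privilege not in ALLOWED_PRIVILEGES:
        raise ValueError(f"unsupported privilege: {privilege}")
    return (
        f"GRANT {privilege} ON STORAGE CREDENTIAL {credential_name} "
        f"TO {quote_principal(principal)}"
    )


print(grant_on_storage_credential("READ FILES", "my_credential", "data engineers"))
# GRANT READ FILES ON STORAGE CREDENTIAL my_credential TO `data engineers`
```

In a notebook, the returned string could be passed to spark.sql, as in the Python examples above.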

Change the owner of a storage credential

A storage credential’s creator is its initial owner. To change the owner to a different account-level user or group, do the following:

Run the following command in a notebook or the Databricks SQL editor. Replace the placeholder values:

  • <credential_name>: The name of the credential.

  • <principal>: The email address of an account-level user or the name of an account-level group.

ALTER STORAGE CREDENTIAL <credential_name> OWNER TO <principal>;

Run the following Python command in a notebook. Replace the placeholder values:

  • <credential_name>: The name of the credential.

  • <principal>: The email address of an account-level user or the name of an account-level group.

spark.sql("ALTER STORAGE CREDENTIAL <credential_name> OWNER TO <principal>")

Run the following R command in a notebook. Replace the placeholder values:

  • <credential_name>: The name of the credential.

  • <principal>: The email address of an account-level user or the name of an account-level group.

library(SparkR)

sql("ALTER STORAGE CREDENTIAL <credential_name> OWNER TO <principal>")

Run the following Scala command in a notebook. Replace the placeholder values:

  • <credential_name>: The name of the credential.

  • <principal>: The email address of an account-level user or the name of an account-level group.

spark.sql("ALTER STORAGE CREDENTIAL <credential_name> OWNER TO <principal>")

Delete a storage credential

To delete a storage credential, you can use Data Explorer or a SQL command.

  1. Log in to a workspace that is linked to the metastore.

  2. Click Data.

  3. At the bottom of the screen, click Storage Credentials.

  4. Click the name of a storage credential to open the edit dialog.

  5. Click the Delete button.

Run the following command in a notebook or the Databricks SQL editor. Replace <credential_name> with the name of the credential. Portions of the command that are in brackets are optional. By default, if the credential is used by an external location, it is not deleted.

  • IF EXISTS does not return an error if the credential does not exist.

DROP STORAGE CREDENTIAL [IF EXISTS] <credential_name>;

Run the following Python command in a notebook. Replace <credential_name> with the name of the credential. Portions of the command that are in brackets are optional. By default, if the credential is used by an external location, it is not deleted.

  • IF EXISTS does not return an error if the credential does not exist.

spark.sql("DROP STORAGE CREDENTIAL [IF EXISTS] <credential_name>")

Run the following R command in a notebook. Replace <credential_name> with the name of the credential. Portions of the command that are in brackets are optional. By default, if the credential is used by an external location, it is not deleted.

  • IF EXISTS does not return an error if the credential does not exist.

library(SparkR)

sql("DROP STORAGE CREDENTIAL [IF EXISTS] <credential_name>")

Run the following Scala command in a notebook. Replace <credential_name> with the name of the credential. Portions of the command that are in brackets are optional. By default, if the credential is used by an external location, it is not deleted.

  • IF EXISTS does not return an error if the credential does not exist.

spark.sql("DROP STORAGE CREDENTIAL [IF EXISTS] <credential_name>")

Manage external locations

The following sections illustrate how to create and manage external locations.

Create an external location

You can create an external location using Data Explorer, a SQL command, or Terraform.

Run the following SQL command in a notebook or the Databricks SQL editor. Replace the placeholder values:

  • <location_name>: A name for the external location.

  • <bucket_path>: The path in your cloud tenant that this external location grants access to.

  • <storage_credential_name>: The name of the storage credential that contains the IAM role ARN that authorizes reading from and writing to the S3 bucket.

Note

Each cloud storage path can be associated with only one external location. If you attempt to create a second external location that references the same path, the command fails.

SQL:

CREATE EXTERNAL LOCATION [IF NOT EXISTS] <location_name>
URL 's3://<bucket_path>'
WITH ([STORAGE] CREDENTIAL <storage_credential_name>)
[COMMENT <comment_string>];

Python:

spark.sql("CREATE EXTERNAL LOCATION [IF NOT EXISTS] <location_name> "
  "URL 's3://<bucket_path>' "
  "WITH ([STORAGE] CREDENTIAL <storage_credential_name>) "
  "[COMMENT <comment_string>]")

R:

library(SparkR)

sql(paste("CREATE EXTERNAL LOCATION [IF NOT EXISTS] <location_name> ",
  "URL 's3://<bucket_path>' ",
  "WITH ([STORAGE] CREDENTIAL <storage_credential_name>) ",
  "[COMMENT <comment_string>]",
  sep = ""))

Scala:

spark.sql("CREATE EXTERNAL LOCATION [IF NOT EXISTS] <location_name> " +
  "URL 's3://<bucket_path>' " +
  "WITH ([STORAGE] CREDENTIAL <storage_credential_name>) " +
  "[COMMENT <comment_string>]")

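The bracketed parts of the CREATE EXTERNAL LOCATION command are optional, so a real statement includes or omits them. The following minimal sketch shows one way to assemble a concrete statement from the syntax above. The function name create_external_location_sql and the example values are hypothetical, and the sketch rejects single quotes in its inputs instead of attempting to escape them.

```python
from typing import Optional


def create_external_location_sql(
    location_name: str,
    bucket_path: str,
    storage_credential_name: str,
    comment: Optional[str] = None,
    if_not_exists: bool = False,
) -> str:
    """Assemble a CREATE EXTERNAL LOCATION statement without optional brackets."""
    for value in (bucket_path, comment or ""):
        if "'" in value:
            # Assumption: quoting rules are out of scope, so quotes are rejected.
            raise ValueError("single quotes are not supported in this sketch")
    parts = ["CREATE EXTERNAL LOCATION"]
    if if_not_exists:
        parts.append("IF NOT EXISTS")
    parts.append(location_name)
    parts.append(f"URL 's3://{bucket_path}'")
    parts.append(f"WITH (STORAGE CREDENTIAL {storage_credential_name})")
    if comment is not None:
        parts.append(f"COMMENT '{comment}'")
    return " ".join(parts)


print(create_external_location_sql(
    "sales_landing",
    "my-bucket/landing",
    "my_credential",
    comment="Landing zone",
    if_not_exists=True,
))
# CREATE EXTERNAL LOCATION IF NOT EXISTS sales_landing URL 's3://my-bucket/landing' WITH (STORAGE CREDENTIAL my_credential) COMMENT 'Landing zone'
```
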
Describe an external location

To see the properties of an external location, you can use Data Explorer or a SQL command.

  1. Log in to a workspace that is linked to the metastore.

  2. Click Data.

  3. At the bottom of the screen, click External Locations.

  4. Click the name of an external location to see its properties.

Run the following command in a notebook or the Databricks SQL editor. Replace <location_name> with the name of the external location.

DESCRIBE EXTERNAL LOCATION <location_name>;

Run the following Python command in a notebook. Replace <location_name> with the name of the external location.

display(spark.sql("DESCRIBE EXTERNAL LOCATION <location_name>"))

Run the following R command in a notebook. Replace <location_name> with the name of the external location.

library(SparkR)

display(sql("DESCRIBE EXTERNAL LOCATION <location_name>"))

Run the following Scala command in a notebook. Replace <location_name> with the name of the external location.

display(spark.sql("DESCRIBE EXTERNAL LOCATION <location_name>"))

Modify an external location

An external location’s owner can rename the external location, change its URI, and change its storage credential.

To rename an external location, do the following:

Run the following command in a notebook or the Databricks SQL editor. Replace the placeholder values:

  • <location_name>: The name of the location.

  • <new_location_name>: A new name for the location.

ALTER EXTERNAL LOCATION <location_name> RENAME TO <new_location_name>;

Run the following Python command in a notebook. Replace the placeholder values:

  • <location_name>: The name of the location.

  • <new_location_name>: A new name for the location.

spark.sql("ALTER EXTERNAL LOCATION <location_name> RENAME TO <new_location_name>")

Run the following R command in a notebook. Replace the placeholder values:

  • <location_name>: The name of the location.

  • <new_location_name>: A new name for the location.

library(SparkR)

sql("ALTER EXTERNAL LOCATION <location_name> RENAME TO <new_location_name>")

Run the following Scala command in a notebook. Replace the placeholder values:

  • <location_name>: The name of the location.

  • <new_location_name>: A new name for the location.

spark.sql("ALTER EXTERNAL LOCATION <location_name> RENAME TO <new_location_name>")

To change the URI that an external location points to in your cloud tenant, do the following:

Run the following command in a notebook or the Databricks SQL editor. Replace the placeholder values:

  • <location_name>: The name of the external location.

  • <url>: The new storage URL the location should authorize access to in your cloud tenant.

ALTER EXTERNAL LOCATION <location_name> SET URL '<url>' [FORCE];

Run the following Python command in a notebook. Replace the placeholder values:

  • <location_name>: The name of the external location.

  • <url>: The new storage URL the location should authorize access to in your cloud tenant.

spark.sql("ALTER EXTERNAL LOCATION <location_name> SET URL '<url>' [FORCE]")

Run the following R command in a notebook. Replace the placeholder values:

  • <location_name>: The name of the external location.

  • <url>: The new storage URL the location should authorize access to in your cloud tenant.

library(SparkR)

sql("ALTER EXTERNAL LOCATION <location_name> SET URL '<url>' [FORCE]")

Run the following Scala command in a notebook. Replace the placeholder values:

  • <location_name>: The name of the external location.

  • <url>: The new storage URL the location should authorize access to in your cloud tenant.

spark.sql("ALTER EXTERNAL LOCATION <location_name> SET URL '<url>' [FORCE]")

The FORCE option changes the URL even if external tables depend upon the external location.

To change the storage credential that an external location uses, do the following:

Run the following command in a notebook or the Databricks SQL editor. Replace the placeholder values:

  • <location_name>: The name of the external location.

  • <credential_name>: The name of the storage credential that grants access to the location’s URL in your cloud tenant.

ALTER EXTERNAL LOCATION <location_name> SET STORAGE CREDENTIAL <credential_name>;

Run the following Python command in a notebook. Replace the placeholder values:

  • <location_name>: The name of the external location.

  • <credential_name>: The name of the storage credential that grants access to the location’s URL in your cloud tenant.

spark.sql("ALTER EXTERNAL LOCATION <location_name> SET STORAGE CREDENTIAL <credential_name>")

Run the following R command in a notebook. Replace the placeholder values:

  • <location_name>: The name of the external location.

  • <credential_name>: The name of the storage credential that grants access to the location’s URL in your cloud tenant.

library(SparkR)

sql("ALTER EXTERNAL LOCATION <location_name> SET STORAGE CREDENTIAL <credential_name>")

Run the following Scala command in a notebook. Replace the placeholder values:

  • <location_name>: The name of the external location.

  • <credential_name>: The name of the storage credential that grants access to the location’s URL in your cloud tenant.

spark.sql("ALTER EXTERNAL LOCATION <location_name> SET STORAGE CREDENTIAL <credential_name>")
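The three modifications in this section can be combined in a small helper when several changes are applied together. The following is a minimal sketch based only on the ALTER EXTERNAL LOCATION forms shown above. The function name alter_external_location_statements and the example values are hypothetical, and emitting the rename last is a design choice of the sketch so that the earlier statements can still refer to the old name.

```python
from typing import List, Optional


def alter_external_location_statements(
    location_name: str,
    new_url: Optional[str] = None,
    force: bool = False,
    new_credential: Optional[str] = None,
    new_name: Optional[str] = None,
) -> List[str]:
    """Return the ALTER statements needed for the requested changes."""
    statements = []
    if new_url is not None:
        if "'" in new_url:
            raise ValueError("single quotes are not supported in this sketch")
        # FORCE changes the URL even if external tables depend on the location.
        suffix = " FORCE" if force else ""
        statements.append(
            f"ALTER EXTERNAL LOCATION {location_name} SET URL '{new_url}'{suffix}"
        )
    if new_credential is not None:
        statements.append(
            f"ALTER EXTERNAL LOCATION {location_name} "
            f"SET STORAGE CREDENTIAL {new_credential}"
        )
    if new_name is not None:
        # Rename last so the statements above can use the original name.
        statements.append(
            f"ALTER EXTERNAL LOCATION {location_name} RENAME TO {new_name}"
        )
    return statements


for statement in alter_external_location_statements(
    "sales_landing",
    new_url="s3://my-bucket/landing-v2",
    force=True,
    new_name="sales_landing_v2",
):
    print(statement)
# ALTER EXTERNAL LOCATION sales_landing SET URL 's3://my-bucket/landing-v2' FORCE
# ALTER EXTERNAL LOCATION sales_landing RENAME TO sales_landing_v2
```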

Manage permissions for an external location

You can grant and revoke the following permissions on an external location using Data Explorer, a SQL command, or Terraform:

  • CREATE EXTERNAL TABLE

  • READ FILES

  • WRITE FILES

In the following examples, replace the placeholder values:

  • <principal>: The email address of the account-level user or the name of the account-level group to whom to grant the permission.

  • <location_name>: The name of the external location that authorizes reading from and writing to the S3 bucket in your cloud tenant.

To show grants on an external location, use a command like the following. You can optionally filter the results to show only the grants for the specified principal.

SQL:

SHOW GRANTS [<principal>] ON EXTERNAL LOCATION <location_name>;

Python:

display(spark.sql("SHOW GRANTS [<principal>] ON EXTERNAL LOCATION <location_name>"))

R:

library(SparkR)

display(sql("SHOW GRANTS [<principal>] ON EXTERNAL LOCATION <location_name>"))

Scala:

display(spark.sql("SHOW GRANTS [<principal>] ON EXTERNAL LOCATION <location_name>"))

To grant permission to use an external location to create a table:

SQL:

GRANT CREATE EXTERNAL TABLE ON EXTERNAL LOCATION <location_name> TO <principal>;

Python:

spark.sql("GRANT CREATE EXTERNAL TABLE ON EXTERNAL LOCATION <location_name> TO <principal>")

R:

library(SparkR)

sql("GRANT CREATE EXTERNAL TABLE ON EXTERNAL LOCATION <location_name> TO <principal>")

Scala:

spark.sql("GRANT CREATE EXTERNAL TABLE ON EXTERNAL LOCATION <location_name> TO <principal>")

To grant permission to read files from an external location:

SQL:

GRANT READ FILES ON EXTERNAL LOCATION <location_name> TO <principal>;

Python:

spark.sql("GRANT READ FILES ON EXTERNAL LOCATION <location_name> TO <principal>")

R:

library(SparkR)

sql("GRANT READ FILES ON EXTERNAL LOCATION <location_name> TO <principal>")

Scala:

spark.sql("GRANT READ FILES ON EXTERNAL LOCATION <location_name> TO <principal>")

Note

If a group name contains a space, use back-ticks around it (not apostrophes).
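This section mentions revoking permissions but shows only GRANT. The following minimal sketch compares a current and a desired set of grants and emits the statements needed to move from one to the other. It assumes the REVOKE ... FROM form as the counterpart of GRANT ... TO, which this article does not show. The function name reconcile_grants, the location name, and the group names are hypothetical.

```python
from typing import List, Set, Tuple

Grant = Tuple[str, str]  # (principal, privilege)


def reconcile_grants(location_name: str, current: Set[Grant], desired: Set[Grant]) -> List[str]:
    """Return GRANT statements for missing grants and REVOKE statements for extra ones."""
    statements = []
    for principal, privilege in sorted(desired - current):
        statements.append(
            f"GRANT {privilege} ON EXTERNAL LOCATION {location_name} TO `{principal}`"
        )
    for principal, privilege in sorted(current - desired):
        # Assumption: REVOKE uses FROM where GRANT uses TO.
        statements.append(
            f"REVOKE {privilege} ON EXTERNAL LOCATION {location_name} FROM `{principal}`"
        )
    return statements


current = {("analysts", "READ FILES"), ("analysts", "WRITE FILES")}
desired = {("analysts", "READ FILES"), ("data engineers", "CREATE EXTERNAL TABLE")}
for statement in reconcile_grants("sales_landing", current, desired):
    print(statement)
# GRANT CREATE EXTERNAL TABLE ON EXTERNAL LOCATION sales_landing TO `data engineers`
# REVOKE WRITE FILES ON EXTERNAL LOCATION sales_landing FROM `analysts`
```

The current set would come from the output of SHOW GRANTS. Parsing that output is left out of the sketch.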

Change the owner of an external location

An external location’s creator is its initial owner. To change the owner to a different account-level user or group, run the following command in a notebook or the Databricks SQL editor or use Data Explorer. Replace the placeholder values:

  • <location_name>: The name of the external location.

  • <principal>: The email address of an account-level user or the name of an account-level group.

ALTER EXTERNAL LOCATION <location_name> OWNER TO <principal>;

Delete an external location

To delete an external location, do the following:

Run the following command in a notebook or the Databricks SQL editor. Items in brackets are optional. Replace <location_name> with the name of the external location.

DROP EXTERNAL LOCATION [IF EXISTS] <location_name>;

Run the following Python command in a notebook. Items in brackets are optional. Replace <location_name> with the name of the external location.

spark.sql("DROP EXTERNAL LOCATION [IF EXISTS] <location_name>")

Run the following R command in a notebook. Items in brackets are optional. Replace <location_name> with the name of the external location.

library(SparkR)

sql("DROP EXTERNAL LOCATION [IF EXISTS] <location_name>")

Run the following Scala command in a notebook. Items in brackets are optional. Replace <location_name> with the name of the external location.

spark.sql("DROP EXTERNAL LOCATION [IF EXISTS] <location_name>")