
Using Catalog Explorer or SQL to connect to AWS S3

This page describes how to connect to an AWS S3 bucket by creating a storage credential object and then an external location object, using Catalog Explorer or SQL. You must create the storage credential before you create the external location.

For a more automated setup process, Databricks recommends using the AWS CloudFormation Quickstart template to create both of these objects.

Before you begin

Prerequisites:

You must create the S3 bucket that you want to use as an external location in AWS before you create the external location object in Databricks.

  • Do not use dot notation (for example, incorrect.bucket.name.notation) in S3 bucket names. Although AWS allows dots in bucket names, Databricks does not support S3 buckets with dot notation. Buckets containing dots can cause compatibility issues with features like Delta Sharing due to SSL certificate validation failures. For more information, see the AWS bucket naming best practices.

  • External location paths must contain only standard ASCII characters (letters A–Z, a–z, digits 0–9, and common symbols like /, _, -).

  • The bucket cannot have an S3 access control list attached to it.

  • Avoid using a path in S3 that is already defined as an external location in another Unity Catalog metastore. You can safely read data in a single external S3 location from more than one metastore, but concurrent writes to the same S3 location from multiple metastores can lead to consistency issues.

Databricks permissions requirements:

  • You must have the CREATE STORAGE CREDENTIAL privilege on the metastore. Metastore admins have CREATE STORAGE CREDENTIAL on the metastore by default.
  • You must have the CREATE EXTERNAL LOCATION privilege on both the metastore and the storage credential referenced in the external location. Metastore admins have CREATE EXTERNAL LOCATION on the metastore by default.
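
If you prefer to manage these privileges in SQL, a metastore admin can grant them with statements like the following sketch. The group name `data-engineers` and the storage credential name `my_storage_credential` are placeholders used only for illustration.

SQL
-- Allow a group to create storage credentials in the current metastore.
GRANT CREATE STORAGE CREDENTIAL ON METASTORE TO `data-engineers`;

-- Allow the same group to create external locations, both on the metastore
-- and on a specific storage credential.
GRANT CREATE EXTERNAL LOCATION ON METASTORE TO `data-engineers`;
GRANT CREATE EXTERNAL LOCATION ON STORAGE CREDENTIAL `my_storage_credential` TO `data-engineers`;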

AWS permissions requirements:

  • You must have iam:CreateRole permissions to create the IAM role.

Create a storage credential that accesses an AWS S3 bucket

Before creating an external location, you need a storage credential that allows access to the S3 bucket. If you already have a storage credential, you can skip to Create an external location for an AWS S3 bucket.

Step 1: Create an IAM role

In AWS, create an IAM role that gives access to the S3 bucket that you want your users to access. This IAM role must be defined in the same account as the S3 bucket.

tip

If you have already created an IAM role that provides this access, you can skip this step and go straight to Step 2: Give Databricks the IAM role details.

  1. Create an IAM role that will allow access to the S3 bucket.

    Role creation is a two-step process. In this step, you create the role with a temporary trust relationship policy and a placeholder external ID, which you then modify after you create the storage credential in Databricks.

    You must modify the trust policy after you create the role because your role must be self-assuming (that is, it must be configured to trust itself). The role must therefore exist before you add the self-assumption statement. For information about self-assuming roles, see this Amazon blog article.

    important

    Databricks blocks new and existing storage credentials based on IAM roles that are not self-assuming. For details, see Self-assuming role enforcement policy.

    To create the policy, you must use a placeholder external ID.

    1. Create the IAM role with a Custom Trust Policy.

    2. In the Custom Trust Policy field, paste the following policy JSON.

      This policy establishes a cross-account trust relationship so that Unity Catalog can assume the role to access the data in the bucket on behalf of Databricks users. This is specified by the ARN in the Principal section. It is a static value that references a role created by Databricks.

      The policy sets the external ID to 0000 as a placeholder. You will update this to the external ID of your storage credential in a later step. The policy is slightly different if you use Databricks on AWS GovCloud (FedRAMP High).

      JSON
      {
        "Version": "2012-10-17",
        "Statement": [
          {
            "Effect": "Allow",
            "Principal": {
              "AWS": ["arn:aws:iam::414351767826:role/unity-catalog-prod-UCMasterRole-14S5ZJVKOTYTL"]
            },
            "Action": "sts:AssumeRole",
            "Condition": {
              "StringEquals": {
                "sts:ExternalId": "0000"
              }
            }
          }
        ]
      }
    3. Skip the permissions policy configuration. You'll go back to add that in a later step.

    4. Save the IAM role.

  2. Create the following IAM policy in the same account as the S3 bucket, replacing the following values:

    • <BUCKET>: The name of the S3 bucket.

    • <KMS-KEY>: Optional. If encryption is enabled, provide the name of the KMS key that encrypts the S3 bucket contents. If encryption is disabled, remove the entire KMS section of the IAM policy.

      note

      If encryption is enabled, you must update your KMS key policy in AWS to grant access to the Unity Catalog IAM role. Otherwise, Unity Catalog cannot access data in the bucket. Add the following to the key policy, replacing the placeholders with your values:

      JSON
      "Principal": {
      "AWS": "arn:aws:iam::<AWS-ACCOUNT-ID>:role/<UNITY-CATALOG-IAM-ROLE>"
      }

      See Configure encryption for S3 using Unity Catalog.

    • <AWS-ACCOUNT-ID>: The Account ID of your AWS account (not your Databricks account).

    • <AWS-IAM-ROLE-NAME>: The name of the AWS IAM role that you created in the previous step.

    This IAM policy grants read and write access. You can also create a policy that grants only read access. However, this might be unnecessary because you can mark the storage credential as read-only, and Databricks ignores any write access granted by this IAM role.

    JSON
    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Action": [
            "s3:GetObject",
            "s3:PutObject",
            "s3:DeleteObject",
            "s3:ListBucket",
            "s3:GetBucketLocation",
            "s3:ListBucketMultipartUploads",
            "s3:ListMultipartUploadParts",
            "s3:AbortMultipartUpload"
          ],
          "Resource": ["arn:aws:s3:::<BUCKET>/*", "arn:aws:s3:::<BUCKET>"],
          "Effect": "Allow"
        },
        {
          "Action": ["kms:Decrypt", "kms:Encrypt", "kms:GenerateDataKey*"],
          "Resource": ["arn:aws:kms:<KMS-KEY>"],
          "Effect": "Allow"
        },
        {
          "Action": ["sts:AssumeRole"],
          "Resource": ["arn:aws:iam::<AWS-ACCOUNT-ID>:role/<AWS-IAM-ROLE-NAME>"],
          "Effect": "Allow"
        }
      ]
    }

    note

    If you need a more restrictive IAM policy for Unity Catalog, contact your Databricks account team for assistance.

  3. Create an IAM policy for file events in the same account as the S3 bucket.

    note

    This step is optional but highly recommended. If you do not grant Databricks access to configure file events on your behalf, you must configure file events manually for each location. If you do not, you will have limited access to critical features that Databricks may release in the future. For more information about file events, see (Recommended) Enable file events for an external location.

    The IAM policy grants Databricks permission to update your bucket's event notification configuration, create an SNS topic, create an SQS queue, and subscribe the SQS queue to the SNS topic. These are required resources for features that use file events. Replace <BUCKET> with the name of the S3 bucket.

    JSON
    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Sid": "ManagedFileEventsSetupStatement",
          "Effect": "Allow",
          "Action": [
            "s3:GetBucketNotification",
            "s3:PutBucketNotification",
            "sns:ListSubscriptionsByTopic",
            "sns:GetTopicAttributes",
            "sns:SetTopicAttributes",
            "sns:CreateTopic",
            "sns:TagResource",
            "sns:Publish",
            "sns:Subscribe",
            "sqs:CreateQueue",
            "sqs:DeleteMessage",
            "sqs:ReceiveMessage",
            "sqs:SendMessage",
            "sqs:GetQueueUrl",
            "sqs:GetQueueAttributes",
            "sqs:SetQueueAttributes",
            "sqs:TagQueue",
            "sqs:ChangeMessageVisibility",
            "sqs:PurgeQueue"
          ],
          "Resource": ["arn:aws:s3:::<BUCKET>", "arn:aws:sqs:*:*:csms-*", "arn:aws:sns:*:*:csms-*"]
        },
        {
          "Sid": "ManagedFileEventsListStatement",
          "Effect": "Allow",
          "Action": ["sqs:ListQueues", "sqs:ListQueueTags", "sns:ListTopics"],
          "Resource": ["arn:aws:sqs:*:*:csms-*", "arn:aws:sns:*:*:csms-*"]
        },
        {
          "Sid": "ManagedFileEventsTeardownStatement",
          "Effect": "Allow",
          "Action": ["sns:Unsubscribe", "sns:DeleteTopic", "sqs:DeleteQueue"],
          "Resource": ["arn:aws:sqs:*:*:csms-*", "arn:aws:sns:*:*:csms-*"]
        }
      ]
    }

  4. Attach the IAM policies to the IAM role.

    In the role's Permissions tab, attach the IAM policies that you just created.

Step 2: Give Databricks the IAM role details

  1. In Databricks, log in to a workspace that is linked to the Unity Catalog metastore.

    You must have the CREATE STORAGE CREDENTIAL privilege. The metastore admin and account admin roles both include this privilege.

  2. Click Catalog to open Catalog Explorer.

  3. Click the + Add icon, then click Create a credential.

  4. Select a Credential Type of AWS IAM Role.

  5. Enter a name for the credential, the IAM Role ARN that authorizes Unity Catalog to access the storage location on your cloud tenant, and an optional comment.

    tip

    If you already defined an instance profile in Databricks, click Copy instance profile to copy over the IAM role ARN for that instance profile. The instance profile's IAM role must have a cross-account trust relationship that enables Databricks to assume the role in order to access the bucket on behalf of Databricks users. For more information about the IAM role policy and trust relationship requirements, see Step 1: Create an IAM role.

  6. (Optional) If you want users to have read-only access to the external locations that use this storage credential, in Advanced options select Limit to read-only use. You can change this setting later. For more information, see Mark a storage credential as read-only.

  7. Click Create.

  8. In the Storage credential created dialog, copy the External ID.

  9. Click Done.

  10. (Optional) Bind the storage credential to specific workspaces.

    By default, any privileged user can use the storage credential on any workspace attached to the metastore. If you want to allow access only from specific workspaces, go to the Workspaces tab and assign workspaces. See Assign a storage credential to specific workspaces.

Step 3: Update the IAM role trust relationship policy

In AWS, modify the trust relationship policy to add your storage credential's external ID and make it self-assuming.

  1. Return to your saved IAM role and go to the Trust Relationships tab.

  2. Edit the trust relationship policy to look like the following:

    JSON
    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Allow",
          "Principal": {
            "AWS": [
              "arn:aws:iam::414351767826:role/unity-catalog-prod-UCMasterRole-14S5ZJVKOTYTL",
              "arn:aws:iam::<YOUR-AWS-ACCOUNT-ID>:role/<THIS-ROLE-NAME>"
            ]
          },
          "Action": "sts:AssumeRole",
          "Condition": {
            "StringEquals": {
              "sts:ExternalId": "<STORAGE-CREDENTIAL-EXTERNAL-ID>"
            }
          }
        }
      ]
    }

    In this policy, replace the following values:

    • <YOUR-AWS-ACCOUNT-ID>: Your AWS account ID.
    • <THIS-ROLE-NAME>: The name of this IAM role.
    • <STORAGE-CREDENTIAL-EXTERNAL-ID>: The storage credential's external ID that you copied in the previous step.

Step 4: Validate the storage credential

After you have made the changes to the IAM role trust policy in Step 3: Update the IAM role trust relationship policy, verify that your IAM role is properly configured to be used as a storage credential.

note

To validate the configuration, you must be the storage credential owner, a metastore admin, or a user with CREATE EXTERNAL LOCATION permissions on the storage credential.

  1. In Databricks, log in to a workspace that is linked to the metastore.

  2. Click Catalog.

  3. On the Quick access page, click the External Data > button and go to the Credentials tab.

    As an alternative, you can click the gear icon at the top of the Catalog pane and select Credentials.

  4. Select the storage credential that you want to validate.

  5. Click Validate Configuration.

  6. If any of the checks fail, return to Step 3: Update the IAM role trust relationship policy and review the IAM role's trust policy to correct the configuration.

After the storage credential is validated, you can use it to create an external location.
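
If you prefer to check from SQL, you can also list and inspect storage credentials in a notebook or the SQL editor. The following is a sketch that assumes your Databricks SQL environment supports the SHOW STORAGE CREDENTIALS and DESCRIBE STORAGE CREDENTIAL commands; the credential name is a placeholder. These commands display metadata only; the Validate Configuration check in Catalog Explorer is what actually exercises the IAM role.

SQL
-- List the storage credentials defined in the current metastore.
SHOW STORAGE CREDENTIALS;

-- Show the IAM role ARN and other properties of a specific credential.
DESCRIBE STORAGE CREDENTIAL `my_storage_credential`;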

Self-assuming role enforcement policy

On June 30, 2023, AWS updated its IAM role trust policy requirements to require that IAM roles explicitly trust themselves for sts:AssumeRole calls. As a result, Databricks requires that AWS IAM roles for storage credentials be self-assuming. For details, see this community blog post.

On January 20, 2025, Databricks began blocking the use of existing storage credentials with non-self-assuming IAM roles. This enforcement can break workloads and jobs that run with non-self-assuming credentials.

To check whether an AWS IAM role for a storage credential is self-assuming, follow the instructions in Step 4: Validate the storage credential. If the Self Assume Role check fails, revisit Step 3: Update the IAM role trust relationship policy and reconfigure the IAM role's trust policy to trust itself.

If you have multiple storage credentials in a metastore that you want to check, use the following notebook to verify the self-assuming capabilities of all storage credentials in your metastore:

Self-assuming storage credential verification notebook


Create an external location for an AWS S3 bucket

This section describes how to create an external location using either Catalog Explorer or SQL. It assumes that you already have a storage credential that allows access to the S3 bucket.

If you don't already have a storage credential, Databricks recommends using the AWS CloudFormation Quickstart template to create both the storage credential and the external location. To manually create a storage credential instead, see Create a storage credential that accesses an AWS S3 bucket.

Option 1: Create an external location using Catalog Explorer

Use Catalog Explorer to manually create an external location if you prefer working within the UI. To create external locations programmatically, use SQL.

To create the external location:

  1. Log in to a workspace that is attached to the metastore.

  2. In the sidebar, click Catalog.

  3. On the Quick access page, click the External data > button, go to the External Locations tab, and click Create external location.

  4. On the Create a new external location dialog, click Manual, then Next.

    To learn about the AWS Quickstart option, see Using the AWS CloudFormation Quickstart template to connect to AWS S3.

  5. In the Create a new external location manually dialog, enter an External location name.

  6. Under Storage type, select S3.

  7. Under URL, enter the S3 bucket path. For example, s3://<bucket-path>.

  8. Under Storage credential, select the storage credential that grants access to the external location.

    note

    If you don't have a storage credential object, but you do have an AWS IAM role that allows access to the S3 bucket, you can select + Create new storage credential to create the storage credential object within this dialog.

  9. (Optional) If you want users to have read-only access to the external location, click Advanced Options and select Limit to read-only use. You can change this setting later. For more information, see Mark an external location as read-only.

  10. (Optional) If the external location is intended for legacy workload migration, click Advanced options and enable Fallback mode.

    See Enable fallback mode on external locations.

  11. (Optional) If the S3 bucket requires SSE encryption, you can configure an encryption algorithm to allow external tables and volumes in Unity Catalog to access data in your S3 bucket.

    For instructions, see Configure an encryption algorithm on an external location (AWS S3 only).

  12. (Optional) To subscribe to change notifications on the external location, click Advanced Options and select Enable file events.

    File events simplify setup and improve performance and capacity for features such as file arrival triggers and Auto Loader file notifications. This step is optional but highly recommended.

    For details, see (Recommended) Enable file events for an external location.

  13. Click Create.

  14. (Optional) Bind the external location to specific workspaces.

    By default, any privileged user can use the external location on any workspace attached to the metastore. If you want to allow access only from specific workspaces, go to the Workspaces tab and assign workspaces. See Assign an external location to specific workspaces.

  15. Go to the Permissions tab to grant permission to use the external location.

    For anyone to use the external location, you must grant permissions (a SQL equivalent is shown after these steps):

    • To use the external location to add a managed storage location to a metastore, catalog, or schema, grant the CREATE MANAGED STORAGE privilege.

    • To create external tables or volumes, grant CREATE EXTERNAL TABLE or CREATE EXTERNAL VOLUME.

    1. Click Grant.
    2. On the Grant on <external location> dialog, select users, groups, or service principals in the Principals field, and select the privileges that you want to grant.
    3. Click Grant.
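
    If you prefer to grant these privileges in SQL, the following sketch shows equivalent grants. The external location name `my_external_location` and the group `data-engineers` are placeholders.

    SQL
    -- Allow a group to create external tables and volumes that use this location.
    GRANT CREATE EXTERNAL TABLE ON EXTERNAL LOCATION `my_external_location` TO `data-engineers`;
    GRANT CREATE EXTERNAL VOLUME ON EXTERNAL LOCATION `my_external_location` TO `data-engineers`;

    -- Allow the group to use the location as managed storage for a catalog or schema.
    GRANT CREATE MANAGED STORAGE ON EXTERNAL LOCATION `my_external_location` TO `data-engineers`;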

Option 2: Create an external location using SQL

To create an external location using SQL, run the following command in a notebook or the SQL query editor. Replace the placeholder values.

  • <location-name>: A name for the external location. If the location name includes special characters, such as hyphens (-), it must be surrounded by backticks (` `). See Names.
  • <bucket-path>: The path in your cloud tenant that this external location grants access to. For example, s3://mybucket.
  • <storage-credential-name>: The name of the storage credential that authorizes reading from and writing to the bucket. If the storage credential name includes special characters, such as hyphens (-), it must be surrounded by backticks (` `).
SQL
CREATE EXTERNAL LOCATION [IF NOT EXISTS] `<location-name>`
URL '<bucket-path>'
WITH ([STORAGE] CREDENTIAL `<storage-credential-name>`)
[COMMENT '<comment-string>'];
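
For example, a complete statement with hypothetical values filled in might look like the following. The location name, bucket path, and credential name are placeholders.

SQL
-- Example only: replace the names and path with your own values.
CREATE EXTERNAL LOCATION IF NOT EXISTS `sales_data_location`
URL 's3://my-example-bucket/sales'
WITH (STORAGE CREDENTIAL `my_storage_credential`)
COMMENT 'External location for sales data in S3';

After the location is created, you can run SHOW EXTERNAL LOCATIONS or DESCRIBE EXTERNAL LOCATION `sales_data_location` to confirm its URL and storage credential.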

If you want to limit external location access to specific workspaces in your account, also known as workspace binding or external location isolation, see Assign an external location to specific workspaces.

Next steps