
Create a storage credential for connecting to AWS S3 (read-only)

This article describes how to create storage credentials in Unity Catalog to connect to AWS S3. Support for S3 in Databricks on Google Cloud is read-only.

A storage credential contains a long-term cloud credential with access to cloud storage. You reference a storage credential and the cloud storage path when you create external locations in Unity Catalog to govern access to external storage.

For more information about storage credentials and external locations, see Connect to cloud object storage using Unity Catalog.

For information about other cloud storage options supported by Unity Catalog, see Cloud storage options supported by Unity Catalog.

note

Cross-cloud data transfer charges might apply. When you use serverless compute, you are billed in accordance with the Databricks data transfer & connectivity policy.

Create the storage credential

To create a storage credential for access to an S3 bucket, you create an AWS IAM role that authorizes access to the S3 bucket path and reference that IAM role in the storage credential definition.

Requirements

In Databricks:

  • Databricks workspace enabled for Unity Catalog.
  • CREATE STORAGE CREDENTIAL privilege on the Unity Catalog metastore attached to the workspace. Account admins and metastore admins have this privilege by default.

In your AWS account:

  • An S3 bucket that meets the following requirements:

    • The bucket name cannot include dot notation (for example, incorrect.bucket.name.notation). For more bucket naming guidance, see the AWS bucket naming rules.
    • The bucket cannot have an S3 access control list attached to it.
  • The ability to create IAM roles.
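The bucket-name requirement above can be checked before you start. The following is a minimal sketch, not part of any Databricks or AWS tooling: the function name `is_supported_bucket_name` is hypothetical, and the pattern assumes the general AWS rules (3 to 63 characters, lowercase letters, digits, and hyphens, starting and ending with a letter or digit) with dots excluded as required here.

```python
import re

# Assumption: general AWS bucket naming rules (3-63 characters, lowercase
# letters, digits, hyphens, starting and ending with a letter or digit),
# with dots disallowed per the requirement in this article.
_BUCKET_NAME = re.compile(r"[a-z0-9][a-z0-9-]{1,61}[a-z0-9]")


def is_supported_bucket_name(name: str) -> bool:
    """Return True if the name contains no dots and matches the basic rules."""
    return _BUCKET_NAME.fullmatch(name) is not None
```

For example, `is_supported_bucket_name("incorrect.bucket.name.notation")` returns `False` because of the dots. The check does not cover every AWS naming rule (such as reserved prefixes), so treat it as a quick pre-check only.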

Step 1: Create an IAM role

In AWS, create an IAM role that gives access to the S3 bucket that you want your users to access. This IAM role must be defined in the same account as the S3 bucket.

tip

If you have already created an IAM role that provides this access, you can skip this step and go straight to Step 2: Give Databricks the IAM role details.

  1. Create an IAM role that allows access to the S3 bucket.

    Role creation is a two-step process. In this step you create the role, adding a temporary trust relationship policy and a placeholder external ID that you then modify after creating the storage credential in Databricks.

    You must modify the trust policy after you create the role, because your role must be self-assuming (that is, it must be configured to trust itself). Because of this, the role must exist before you add the self-assumption statement. For information about self-assuming roles, see this Amazon blog article.

    important

    Databricks blocks new and existing storage credentials based on IAM roles that are not self-assuming. For details, see Self-assuming role enforcement policy.

    To create the policy, you must use a placeholder external ID.

    1. Create the IAM role with a Custom Trust Policy.

    2. In the Custom Trust Policy field, paste the following policy JSON.

      This policy establishes a cross-account trust relationship so that Unity Catalog can assume the role to access the data in the bucket on behalf of Databricks users. This is specified by the ARN in the Principal section. It is a static value that references a role created by Databricks.

      JSON
      {
        "Version": "2012-10-17",
        "Statement": [
          {
            "Effect": "Allow",
            "Principal": {
              "AWS": ["arn:aws:iam::414351767826:role/unity-catalog-prod-UCGCPMainRole-1UDE9F2YBJ8MD"]
            },
            "Action": "sts:AssumeRole",
            "Condition": {
              "StringEquals": {
                "sts:ExternalId": "0000"
              }
            }
          }
        ]
      }
    3. Skip the permissions policy configuration. You’ll go back to add that in a later step.

    4. Save the IAM role.

  2. Create the following IAM policy in the same account as the S3 bucket, replacing the following values:

    • <BUCKET>: The name of the S3 bucket.
    • <KMS-KEY>: Optional. If encryption is enabled, provide the name of the KMS key that encrypts the S3 bucket contents. If encryption is disabled, remove the entire KMS section of the IAM policy.
    • <AWS-ACCOUNT-ID>: The Account ID of your AWS account (not your Databricks account).
    • <AWS-IAM-ROLE-NAME>: The name of the AWS IAM role that you created in the previous step.

    This IAM policy grants read and write access. You can also create a policy that grants read access only. However, this may be unnecessary, because you can mark the storage credential as read-only, and any write access granted by this IAM role will be ignored.

    JSON
    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Action": [
            "s3:GetObject",
            "s3:PutObject",
            "s3:DeleteObject",
            "s3:ListBucket",
            "s3:GetBucketLocation",
            "s3:ListBucketMultipartUploads",
            "s3:ListMultipartUploadParts",
            "s3:AbortMultipartUpload"
          ],
          "Resource": ["arn:aws:s3:::<BUCKET>/*", "arn:aws:s3:::<BUCKET>"],
          "Effect": "Allow"
        },
        {
          "Action": ["kms:Decrypt", "kms:Encrypt", "kms:GenerateDataKey*"],
          "Resource": ["arn:aws:kms:<KMS-KEY>"],
          "Effect": "Allow"
        },
        {
          "Action": ["sts:AssumeRole"],
          "Resource": ["arn:aws:iam::<AWS-ACCOUNT-ID>:role/<AWS-IAM-ROLE-NAME>"],
          "Effect": "Allow"
        }
      ]
    }
    note

    If you need a more restrictive IAM policy for Unity Catalog, contact your Databricks account team for assistance.

  3. Create an IAM policy for file events in the same account as the S3 bucket.

    note

    This step is optional but highly recommended. If you do not grant Databricks access to configure file events on your behalf, you must configure file events manually for each location. If file events are not configured at all, you will have limited access to critical features that Databricks may release in the future. For more information about file events, see (Recommended) Enable file events for an external location.

    The IAM policy grants Databricks permission to update your bucket's event notification configuration, create an SNS topic, create an SQS queue, and subscribe the SQS queue to the SNS topic. These are required resources for features that use file events. Replace <BUCKET> with the name of the S3 bucket.

    JSON
    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Sid": "ManagedFileEventsSetupStatement",
          "Effect": "Allow",
          "Action": [
            "s3:GetBucketNotification",
            "s3:PutBucketNotification",
            "sns:ListSubscriptionsByTopic",
            "sns:GetTopicAttributes",
            "sns:SetTopicAttributes",
            "sns:CreateTopic",
            "sns:TagResource",
            "sns:Publish",
            "sns:Subscribe",
            "sqs:CreateQueue",
            "sqs:DeleteMessage",
            "sqs:ReceiveMessage",
            "sqs:SendMessage",
            "sqs:GetQueueUrl",
            "sqs:GetQueueAttributes",
            "sqs:SetQueueAttributes",
            "sqs:TagQueue",
            "sqs:ChangeMessageVisibility",
            "sqs:PurgeQueue"
          ],
          "Resource": ["arn:aws:s3:::<BUCKET>", "arn:aws:sqs:*:*:csms-*", "arn:aws:sns:*:*:csms-*"]
        },
        {
          "Sid": "ManagedFileEventsListStatement",
          "Effect": "Allow",
          "Action": ["sqs:ListQueues", "sqs:ListQueueTags", "sns:ListTopics"],
          "Resource": ["arn:aws:sqs:*:*:csms-*", "arn:aws:sns:*:*:csms-*"]
        },
        {
          "Sid": "ManagedFileEventsTeardownStatement",
          "Effect": "Allow",
          "Action": ["sns:Unsubscribe", "sns:DeleteTopic", "sqs:DeleteQueue"],
          "Resource": ["arn:aws:sqs:*:*:csms-*", "arn:aws:sns:*:*:csms-*"]
        }
      ]
    }
  4. Attach the IAM policies to the IAM role.

    In the role’s Permissions tab, attach the IAM policies that you just created.
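If you prefer to generate the access policy from Step 1 rather than edit the placeholders by hand, the substitution can be sketched in Python. This is an illustrative helper only: the function name `build_access_policy` and its parameters are hypothetical, and the KMS statement is dropped when no key is supplied, mirroring the instruction to remove the KMS section when encryption is disabled.

```python
import json
from typing import Optional


def build_access_policy(bucket: str, account_id: str, role_name: str,
                        kms_key: Optional[str] = None) -> dict:
    """Fill in the placeholders of the access policy shown in this step."""
    statements = [
        {
            "Action": [
                "s3:GetObject",
                "s3:PutObject",
                "s3:DeleteObject",
                "s3:ListBucket",
                "s3:GetBucketLocation",
                "s3:ListBucketMultipartUploads",
                "s3:ListMultipartUploadParts",
                "s3:AbortMultipartUpload",
            ],
            "Resource": [f"arn:aws:s3:::{bucket}/*", f"arn:aws:s3:::{bucket}"],
            "Effect": "Allow",
        }
    ]
    # The KMS section is included only when a key is provided.
    if kms_key is not None:
        statements.append({
            "Action": ["kms:Decrypt", "kms:Encrypt", "kms:GenerateDataKey*"],
            "Resource": [f"arn:aws:kms:{kms_key}"],
            "Effect": "Allow",
        })
    # Lets the role assume itself (the self-assuming requirement).
    statements.append({
        "Action": ["sts:AssumeRole"],
        "Resource": [f"arn:aws:iam::{account_id}:role/{role_name}"],
        "Effect": "Allow",
    })
    return {"Version": "2012-10-17", "Statement": statements}


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    policy = build_access_policy("my-bucket", "123456789012", "my-uc-role")
    print(json.dumps(policy, indent=2))
```

The printed JSON can be pasted into the IAM policy editor. The bucket, account ID, and role name in the example call are made up; substitute your own values.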

Step 2: Give Databricks the IAM role details

  1. In Databricks, log in to a workspace that is linked to the Unity Catalog metastore.

    You must have the CREATE STORAGE CREDENTIAL privilege. The metastore admin and account admin roles both include this privilege.

  2. Click Catalog.

  3. On the Quick access page, click the External data > button, go to the Credentials tab, and select Create credential.

  4. Select a Credential Type of AWS IAM Role.

  5. Enter a name for the credential, the IAM Role ARN that authorizes Unity Catalog to access the storage location on your cloud tenant, and an optional comment.

  6. (Optional) If you want users to have read-only access to the external locations that use this storage credential, in Advanced options select Read only. For more information, see Mark a storage credential as read-only.

    note

    Because Databricks on Google Cloud provides only read-only access to S3 buckets using storage credentials, there is no need to set this option.

  7. Click Create.

  8. In the Storage credential created dialog, copy the External ID.

  9. Click Done.

  10. (Optional) Bind the storage credential to specific workspaces.

    By default, any privileged user can use the storage credential on any workspace attached to the metastore. If you want to allow access only from specific workspaces, go to the Workspaces tab and assign workspaces. See (Optional) Assign the storage credential to specific workspaces.

You can also create a storage credential by using the Databricks Terraform provider and the databricks_storage_credential resource.
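As a further alternative to the UI, the same request can in principle be sent to the Unity Catalog REST API. The sketch below only builds the request and does not send it. It assumes the endpoint `POST /api/2.1/unity-catalog/storage-credentials` with an `aws_iam_role.role_arn` field; confirm both against the API reference for your workspace. The function name `build_create_request`, the host, and the token are hypothetical.

```python
import json
import re
import urllib.request

# Assumption: a role ARN has the form arn:aws:iam::<12-digit account>:role/<name>.
_ROLE_ARN = re.compile(r"arn:aws:iam::\d{12}:role/.+")


def build_create_request(host: str, token: str, name: str, role_arn: str,
                         comment: str = "", read_only: bool = True) -> urllib.request.Request:
    """Build (but do not send) a request that creates a storage credential."""
    if _ROLE_ARN.fullmatch(role_arn) is None:
        raise ValueError(f"not an IAM role ARN: {role_arn!r}")
    payload = {
        "name": name,
        "aws_iam_role": {"role_arn": role_arn},
        # Defaults to True here because S3 access from Databricks on
        # Google Cloud is read-only anyway.
        "read_only": read_only,
    }
    if comment:
        payload["comment"] = comment
    return urllib.request.Request(
        url=f"{host.rstrip('/')}/api/2.1/unity-catalog/storage-credentials",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send it. The response is expected to carry the external ID that Step 3 needs, but check the exact field name in the API reference rather than relying on this sketch.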

Step 3: Update the IAM role trust relationship policy

In AWS, modify the trust relationship policy to add your storage credential’s external ID and make it self-assuming.

  1. Return to your saved IAM role and go to the Trust Relationships tab.

  2. Edit the trust relationship policy as follows:

    Add the following ARN to the “Allow” statement. Replace <YOUR-AWS-ACCOUNT-ID> and <THIS-ROLE-NAME> with your actual account ID and IAM role values.

    "arn:aws:iam::<YOUR-AWS-ACCOUNT-ID>:role/<THIS-ROLE-NAME>"

    In the "sts:AssumeRole" statement, update the placeholder external ID to your storage credential’s external ID that you copied in the previous step.

    "sts:ExternalId": "<STORAGE-CREDENTIAL-EXTERNAL-ID>"

    Your policy should now look like the following, with the replacement text updated to use your storage credential’s external ID, account ID, and IAM role values:

    JSON
    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Allow",
          "Principal": {
            "AWS": [
              "arn:aws:iam::414351767826:role/unity-catalog-prod-UCGCPMainRole-1UDE9F2YBJ8MD",
              "arn:aws:iam::<YOUR-AWS-ACCOUNT-ID>:role/<THIS-ROLE-NAME>"
            ]
          },
          "Action": "sts:AssumeRole",
          "Condition": {
            "StringEquals": {
              "sts:ExternalId": "<STORAGE-CREDENTIAL-EXTERNAL-ID>"
            }
          }
        }
      ]
    }
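The final trust policy can also be generated from the three replacement values. The following is a minimal sketch under stated assumptions: `build_trust_policy` is a hypothetical helper name, and the Databricks principal is copied verbatim from the policy in this article.

```python
import json

# Static Databricks principal, copied from the trust policy in this article.
UC_PRINCIPAL = (
    "arn:aws:iam::414351767826:role/"
    "unity-catalog-prod-UCGCPMainRole-1UDE9F2YBJ8MD"
)


def build_trust_policy(account_id: str, role_name: str, external_id: str) -> dict:
    """Return the self-assuming trust policy with all placeholders filled in."""
    self_arn = f"arn:aws:iam::{account_id}:role/{role_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # Trust both Unity Catalog and the role itself.
                "Principal": {"AWS": [UC_PRINCIPAL, self_arn]},
                "Action": "sts:AssumeRole",
                "Condition": {
                    "StringEquals": {"sts:ExternalId": external_id}
                },
            }
        ],
    }


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    policy = build_trust_policy("123456789012", "my-uc-role", "example-external-id")
    print(json.dumps(policy, indent=2))
```

Because the role must already exist before it can trust itself, this document is meant to be applied as an update to the role created in Step 1, not at role creation time.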

Step 4: Validate the storage credential

After you have made the changes to the IAM role trust policy in Step 3: Update the IAM role trust relationship policy, verify that your IAM role is properly configured to be used as a storage credential.

note

To validate the configuration, you must be the storage credential owner or a metastore admin, or have the CREATE EXTERNAL LOCATION privilege on the storage credential.

  1. In Databricks, log in to a workspace that is linked to the metastore.

  2. Click Catalog.

  3. On the Quick access page, click the External Data > button and go to the Credentials tab.

    As an alternative, you can click the gear icon at the top of the Catalog pane and select Credentials.

  4. Select the storage credential that you want to validate.

  5. Click Validate Configuration.

  6. If any of the checks fail, return to Step 3: Update the IAM role trust relationship policy and review the IAM role’s trust policy to configure it correctly.

When the storage credential is validated, you can use it to create an external location.

Self-assuming role enforcement policy

On June 30, 2023, AWS updated its IAM role trust policy to require that IAM roles explicitly self-trust for STS:AssumeRole calls. As a result, Databricks requires that AWS IAM roles for storage credentials be self-assuming. For details, see this community blog post.

On January 20, 2025, Databricks began blocking the usage of existing storage credentials with non-self-assuming IAM roles. This prohibition can break workloads and jobs that run using non-self-assuming credentials.

To check whether an AWS IAM role for a storage credential is self-assuming, follow the instructions in Step 4: Validate the storage credential. If the Self Assume Role check fails, revisit Step 3: Update the IAM role trust relationship policy and reconfigure the IAM role’s trust policy to trust itself.
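Before running the validation in Databricks, you can inspect a trust policy document locally for the self-trust entry. The sketch below is a structural check only, not the Databricks Self Assume Role check: the function name `is_self_assuming` is hypothetical, and it ignores conditions, wildcards, and the role's permissions policy.

```python
from typing import Any, List


def _as_list(value: Any) -> List[Any]:
    """IAM policy fields may hold a single value or a list; normalize to a list."""
    return value if isinstance(value, list) else [value]


def is_self_assuming(trust_policy: dict, role_arn: str) -> bool:
    """Return True if an Allow statement for sts:AssumeRole lists role_arn
    as an AWS principal."""
    for stmt in _as_list(trust_policy.get("Statement", [])):
        if stmt.get("Effect") != "Allow":
            continue
        if "sts:AssumeRole" not in _as_list(stmt.get("Action", [])):
            continue
        principal = stmt.get("Principal", {})
        if not isinstance(principal, dict):
            continue
        if role_arn in _as_list(principal.get("AWS", [])):
            return True
    return False
```

The temporary trust policy from Step 1 fails this check because it lists only the Databricks principal; the updated policy from Step 3 passes it once the role's own ARN is added.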

If you have multiple storage credentials in a metastore that you want to check, use the following notebook to verify the self-assuming capabilities of all storage credentials in your metastore:

Self-assuming storage credential verification notebook


(Optional) Assign the storage credential to specific workspaces

By default, a storage credential is accessible from all of the workspaces in the metastore. This means that if a user has been granted a privilege (such as CREATE EXTERNAL LOCATION) on that storage credential, they can exercise that privilege from any workspace attached to the metastore. If you use workspaces to isolate user data access, you might want to allow access to a storage credential only from specific workspaces. This feature is known as workspace binding or storage credential isolation. For instructions, see (Optional) Assign a storage credential to specific workspaces.

