Manage access to external cloud services using service credentials

Preview

This feature is in Public Preview.

This article describes how to create a service credential object in Unity Catalog that lets you govern access from Databricks to external cloud services like AWS Glue or AWS Secrets Manager. A service credential in Unity Catalog encapsulates a long-term cloud credential that grants access to such services.

Service credentials are not intended for governing access to cloud storage that is used as a Unity Catalog managed storage location or external storage location. For those use cases, use a storage credential. See Manage access to cloud storage using Unity Catalog.

Note

Service credentials are the Unity Catalog alternative to instance profiles, with the advantage that access is not tied to a specific compute resource but instead to users, groups, or service principals.

To create a service credential for access to AWS services, you create an IAM role that authorizes access to the service and reference that IAM role in the service credential definition.

Before you begin

Before you create a service credential, you must meet the following requirements:

In Databricks:

  • Databricks workspace enabled for Unity Catalog.

  • CREATE SERVICE CREDENTIAL privilege on the Unity Catalog metastore attached to the workspace. Account admins and metastore admins have this privilege by default. If your workspace was enabled for Unity Catalog automatically, workspace admins also have this privilege.

In your AWS account:

  • An AWS service in the same region as the workspaces that you want to access it from.

  • The ability to create IAM roles.

Create a service credential that references an AWS IAM role

This section describes how to:

  • Create an IAM role that meets Databricks requirements for accessing an AWS service.

  • Create a service credential securable object in Unity Catalog that can be used to access the AWS service from Databricks.

Step 1: Create an IAM role

In AWS, create an IAM role that gives access to the service that you want your users to access. This IAM role must be defined in the same account as the service.

Tip

If you have already created an IAM role that provides this access, you can skip this step and go straight to Step 2: Give Databricks the IAM role details.

  1. Create an IAM role that will allow access to the service.

    Role creation is a two-step process. In this step you create the role, adding a temporary trust relationship policy and a placeholder external ID that you then modify after creating the service credential in Databricks.

    You must modify the trust policy after you create the role because your role must be self-assuming (that is, it must be configured to trust itself). The role must therefore exist before you add the self-assumption statement. For information about self-assuming roles, see this Amazon blog article. For information about the Databricks self-assuming role enforcement policy, see Self-assuming role enforcement policy.

    To create the policy, you must use a placeholder external ID.

    1. Create the IAM role with a Custom Trust Policy.

    2. In the Custom Trust Policy field, paste the following policy JSON.

      This policy establishes a cross-account trust relationship so that Unity Catalog can assume the role to access the service on behalf of Databricks users. This is specified by the ARN in the Principal section. It is a static value that references a role created by Databricks. The policy uses the Databricks AWS ARN arn:aws:iam::414351767826:role/unity-catalog-prod-UCMasterRole-14S5ZJVKOTYTL. If you are using Databricks on AWS GovCloud, use the Databricks on AWS GovCloud ARN arn:aws-us-gov:iam::044793339203:role/unity-catalog-prod-UCMasterRole-1QRFA8SGY15OJ.

      The policy sets the external ID to 0000 as a placeholder. You update this to the external ID of your service credential in a later step.

      {
        "Version": "2012-10-17",
        "Statement": [{
          "Effect": "Allow",
          "Principal": {
            "AWS": [
              "arn:aws:iam::414351767826:role/unity-catalog-prod-UCMasterRole-14S5ZJVKOTYTL"
            ]
          },
          "Action": "sts:AssumeRole",
          "Condition": {
            "StringEquals": {
              "sts:ExternalId": "0000"
            }
          }
        }]
      }
      
    3. Skip the permissions policy configuration. You’ll go back to add that in a later step.

    4. Save the IAM role.
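
    If you prefer to script role creation, the following is a minimal sketch using boto3 (the AWS SDK for Python). This is an illustration under assumptions, not the documented console procedure: the role name uc-service-credential-role is a placeholder, and the trust policy is the same placeholder policy shown above.

      # Sketch: create the IAM role with the placeholder trust policy.
      # Assumes AWS credentials with iam:CreateRole permission; the role
      # name "uc-service-credential-role" is a placeholder.
      import json
      import boto3

      iam = boto3.client("iam")

      trust_policy = {
          "Version": "2012-10-17",
          "Statement": [{
              "Effect": "Allow",
              "Principal": {
                  "AWS": ["arn:aws:iam::414351767826:role/unity-catalog-prod-UCMasterRole-14S5ZJVKOTYTL"]
              },
              "Action": "sts:AssumeRole",
              "Condition": {"StringEquals": {"sts:ExternalId": "0000"}},
          }],
      }

      iam.create_role(
          RoleName="uc-service-credential-role",
          AssumeRolePolicyDocument=json.dumps(trust_policy),
      )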

  2. Create an IAM policy in the same account as the service.

    Here are two sample policies that you can use as guidelines. The first sample policy is for a service credential that connects to AWS Secrets Manager. The second is for AWS Glue. The actual actions and resources that you add depend on the service that you are connecting to and the level of access you need. See the AWS IAM documentation for your service.

    The sts:AssumeRole action is the same regardless of service.

    For the AWS Secrets Manager sample policy, replace the following values:

    • <AWS-ACCOUNT-ID>: The Account ID of your AWS account (not your Databricks account).

    • <AWS-IAM-ROLE-NAME>: The name of the AWS IAM role that you created in the previous step.

    {
      "Version": "2012-10-17",
      "Statement": [
          {
              "Action": [
                  "secretsmanager:GetResourcePolicy",
                  "secretsmanager:GetSecretValue"
              ],
              "Resource": [
                  "arn:aws:secretsmanager:us-west-2:111122223333:secret:aes128-1a2b3c"
              ],
              "Effect": "Allow"
          },
          {
              "Action": [
                  "sts:AssumeRole"
              ],
              "Resource": [
                  "arn:aws:iam::<AWS-ACCOUNT-ID>:role/<AWS-IAM-ROLE-NAME>"
              ],
              "Effect": "Allow"
          }
        ]
    }
    

    For the AWS Glue sample policy, replace the following values:

    • <AWS-GLUE-REGION>: The region of the AWS account that holds the AWS Glue catalog. This value plus your AWS account ID constitute the AWS Glue catalog ID.

    • <AWS-ACCOUNT-ID>: The Account ID of your AWS account (not your Databricks account).

    • <AWS-IAM-ROLE-NAME>: The name of the AWS IAM role that you created in the previous step.

    Note

    If you are using AWS Lake Formation, you must also grant the IAM role access to the Lake Formation resource. See https://docs.aws.amazon.com/lake-formation/latest/dg/hybrid-access-mode.html.

    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Sid": "GrantCatalogAccessToGlue",
          "Effect": "Allow",
          "Action": [
            "glue:GetDatabase",
            "glue:GetDatabases",
            "glue:GetPartition",
            "glue:GetPartitions",
            "glue:GetTable",
            "glue:GetTables",
            "glue:GetUserDefinedFunction",
            "glue:GetUserDefinedFunctions",
            "glue:BatchGetPartition"
          ],
          "Resource": [
            "arn:aws:glue:<AWS-GLUE-REGION>:<AWS-ACCOUNT-ID>*",
          ]
        },
        {
          "Action": [
            "sts:AssumeRole"
          ],
          "Resource": [
            "arn:aws:iam::<AWS-ACCOUNT-ID>:role/<AWS-IAM-ROLE-NAME>"
          ],
          "Effect": "Allow"
        }
      ]
    }
    
  3. Attach the IAM policy to the IAM role.

    On the role’s Permissions tab, attach the IAM policy you just created.
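
As a scripted alternative to the console steps above, the following sketch attaches the permissions policy inline using boto3. The role and policy names are placeholders, and the abbreviated policy document stands in for whichever full sample policy above matches your service (including its sts:AssumeRole statement).

    # Sketch: attach the permissions policy to the role as an inline policy.
    # "permissions_policy" is abbreviated; use the full sample policy for
    # your service, including the sts:AssumeRole statement.
    import json
    import boto3

    iam = boto3.client("iam")

    permissions_policy = {
        "Version": "2012-10-17",
        "Statement": [{
            "Action": ["secretsmanager:GetResourcePolicy", "secretsmanager:GetSecretValue"],
            "Resource": ["arn:aws:secretsmanager:us-west-2:111122223333:secret:aes128-1a2b3c"],
            "Effect": "Allow",
        }],
    }

    iam.put_role_policy(
        RoleName="uc-service-credential-role",
        PolicyName="uc-service-credential-policy",
        PolicyDocument=json.dumps(permissions_policy),
    )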

Step 2: Give Databricks the IAM role details

Permissions required: The CREATE SERVICE CREDENTIAL privilege. The metastore admin and account admin roles both include this privilege. Workspace admins in workspaces that were enabled for Unity Catalog automatically also have this privilege.

  1. In Databricks, log in to a workspace that is linked to the metastore.

  2. Click Catalog icon Catalog.

  3. On the Quick access page, click the External data > button, go to the Credentials tab, and select Create credential.

  4. Select Service Credential.

  5. Enter a Credential name, the IAM Role ARN that authorizes Unity Catalog to access the service on your cloud tenant, and an optional comment.

  6. Click Create.

  7. In the Service credential created dialog, copy the External ID.

    You can also view the external ID at any time on the service credential details page. See View a service credential.

  8. Click Done.
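
Catalog Explorer is the documented way to create a service credential. As a rough sketch only, the Unity Catalog credentials REST API can do the same thing programmatically; the endpoint path and payload shape below are assumptions to verify against the Databricks REST API reference:

    # Sketch: create a service credential over the REST API. The endpoint path,
    # payload fields, and response shape are assumptions; verify them against
    # the Databricks REST API reference before relying on this.
    import requests

    host = "https://<your-workspace>.cloud.databricks.com"
    token = "<personal-access-token>"

    resp = requests.post(
        f"{host}/api/2.1/unity-catalog/credentials",
        headers={"Authorization": f"Bearer {token}"},
        json={
            "name": "my-service-credential",
            "purpose": "SERVICE",
            "aws_iam_role": {"role_arn": "arn:aws:iam::<AWS-ACCOUNT-ID>:role/<AWS-IAM-ROLE-NAME>"},
            "comment": "Access to AWS Secrets Manager",
        },
    )
    resp.raise_for_status()
    # The external ID is needed in Step 3.
    print(resp.json()["aws_iam_role"]["external_id"])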

Step 3: Update the IAM role policy

In AWS, modify the trust relationship policy to add your service credential’s external ID and make it self-assuming.

  1. Return to your saved IAM role and go to the Trust Relationships tab.

  2. Edit the trust relationship policy as follows:

    Add the following ARN to the “Allow” statement. Replace <YOUR-AWS-ACCOUNT-ID> and <THIS-ROLE-NAME> with your actual account ID and IAM role values.

    "arn:aws:iam::<YOUR-AWS-ACCOUNT-ID>:role/<THIS-ROLE-NAME>"
    

    In the "sts:AssumeRole" statement, update the placeholder external ID to the service credential’s external ID that you copied in the previous step.

    "sts:ExternalId": "<SERVICE-CREDENTIAL-EXTERNAL-ID>"
    

    Your policy should now look like the following, with the replacement text updated to use your service credential’s external ID, account ID, and IAM role values:

    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Allow",
          "Principal": {
            "AWS": [
              "arn:aws:iam::414351767826:role/unity-catalog-prod-UCMasterRole-14S5ZJVKOTYTL",
              "arn:aws:iam::<YOUR-AWS-ACCOUNT-ID>:role/<THIS-ROLE-NAME>"
            ]
          },
          "Action": "sts:AssumeRole",
          "Condition": {
            "StringEquals": {
              "sts:ExternalId": "<SERVICE-CREDENTIAL-EXTERNAL-ID>"
            }
          }
        }
      ]
    }
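
    To apply the updated trust policy from a script instead of the console, here is a minimal boto3 sketch; the role name is a placeholder, and the policy placeholders must be replaced with your real values:

      # Sketch: apply the finished trust policy to the role.
      import json
      import boto3

      iam = boto3.client("iam")

      # The trust policy shown above; replace the placeholders with your
      # service credential's external ID, account ID, and role name.
      final_trust_policy = {
          "Version": "2012-10-17",
          "Statement": [{
              "Effect": "Allow",
              "Principal": {
                  "AWS": [
                      "arn:aws:iam::414351767826:role/unity-catalog-prod-UCMasterRole-14S5ZJVKOTYTL",
                      "arn:aws:iam::<YOUR-AWS-ACCOUNT-ID>:role/<THIS-ROLE-NAME>",
                  ]
              },
              "Action": "sts:AssumeRole",
              "Condition": {"StringEquals": {"sts:ExternalId": "<SERVICE-CREDENTIAL-EXTERNAL-ID>"}},
          }],
      }

      iam.update_assume_role_policy(
          RoleName="uc-service-credential-role",
          PolicyDocument=json.dumps(final_trust_policy),
      )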
    

Self-assuming role enforcement policy

On June 30, 2023, AWS updated its IAM role trust policy to require that IAM roles explicitly self-trust for sts:AssumeRole calls. As a result, Databricks requires that AWS IAM roles for service credentials be self-assuming and will soon prohibit non-self-assuming service credentials. For details, see this community blog post.

Databricks will begin prohibiting the creation of service credentials with non-self-assuming AWS IAM roles on September 20, 2024. Existing service credentials with non-self-assuming IAM roles will continue to work, but you will not be able to create new service credentials using those roles.

On January 20, 2025, Databricks will begin blocking the usage of existing service credentials with non-self-assuming IAM roles. This can potentially break workloads and jobs that run using non-self-assuming credentials.
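
To check whether an existing IAM role is already self-assuming, you can inspect its trust policy. The following is a small boto3 sketch; the role name is a placeholder:

    # Sketch: report whether a role's trust policy trusts the role's own ARN.
    import boto3

    iam = boto3.client("iam")
    role = iam.get_role(RoleName="uc-service-credential-role")["Role"]

    # Collect every AWS principal across the trust policy statements.
    principals = []
    for stmt in role["AssumeRolePolicyDocument"]["Statement"]:
        aws = stmt.get("Principal", {}).get("AWS", [])
        principals += aws if isinstance(aws, list) else [aws]

    print("self-assuming:", role["Arn"] in principals)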

(Optional) Assign a service credential to specific workspaces

Preview

This feature is in Public Preview.

By default, a service credential is accessible from all of the workspaces in the metastore. This means that if a user has been granted a privilege on that service credential, they can exercise that privilege from any workspace attached to the metastore. If you use workspaces to isolate user data access, you may want to allow access to a service credential only from specific workspaces. This feature is known as workspace binding or service credential isolation.

A typical use case for binding a service credential to specific workspaces is the scenario in which a cloud admin configures a service credential using a production cloud account credential, and you want to ensure that Databricks users use this credential to access an external cloud service only in the production workspace.

For more information about workspace binding, see (Optional) Assign a storage credential to specific workspaces and Limit catalog access to specific workspaces.

Bind a service credential to one or more workspaces

To assign a service credential to specific workspaces, use Catalog Explorer.

Permissions required: Metastore admin or service credential owner.

Note

Metastore admins can see all service credentials in a metastore using Catalog Explorer—and service credential owners can see all service credentials that they own in a metastore—regardless of whether the service credential is assigned to the current workspace. Service credentials that are not assigned to the workspace appear grayed out.

  1. Log in to a workspace that is linked to the metastore.

  2. In the sidebar, click Catalog icon Catalog.

  3. On the Quick access page, click the External data > button and go to the Credentials tab.

  4. Select the service credential and go to the Workspaces tab.

  5. On the Workspaces tab, clear the All workspaces have access checkbox.

    If your service credential is already bound to one or more workspaces, this checkbox is already cleared.

  6. Click Assign to workspaces and enter or find the workspaces you want to assign.

To revoke access, go to the Workspaces tab, select the workspace, and click Revoke. To allow access from all workspaces, select the All workspaces have access checkbox.
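
Workspace bindings can also be managed programmatically. The sketch below uses the Unity Catalog workspace-bindings REST API; the securable type segment (credential) and the payload shape are assumptions to verify against the Databricks REST API reference:

    # Sketch: bind a service credential to a single workspace. The securable
    # type segment ("credential") and payload fields are assumptions; verify
    # them against the Databricks REST API reference.
    import requests

    host = "https://<your-workspace>.cloud.databricks.com"
    token = "<personal-access-token>"

    resp = requests.patch(
        f"{host}/api/2.1/unity-catalog/bindings/credential/my-service-credential",
        headers={"Authorization": f"Bearer {token}"},
        json={"add": [{"workspace_id": 1234567890, "binding_type": "BINDING_TYPE_READ_WRITE"}]},
    )
    resp.raise_for_status()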

Next steps
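
After you create a service credential and grant privileges on it, you can reference it from Python code running on supported compute to obtain temporary credentials for the external service. The following is a minimal sketch, assuming a service credential named my-service-credential and the dbutils.credentials.getServiceCredentialsProvider helper; verify the exact usage pattern in the Databricks documentation on using service credentials.

    # Sketch: call AWS Secrets Manager from a notebook using a service credential.
    # Assumes a Databricks Runtime with service credential support and a credential
    # named "my-service-credential"; verify the helper usage in the current docs.
    import boto3

    # Returns a botocore session backed by the Unity Catalog service credential.
    provider = dbutils.credentials.getServiceCredentialsProvider("my-service-credential")

    session = boto3.Session(botocore_session=provider, region_name="us-west-2")
    sm = session.client("secretsmanager")

    secret = sm.get_secret_value(SecretId="aes128-1a2b3c")  # example secret from the sample policy
    print(secret["Name"])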

Limitations

The following limitations apply:

  • In Databricks Runtime 15.4 LTS, service credentials are supported in Python only.

  • SQL warehouses are not supported.

  • Some audit events for actions performed on service credentials do not appear in the system.access.audit table. However, audit information about who created, deleted, updated, read, listed, or used a service credential is available. See Audit log system table reference.

  • During the service credentials preview, INFORMATION_SCHEMA.STORAGE_CREDENTIALS (deprecated) displays both storage credentials and service credentials, and INFORMATION_SCHEMA.STORAGE_CREDENTIAL_PRIVILEGES (deprecated) displays privileges that apply to both storage credentials and service credentials. This is incorrect preview behavior that will be corrected, and you should not depend on it continuing. Instead, use INFORMATION_SCHEMA.CREDENTIALS and INFORMATION_SCHEMA.CREDENTIAL_PRIVILEGES for both storage and service credentials.