Secure access to S3 buckets using IAM credential passthrough with SAML 2.0 federation

Preview

This feature is in Public Preview.

AWS supports SAML 2.0 identity federation to allow single sign-on (SSO) to the AWS Management Console and AWS APIs. Databricks workspaces that are configured with single sign-on can use AWS IAM federation to maintain the mapping of users to IAM roles within their identity provider (IdP) rather than within Databricks using SCIM. This lets you centralize data access within your IdP and have those entitlements pass directly to Databricks clusters.

The following diagram illustrates the federation workflow:

Federation workflow
  1. Configure a trust relationship between your IdP and AWS accounts so that the IdP controls which roles users can assume.
  2. Users log in to Databricks through SAML SSO, and the IdP passes their role entitlements in the SAML response.
  3. Databricks calls the AWS Security Token Service (STS) and assumes the roles on the user's behalf by passing the SAML response, receiving temporary tokens in return.
  4. When a user accesses S3 from a Databricks cluster, the Databricks Runtime uses the user's temporary tokens to perform the access automatically and securely.
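Step 3 of the workflow corresponds to the STS AssumeRoleWithSAML API, which takes the role ARN, the ARN of the SAML provider registered in IAM, and the base64-encoded SAML response. The following is a minimal sketch that only assembles those request parameters for each role entitlement; it does not call AWS and is not Databricks' actual implementation. The helper name `build_assume_role_requests` and the example account ID and ARNs are hypothetical.

```python
import base64

# STS accepts DurationSeconds from 900 seconds up to the role's maximum
# session duration, which can be at most 43200 seconds (12 hours).
MIN_DURATION = 900
MAX_DURATION = 43200


def build_assume_role_requests(saml_response: bytes, entitlements, duration_seconds=3600):
    """Hypothetical helper: one AssumeRoleWithSAML parameter set per (role, provider) pair."""
    if not MIN_DURATION <= duration_seconds <= MAX_DURATION:
        raise ValueError(f"duration_seconds must be in [{MIN_DURATION}, {MAX_DURATION}]")
    # STS expects the SAML response to be base64-encoded.
    assertion = base64.b64encode(saml_response).decode("ascii")
    return [
        {
            "RoleArn": role_arn,
            "PrincipalArn": provider_arn,
            "SAMLAssertion": assertion,
            "DurationSeconds": duration_seconds,
        }
        for role_arn, provider_arn in entitlements
    ]


# Placeholder values, not a real SAML response.
param_sets = build_assume_role_requests(
    b"<samlp:Response>...</samlp:Response>",
    [("arn:aws:iam::111122223333:role/role-a",
      "arn:aws:iam::111122223333:saml-provider/okta-databricks")],
)
```

The dictionary keys match the STS API parameter names, so with boto3 each entry could be passed as `sts.assume_role_with_saml(**params)`; the temporary credentials in the response are what step 4 uses for S3 access.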

Note

When Allow IAM role entitlement auto sync is enabled, federation for IAM credential passthrough always takes users' role mappings from the SAML assertion, overwriting any roles previously set through the SCIM API.

Requirements

  • Premium plan (or, for customers who subscribed to Databricks before March 3, 2020, the Operational Security package).
  • SAML single sign-on configured in your Databricks workspace.
  • AWS administrator access to:
    • IAM roles and policies in the AWS account of the Databricks deployment.
    • AWS account of the S3 bucket.
  • An identity provider (IdP) administrator to configure your IdP to pass AWS roles to Databricks.
  • A Databricks admin to include AWS roles in the SAML assertion.

Step 1: Get the Databricks SAML URL

  1. Go to the Admin Console.

  2. Click the Single Sign-On tab.

  3. Copy the Databricks SAML URL.

    SAML URL

Step 2: Download identity provider metadata

Note

The steps within the identity provider console vary slightly for each identity provider. See Integrating Third-Party SAML Solution Providers with AWS for examples with your identity provider.

  1. In your identity provider admin console, find your Databricks application for single sign-on.

  2. Download the SAML metadata.

    Add attribute

Step 3: Configure the identity provider

  1. In the AWS console, go to the IAM service.
  2. Click the Identity Providers tab in the sidebar.
  3. Click Create Provider.
    1. In Provider Type, select SAML.
    2. In Provider Name, enter a name.
    3. In Metadata Document, click Choose File and navigate to the file containing the metadata document you downloaded above.
    4. Click Next Step and then Create.
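Before choosing the file in Metadata Document, you may want to confirm that the downloaded file really is IdP metadata. The following is a minimal sketch using only the Python standard library; it assumes the file has a single md:EntityDescriptor root (documents wrapped in md:EntitiesDescriptor are not handled), and the function name and sample document are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the SAML 2.0 metadata specification.
MD_NS = "urn:oasis:names:tc:SAML:2.0:metadata"


def idp_entity_id(metadata_xml: str) -> str:
    """Return the entityID of an IdP metadata document, or raise ValueError."""
    root = ET.fromstring(metadata_xml)
    if root.tag != f"{{{MD_NS}}}EntityDescriptor":
        raise ValueError("root element is not md:EntityDescriptor")
    # IdP metadata must describe an identity provider SSO role.
    if root.find(f"{{{MD_NS}}}IDPSSODescriptor") is None:
        raise ValueError("no md:IDPSSODescriptor element; is this IdP metadata?")
    entity_id = root.get("entityID")
    if not entity_id:
        raise ValueError("EntityDescriptor has no entityID attribute")
    return entity_id


# Minimal, hypothetical metadata document for illustration.
sample = """<md:EntityDescriptor xmlns:md="urn:oasis:names:tc:SAML:2.0:metadata"
    entityID="http://www.example.com/idp">
  <md:IDPSSODescriptor protocolSupportEnumeration="urn:oasis:names:tc:SAML:2.0:protocol"/>
</md:EntityDescriptor>"""
print(idp_entity_id(sample))  # prints http://www.example.com/idp
```

In practice you would read the downloaded file with `open(path, encoding="utf-8").read()` and pass its contents to the function.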

Step 4: Configure the IAM role for federation

Note

Use only roles intended for data access for federation with Databricks. We do not recommend allowing roles normally used for AWS console access, because they may have more privileges than necessary.

  1. In the AWS console, go to the IAM service.

  2. Click the Roles tab in the sidebar.

  3. Click Create role.

    1. Under Select type of trusted entity, select SAML 2.0 federation.
    2. In SAML provider, select the provider name you created in Step 3.
    3. Select Allow programmatic access only.
    4. In Attribute, select SAML:aud.
    5. In Value, paste the Databricks SAML URL you copied in Step 1.
    6. Click Next: Permissions, Next: Tags, and Next: Review.
    7. In the Role Name field, type a role name.
    8. Click Create role. The list of roles displays.
  4. Add an inline policy to the role. This policy grants access to the S3 bucket.

    1. In the Permissions tab, click Inline policy.

    2. Click the JSON tab. Copy this policy and set <s3-bucket-name> to the name of your bucket.

      {
        "Version": "2012-10-17",
        "Statement": [
          {
            "Effect": "Allow",
            "Action": [
              "s3:ListBucket"
            ],
            "Resource": [
              "arn:aws:s3:::<s3-bucket-name>"
            ]
          },
          {
            "Effect": "Allow",
            "Action": [
              "s3:PutObject",
              "s3:GetObject",
              "s3:DeleteObject",
              "s3:PutObjectAcl"
            ],
            "Resource": [
              "arn:aws:s3:::<s3-bucket-name>/*"
            ]
          }
        ]
      }
      
    3. Click Review policy.

    4. In the Name field, type a policy name.

    5. Click Create policy.

  5. In the Trust relationships tab, you should see something similar to:

    Trust relationship
  6. Click the Edit trust relationship button. The resulting IAM trust policy document should be similar to the following:

    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Allow",
          "Principal": {
            "Federated": "arn:aws:iam::<accountID>:saml-provider/<IdP-name>"
          },
          "Action": "sts:AssumeRoleWithSAML",
          "Condition": {
            "StringEquals": {
              "SAML:aud": "https://xxxxxx.cloud.databricks.com/saml/consume"
            }
          }
        }
      ]
    }
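If you manage IAM with scripts rather than the console, the two policy documents above can be generated. The following is a minimal stdlib-only sketch that reproduces the inline S3 policy and the trust policy with the bucket name, account ID, provider name, and Databricks SAML URL filled in; the function names and example values are hypothetical.

```python
import json


def s3_access_policy(bucket: str) -> dict:
    """Inline permission policy from step 4, with the bucket name filled in."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing applies to the bucket ARN itself.
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [f"arn:aws:s3:::{bucket}"],
            },
            {
                # Object operations apply to the objects inside the bucket.
                "Effect": "Allow",
                "Action": ["s3:PutObject", "s3:GetObject", "s3:DeleteObject", "s3:PutObjectAcl"],
                "Resource": [f"arn:aws:s3:::{bucket}/*"],
            },
        ],
    }


def saml_trust_policy(account_id: str, idp_name: str, saml_url: str) -> dict:
    """Trust policy letting the IAM SAML provider assume the role for Databricks."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Federated": f"arn:aws:iam::{account_id}:saml-provider/{idp_name}"},
                "Action": "sts:AssumeRoleWithSAML",
                # SAML:aud must equal the Databricks SAML URL copied in Step 1.
                "Condition": {"StringEquals": {"SAML:aud": saml_url}},
            }
        ],
    }


# Hypothetical values for illustration.
print(json.dumps(s3_access_policy("my-data-bucket"), indent=2))
print(json.dumps(saml_trust_policy("111122223333", "okta-databricks",
                                   "https://example.cloud.databricks.com/saml/consume"), indent=2))
```

The resulting JSON strings could then be supplied to the IAM API, for example through boto3's `iam.create_role` (as `AssumeRolePolicyDocument`) and `iam.put_role_policy` (as `PolicyDocument`).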
    

Step 5: Configure the identity provider to pass attributes to Databricks

Your IdP must pass the following attributes to Databricks in the SAML response so that Databricks can pass roles to clusters:

  • https://aws.amazon.com/SAML/Attributes/Role
  • https://aws.amazon.com/SAML/Attributes/RoleSessionName

The Role attribute lists the IAM role ARNs the user is entitled to, and RoleSessionName is the username that matches the single sign-on login. Role mappings are refreshed each time a user logs in to the Databricks workspace.
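To check what your IdP will send, it can help to parse the attribute values the way they are laid out in the Okta example in this section, where each Role value is a role ARN and a SAML provider ARN separated by a comma. The following is a minimal sketch under that assumption; it identifies each ARN by its IAM resource type rather than by position. The RoleSessionName check encodes the AWS-documented constraint (2 to 64 characters: alphanumerics and `_ . , + = @ -`), which you should verify against the current AWS documentation. Function names and example ARNs are hypothetical.

```python
import re

# AWS documents RoleSessionName as 2-64 characters from [A-Za-z0-9_+=,.@-].
SESSION_NAME_RE = re.compile(r"[\w+=,.@-]{2,64}", re.ASCII)


def parse_role_value(value: str) -> tuple[str, str]:
    """Split one Role attribute value into (role_arn, provider_arn)."""
    parts = [p.strip() for p in value.split(",")]
    if len(parts) != 2:
        raise ValueError(f"expected two comma-separated ARNs, got {value!r}")
    roles = [p for p in parts if ":role/" in p]
    providers = [p for p in parts if ":saml-provider/" in p]
    if len(roles) != 1 or len(providers) != 1:
        raise ValueError(f"need one role ARN and one saml-provider ARN: {value!r}")
    return roles[0], providers[0]


def valid_session_name(name: str) -> bool:
    """True if name satisfies the documented RoleSessionName constraints."""
    return SESSION_NAME_RE.fullmatch(name) is not None


role, provider = parse_role_value(
    "arn:aws:iam::111122223333:role/role-a,"
    "arn:aws:iam::111122223333:saml-provider/okta-databricks"
)
```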

Note

If user entitlement to the IAM roles is based on AD/LDAP group membership, you must configure the group-to-role mapping in your IdP.

Each identity provider differs in how you add attributes to pass through SAML. The following section shows one example with Okta. See Integrating Third-Party SAML Solution Providers with AWS for examples with your identity provider.

Okta example

  1. In the Okta Admin Console under Applications, select your Single Sign-On to Databricks application.

  2. Click Edit under SAML Settings, then click Next to go to the Configure SAML tab.

  3. In Attribute Statements, add the following attribute:

    1. Name: https://aws.amazon.com/SAML/Attributes/RoleSessionName, Name format: URI Reference, Value: user.login
  4. To manage roles easily with groups, create groups that correspond to your IAM roles, for example GroupA and GroupB, and add users to those groups.

  5. You can use Okta Expressions to match groups and roles in the following way:

    1. Name: https://aws.amazon.com/SAML/Attributes/Role, Name format: URI Reference, Value:

      Arrays.flatten(isMemberOfGroupName("GroupA") ? "arn:aws:iam::xxx:role/role-a,arn:aws:iam::xxx:saml-provider/okta-databricks" : {}, isMemberOfGroupName("GroupB") ? "arn:aws:iam::xxx:role/role-b,arn:aws:iam::xxx:saml-provider/okta-databricks" : {})
      

      It should look like:

      Okta expression

      Only users in a given group have permission to use the corresponding IAM role.

  6. Use Manage People to add users to the group.

  7. Use Manage Apps to assign the group to the SSO application to allow users to log in to Databricks.
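The Okta expression in step 5 emits one "role,provider" string for each group the user belongs to and flattens the results into a single list. The following sketch mirrors that logic in Python so you can reason about which Role values a given user receives; it is an illustration of the expression's behavior, not Okta's evaluator, and the account ID 111122223333 stands in for the elided `xxx` in the expression.

```python
# Hypothetical group-to-role mapping mirroring the Okta expression above.
PROVIDER = "arn:aws:iam::111122223333:saml-provider/okta-databricks"
GROUP_ROLES = {
    "GroupA": "arn:aws:iam::111122223333:role/role-a",
    "GroupB": "arn:aws:iam::111122223333:role/role-b",
}


def role_attribute_values(user_groups):
    """Return the Role attribute values the expression would emit for a user."""
    values = []
    for group, role_arn in GROUP_ROLES.items():
        # isMemberOfGroupName(group) ? "role,provider" : {}  ({} contributes nothing)
        if group in user_groups:
            values.append(f"{role_arn},{PROVIDER}")
    # Arrays.flatten(...) yields one flat list of "role,provider" strings.
    return values


print(role_attribute_values({"GroupA"}))
```

A user in both groups receives two values, and a user in neither group receives an empty list, so no federated role is passed for that user.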

To add more roles, follow the steps above, mapping an Okta group to each federated role. To use roles in different AWS accounts, add the SSO application as a new IAM identity provider in each additional AWS account that will have federated roles for Databricks.
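Because each additional AWS account gets its own IAM identity provider, each role in a Role value should be paired with the provider registered in that role's own account, as the trust policy in Step 4 (`arn:aws:iam::<accountID>:saml-provider/<IdP-name>`) suggests. The following is a minimal sketch that flags pairs whose ARNs belong to different accounts, assuming the standard ARN layout `arn:partition:service:region:account-id:resource`; the function names and account IDs are hypothetical.

```python
def arn_account(arn: str) -> str:
    """Return the account-id field of an ARN."""
    parts = arn.split(":", 5)
    if len(parts) != 6 or parts[0] != "arn":
        raise ValueError(f"not an ARN: {arn!r}")
    return parts[4]


def mismatched_pairs(pairs):
    """Return (role_arn, provider_arn) pairs whose accounts differ."""
    return [(r, p) for r, p in pairs if arn_account(r) != arn_account(p)]


pairs = [
    ("arn:aws:iam::111122223333:role/role-a",
     "arn:aws:iam::111122223333:saml-provider/okta-databricks"),
    # Role in a second account wrongly paired with the first account's provider.
    ("arn:aws:iam::444455556666:role/role-c",
     "arn:aws:iam::111122223333:saml-provider/okta-databricks"),
]
print(mismatched_pairs(pairs))
```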

Step 6: Optionally configure Databricks to synchronize role mappings from SAML to SCIM

Do this step if you want to use IAM credential passthrough for jobs or JDBC. Otherwise, you must set IAM role mappings using the SCIM API.

  1. Go to the Admin Console.

  2. Click the Single Sign-On tab.

  3. Select Allow IAM role entitlement auto sync.

    SSO tab

Best practices

For the best experience, we recommend setting the IAM role maximum session duration to between 4 and 8 hours. This keeps users from having to re-authenticate repeatedly to fetch new tokens, and keeps long queries from failing because of expired tokens. To set the duration:

  1. In the AWS console, click the meta IAM role you configured in Configure a meta IAM role.

  2. In the Maximum CLI/API session duration property, click Edit.

    Set session duration
  3. Select the duration and click Save changes.
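The console's Maximum CLI/API session duration is stored on the role as MaxSessionDuration, in seconds, and IAM accepts values from 3600 (1 hour) to 43200 (12 hours). The following minimal sketch converts hours to that value and warns when it falls outside the 4 to 8 hour range recommended above; the function name is hypothetical.

```python
import warnings

# IAM accepts a role MaxSessionDuration from 3600 s (1 h) to 43200 s (12 h).
IAM_MIN_SECONDS = 3600
IAM_MAX_SECONDS = 43200
# Range recommended in this section.
RECOMMENDED_HOURS = (4, 8)


def max_session_seconds(hours: float) -> int:
    """Convert hours to the seconds value IAM expects, checking limits."""
    seconds = int(hours * 3600)
    if not IAM_MIN_SECONDS <= seconds <= IAM_MAX_SECONDS:
        raise ValueError(f"{hours} h is outside IAM's 1-12 h range")
    low, high = RECOMMENDED_HOURS
    if not low <= hours <= high:
        warnings.warn(f"{hours} h is outside the recommended {low}-{high} h range")
    return seconds


print(max_session_seconds(8))  # prints 28800
```

If you script this instead of using the console, the value could be applied with boto3's `iam.update_role(RoleName=..., MaxSessionDuration=...)`.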

Use IAM credential passthrough with federation

Follow the instructions in Launch an IAM credential passthrough cluster and do not add an instance profile. To use IAM passthrough with federation for jobs or JDBC connections, follow the instructions in Set up a meta instance profile.

Security

It is safe to share high concurrency IAM credential passthrough clusters with other users. Users are isolated from each other and cannot read or use each other’s credentials.