Secure access to S3 buckets across accounts using instance profiles with an AssumeRole policy

In AWS you can set up cross-account access, so that compute resources in one account can access a bucket in another account. One way to grant access, described in Secure access to S3 buckets using instance profiles, is to grant an account direct access to a bucket in another account. Another way is to allow an account to assume a role in the other account.

Consider AWS Account A with account ID <deployment-acct-id> and AWS Account B with account ID <bucket-owner-acct-id>. Account A is used when signing up with Databricks: EC2 services and the DBFS root bucket are managed by this account. Account B has a bucket <s3-bucket-name>.

This article provides the steps to configure Account A to use the AWS AssumeRole action to access S3 files in <s3-bucket-name> as a role in Account B. To enable this access, you perform configuration in Account A, in Account B, and in the Databricks Admin Console. You must also either configure a Databricks cluster or add a configuration to a notebook that accesses the bucket.

Requirements

  • AWS administrator access to IAM roles and policies in the AWS account of the Databricks deployment and the AWS account of the S3 bucket.
  • Target S3 bucket.
  • If you intend to enable encryption for the S3 bucket, you must add the instance profile as a Key User for the KMS key provided in the configuration. See Configure KMS encryption.

Step 1: In Account A, create role MyRoleA and attach policies

  1. Create a role named MyRoleA in Account A. The Instance Profile ARN is arn:aws:iam::<deployment-acct-id>:instance-profile/MyRoleA.

  2. Create a policy that allows MyRoleA to assume MyRoleB in Account B, and attach it to MyRoleA. Click Inline policy and paste in the following policy:

    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Sid": "Stmt1487884001000",
          "Effect": "Allow",
          "Action": [
            "sts:AssumeRole"
          ],
          "Resource": [
            "arn:aws:iam::<bucket-owner-acct-id>:role/MyRoleB"
          ]
        }
      ]
    }
    
  3. Update the policy for the Account A role used to create clusters, adding a statement that allows the iam:PassRole action on MyRoleA:

    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Sid": "Stmt1403287045000",
          "Effect": "Allow",
          "Action": [
            "ec2:AssociateDhcpOptions",
            "ec2:AssociateIamInstanceProfile",
            "ec2:AssociateRouteTable",
            "ec2:AttachInternetGateway",
            "ec2:AttachVolume",
            "ec2:AuthorizeSecurityGroupEgress",
            "ec2:AuthorizeSecurityGroupIngress",
            "ec2:CancelSpotInstanceRequests",
            "ec2:CreateDhcpOptions",
            "ec2:CreateInternetGateway",
            "ec2:CreateKeyPair",
            "ec2:CreatePlacementGroup",
            "ec2:CreateRoute",
            "ec2:CreateSecurityGroup",
            "ec2:CreateSubnet",
            "ec2:CreateTags",
            "ec2:CreateVolume",
            "ec2:CreateVpc",
            "ec2:CreateVpcPeeringConnection",
            "ec2:DeleteInternetGateway",
            "ec2:DeleteKeyPair",
            "ec2:DeletePlacementGroup",
            "ec2:DeleteRoute",
            "ec2:DeleteRouteTable",
            "ec2:DeleteSecurityGroup",
            "ec2:DeleteSubnet",
            "ec2:DeleteTags",
            "ec2:DeleteVolume",
            "ec2:DeleteVpc",
            "ec2:DescribeAvailabilityZones",
            "ec2:DescribeIamInstanceProfileAssociations",
            "ec2:DescribeInstanceStatus",
            "ec2:DescribeInstances",
            "ec2:DescribePlacementGroups",
            "ec2:DescribePrefixLists",
            "ec2:DescribeReservedInstancesOfferings",
            "ec2:DescribeRouteTables",
            "ec2:DescribeSecurityGroups",
            "ec2:DescribeSpotInstanceRequests",
            "ec2:DescribeSpotPriceHistory",
            "ec2:DescribeSubnets",
            "ec2:DescribeVolumes",
            "ec2:DescribeVpcs",
            "ec2:DetachInternetGateway",
            "ec2:DisassociateIamInstanceProfile",
            "ec2:ModifyVpcAttribute",
            "ec2:ReplaceIamInstanceProfileAssociation",
            "ec2:RequestSpotInstances",
            "ec2:RevokeSecurityGroupEgress",
            "ec2:RevokeSecurityGroupIngress",
            "ec2:RunInstances",
            "ec2:TerminateInstances"
          ],
          "Resource": [
              "*"
          ]
        },
        {
          "Effect": "Allow",
          "Action": "iam:PassRole",
          "Resource": [
            "arn:aws:iam::<deployment-acct-id>:role/MyRoleA"
          ]
        }
      ]
    }
    
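
A quick way to confirm later that a cluster really runs as MyRoleA is a caller-identity check from a notebook. This is a minimal sketch, assuming a notebook attached to a cluster that uses the MyRoleA instance profile (steps 3 and 4) and the AWS SDK for Java v1 bundled with Databricks Runtime:

    import com.amazonaws.services.securitytoken.AWSSecurityTokenServiceClientBuilder
    import com.amazonaws.services.securitytoken.model.GetCallerIdentityRequest

    // Prints the ARN of the identity the cluster currently uses; it should
    // reference MyRoleA and the Account A account ID <deployment-acct-id>.
    val sts = AWSSecurityTokenServiceClientBuilder.defaultClient()
    println(sts.getCallerIdentity(new GetCallerIdentityRequest()).getArn)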

Step 2: In Account B, create role MyRoleB and attach policies

  1. Create a role named MyRoleB. The Role ARN is arn:aws:iam::<bucket-owner-acct-id>:role/MyRoleB.

  2. Edit the trust relationship of role MyRoleB to allow role MyRoleA in Account A to assume MyRoleB. Select IAM > Roles > MyRoleB > Trust relationships > Edit trust relationship and enter:

    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Allow",
          "Principal": {
            "AWS": [
              "arn:aws:iam::<deployment-acct-id>:role/MyRoleA"
            ]
          },
          "Action": "sts:AssumeRole"
        }
      ]
    }
    
  3. Draft a bucket policy for the bucket <s3-bucket-name>. Select S3 > <s3-bucket-name> > Permissions > Bucket Policy. The policy needs the following two statements:

    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Allow",
          "Action": [
            "s3:GetBucketLocation",
            "s3:ListBucket"
          ],
          "Resource": [
              "arn:aws:s3:::<s3-bucket-name>"
          ]
        },
        {
          "Effect": "Allow",
          "Action": [
            "s3:PutObject",
            "s3:PutObjectAcl",
            "s3:GetObject",
            "s3:DeleteObject"
          ],
          "Resource": [
              "arn:aws:s3:::<s3-bucket-name>/*"
          ]
        }
      ]
    }
    
  4. Add the role (Principal) MyRoleB to both statements of the bucket policy and save the result. S3 rejects a bucket policy whose statements lack a Principal, so only this complete version can be saved:

    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Allow",
          "Principal": {
            "AWS": [
                "arn:aws:iam::<bucket-owner-acct-id>:role/MyRoleB"
            ]
          },
          "Action": [
            "s3:GetBucketLocation",
            "s3:ListBucket"
          ],
          "Resource": "arn:aws:s3:::<s3-bucket-name>"
        },
        {
          "Effect": "Allow",
          "Principal": {
              "AWS": [
                  "arn:aws:iam::<bucket-owner-acct-id>:role/MyRoleB"
              ]
          },
          "Action": [
            "s3:PutObject",
            "s3:PutObjectAcl",
            "s3:GetObject",
            "s3:DeleteObject"
          ],
          "Resource": "arn:aws:s3:::<s3-bucket-name>/*"
        }
      ]
    }
    

Tip

If you are prompted with a Principal error, make sure that you modified only the Trust relationship policy.
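
With steps 1 and 2 in place, you can also exercise the whole chain (instance profile, AssumeRole, bucket policy) outside the S3A filesystem layer. This is a sketch, not part of the required setup; it assumes a notebook on a cluster that uses the MyRoleA instance profile (steps 3 and 4), the AWS SDK for Java v1 bundled with Databricks Runtime, and an arbitrary session name:

    import com.amazonaws.auth.{AWSStaticCredentialsProvider, BasicSessionCredentials}
    import com.amazonaws.services.s3.AmazonS3ClientBuilder
    import com.amazonaws.services.securitytoken.AWSSecurityTokenServiceClientBuilder
    import com.amazonaws.services.securitytoken.model.AssumeRoleRequest
    import scala.collection.JavaConverters._

    // Use the instance profile (MyRoleA) to get temporary credentials for MyRoleB.
    val creds = AWSSecurityTokenServiceClientBuilder.defaultClient()
      .assumeRole(new AssumeRoleRequest()
        .withRoleArn("arn:aws:iam::<bucket-owner-acct-id>:role/MyRoleB")
        .withRoleSessionName("bucket-policy-check"))
      .getCredentials

    // Build an S3 client that signs requests with the temporary credentials.
    val s3 = AmazonS3ClientBuilder.standard()
      .withCredentials(new AWSStaticCredentialsProvider(new BasicSessionCredentials(
        creds.getAccessKeyId, creds.getSecretAccessKey, creds.getSessionToken)))
      .build()

    // ListBucket comes from the bucket policy; print the first few object keys.
    s3.listObjectsV2("<s3-bucket-name>").getObjectSummaries.asScala
      .take(5).foreach(summary => println(summary.getKey))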

Step 3: Add MyRoleA to the Databricks workspace

In the Databricks Admin Console, add the instance profile MyRoleA to Databricks using the MyRoleA instance profile ARN arn:aws:iam::<deployment-acct-id>:instance-profile/MyRoleA from step 1.
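
If you prefer to script this step, the workspace Instance Profiles API exposes the same operation at POST /api/2.0/instance-profiles/add. The following is a minimal Scala sketch using only the JDK's HTTP support; the workspace URL and the DATABRICKS_TOKEN environment variable holding an admin personal access token are placeholders:

    import java.net.{HttpURLConnection, URL}
    import java.nio.charset.StandardCharsets

    val workspaceUrl = "https://<databricks-instance>"  // placeholder workspace URL
    val token = sys.env("DATABRICKS_TOKEN")             // admin personal access token

    val body = """{"instance_profile_arn": "arn:aws:iam::<deployment-acct-id>:instance-profile/MyRoleA"}"""

    val conn = new URL(s"$workspaceUrl/api/2.0/instance-profiles/add")
      .openConnection().asInstanceOf[HttpURLConnection]
    conn.setRequestMethod("POST")
    conn.setRequestProperty("Authorization", s"Bearer $token")
    conn.setRequestProperty("Content-Type", "application/json")
    conn.setDoOutput(true)
    conn.getOutputStream.write(body.getBytes(StandardCharsets.UTF_8))
    println(s"HTTP status: ${conn.getResponseCode}")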

Step 4: Configure cluster with MyRoleA

  1. Select or create a cluster.

  2. Open the Advanced Options section.

  3. On the Instances tab, select the instance profile MyRoleA.

  4. On the Spark tab, set the AssumeRole credential type and the ARN of the role to assume, MyRoleB. This is optional if you instead set these values when mounting the bucket or in a notebook, as described in step 7:

    spark.hadoop.fs.s3a.credentialsType AssumeRole
    spark.hadoop.fs.s3a.stsAssumeRole.arn arn:aws:iam::<bucket-owner-acct-id>:role/MyRoleB
    spark.hadoop.fs.s3a.canned.acl BucketOwnerFullControl
    spark.hadoop.fs.s3a.acl.default BucketOwnerFullControl
    
  5. Start the cluster.

  6. Attach a notebook to the cluster.

  7. Do one of the following:

    • Mount <s3-bucket-name> on DBFS with extra configurations:

      dbutils.fs.mount("s3a://<s3-bucket-name>", "/mnt/<s3-bucket-name>",
        extraConfigs = Map(
          "fs.s3a.credentialsType" -> "AssumeRole",
          "fs.s3a.stsAssumeRole.arn" -> "arn:aws:iam::<bucket-owner-acct-id>:role/MyRoleB",
          "fs.s3a.canned.acl" -> "BucketOwnerFullControl",
          "fs.s3a.acl.default" -> "BucketOwnerFullControl"
        )
      )
      

      Note

      This is the recommended option.

    • If you did not set the AssumeRole credential type and role ARN in the cluster's Spark configuration or mount the S3 bucket, set them in the first command of a notebook:

      sc.hadoopConfiguration.set("fs.s3a.credentialsType", "AssumeRole")
      sc.hadoopConfiguration.set("fs.s3a.stsAssumeRole.arn", "arn:aws:iam::<bucket-owner-acct-id>:role/MyRoleB")
      sc.hadoopConfiguration.set("fs.s3a.canned.acl", "BucketOwnerFullControl")
      sc.hadoopConfiguration.set("fs.s3a.acl.default", "BucketOwnerFullControl")
      
  8. Verify that you can access <s3-bucket-name>. If you mounted the bucket, use:

    dbutils.fs.ls("/mnt/<s3-bucket-name>")

    If you set the configuration on the cluster or in a notebook instead, use:

    dbutils.fs.ls("s3a://<s3-bucket-name>/")

    A short read/write check follows below.