Configure encryption for S3 with KMS

This article covers how to configure server-side encryption with a KMS key for writing files in s3a:// paths using Unity Catalog or an instance profile (legacy configuration). To encrypt your workspace storage bucket, see Customer-managed keys for workspace storage.

Configure encryption for S3 using Unity Catalog

You can configure server-side encryption so that external tables and volumes in Unity Catalog can access data in S3. Server-side encryption is not supported with external tables shared using Delta Sharing.

Step 1: Update your KMS key policy in AWS

To protect data in S3, AWS supports server-side encryption (SSE) with Amazon S3 managed keys (SSE-S3) or AWS KMS keys (SSE-KMS). If you use an Amazon S3 managed key (SSE-S3), skip to Step 2.

  1. In AWS, go to the KMS service.

  2. Click the key to which you want to add permission.

  3. In the Key Policy section, select Switch to policy view.

  4. Edit the key policy section that allows S3 to use the key, for example:

    {
        "Sid": "Allow access through S3 for all principals in the account that are authorized to use S3",
        "Effect": "Allow",
        "Principal": {
            "AWS": "*"
        },
        "Action": [
            "kms:Encrypt",
            "kms:Decrypt",
            "kms:ReEncrypt*",
            "kms:GenerateDataKey*",
            "kms:DescribeKey"
        ],
        "Resource": "*",
        "Condition": {
            "StringEquals": {
                "kms:CallerAccount": "<AWS ACCOUNT ID>",
                "kms:ViaService": "s3.<REGION>.amazonaws.com"
            }
        }
    },
    
  5. Click Save changes.
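
If you prefer to script this change, the key policy can also be fetched and updated with the AWS CLI. This is a minimal sketch; the key ID and file name are placeholders:

aws kms get-key-policy \
  --key-id <key-id> \
  --policy-name default \
  --output text > key-policy.json

# Edit key-policy.json to include the statement above, then apply it:
aws kms put-key-policy \
  --key-id <key-id> \
  --policy-name default \
  --policy file://key-policy.json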

Step 2: Configure access to S3 using Unity Catalog

  1. Create a storage credential to connect to S3, using the instructions in Create a storage credential for connecting to AWS S3.

    Ensure that you create the IAM policy in the same account as the S3 bucket. If you are using SSE-KMS, include the following in the policy:

      {
        "Action": [
            "kms:Decrypt",
            "kms:Encrypt",
            "kms:GenerateDataKey*"
        ],
        "Resource": [
            "arn:aws:kms:<region>:<aws-account-id>:key/<key-id>"
        ],
        "Effect": "Allow"
      },
    

    See Step 1: Create an IAM role.

  2. Create an external location to connect to S3, using the instructions in Create an external location to connect cloud storage to Databricks.

  3. Configure server-side encryption on your external location, using the instructions in Configure an encryption algorithm on an external location.
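
As a sketch, the external location in step 2 can also be created from a notebook; the location, bucket, path, and credential names below are placeholders. The encryption settings in step 3 are then applied to this external location:

# Create an external location backed by an existing storage credential.
# All names and paths here are placeholders.
spark.sql("""
    CREATE EXTERNAL LOCATION IF NOT EXISTS my_sse_kms_location
    URL 's3://<bucket-name>/<path>'
    WITH (STORAGE CREDENTIAL `<storage-credential-name>`)
""")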

Configure encryption for S3 using instance profiles (legacy)

You can configure encryption for S3 with KMS using instance profiles. However, because this is a legacy pattern, Databricks recommends using Unity Catalog to configure access to S3 and volumes for direct interaction with files. See Connect to cloud object storage and services using Unity Catalog.

Step 1: Configure an instance profile

In Databricks, create an instance profile.

Step 2: Add the instance profile as a key user for the KMS key provided in the configuration

  1. In AWS, go to the KMS service.

  2. Click the key to which you want to add permission.

  3. In the Key Users section, click Add.

  4. Select the checkbox next to the IAM role.

  5. Click Add.

Step 3: Set up encryption properties

Set up global KMS encryption properties in a Spark configuration or an init script. Set spark.hadoop.fs.s3a.server-side-encryption.key to your own key ARN.

Add the following to your Spark configuration:

spark.hadoop.fs.s3a.server-side-encryption.key arn:aws:kms:<region>:<aws-account-id>:key/<bbbbbbbb-ddd-ffff-aaa-bdddddddddd>
spark.hadoop.fs.s3a.server-side-encryption-algorithm SSE-KMS
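
After a cluster starts with these settings, a minimal smoke test is to write a small dataset from a notebook and confirm the write succeeds; the bucket name and path below are placeholders:

# Write a small test dataset through s3a; with SSE-KMS configured,
# the resulting objects are encrypted with the specified KMS key.
df = spark.range(10)
df.write.mode("overwrite").parquet("s3a://<bucket-name>/tmp/sse-kms-smoke-test")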

You can also configure per-bucket KMS encryption. For example, you can configure each bucket individually using the following keys:

# Set up authentication and endpoint for a specific bucket
spark.hadoop.fs.s3a.bucket.<bucket-name>.aws.credentials.provider <aws-credentials-provider-class>
spark.hadoop.fs.s3a.bucket.<bucket-name>.endpoint <aws-endpoint>

# Configure a different KMS encryption key for a specific bucket
spark.hadoop.fs.s3a.bucket.<bucket-name>.server-side-encryption.key <aws-kms-encryption-key>

For more information, see Per-bucket configuration.
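
Putting these together, a per-bucket override for a hypothetical bucket named my-kms-bucket might look like the following sketch (the key ARN is a placeholder):

# Override the encryption algorithm and KMS key for one bucket only
spark.hadoop.fs.s3a.bucket.my-kms-bucket.server-side-encryption-algorithm SSE-KMS
spark.hadoop.fs.s3a.bucket.my-kms-bucket.server-side-encryption.key arn:aws:kms:<region>:<aws-account-id>:key/<key-id>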

To configure the global encryption setting with an init script, run the following code in a notebook cell to create set-kms.sh, and then configure a cluster to run the script.

dbutils.fs.put("/databricks/scripts/set-kms.sh", """
#!/bin/bash

cat >/databricks/driver/conf/aes-encrypt-custom-spark-conf.conf <<EOL
[driver] {
"spark.hadoop.fs.s3a.server-side-encryption.key" = "arn:aws:kms:<region>:<aws-account-id>:key/<bbbbbbbb-ddd-ffff-aaa-bdddddddddd>"
"spark.hadoop.fs.s3a.server-side-encryption-algorithm" = "SSE-KMS"
}
EOL
""", True)

Once you verify that encryption is working, configure encryption on all clusters by adding a cluster-scoped init script to cluster policies.
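
One way to perform that verification is to inspect an object the cluster wrote, for example with the AWS CLI; the bucket and object key below are placeholders. For SSE-KMS, the response should report "ServerSideEncryption": "aws:kms" and your key ARN in SSEKMSKeyId:

aws s3api head-object \
  --bucket <bucket-name> \
  --key <path/to/object>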