Configure encryption for S3 with KMS

This article covers how to configure server-side encryption with a KMS key for writing files in s3a:// paths. To encrypt your workspace’s root S3 bucket, see Customer-managed keys for workspace storage.

Step 1: Configure an instance profile

In Databricks, create an instance profile.
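
If you want to confirm that the instance profile is attached to your cluster and can reach your bucket before you enable encryption, a minimal check from a notebook cell is to list the bucket. This is a sketch only; the bucket name is a placeholder.

# Replace <bucket-name> with a bucket that the instance profile's IAM role can access
display(dbutils.fs.ls("s3a://<bucket-name>/"))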

Step 2: Add the instance profile as a key user for the KMS key provided in the configuration

  1. In AWS, go to the KMS service.

  2. Click the key that you want to add permission to.

  3. In the Key Users section, click Add.

  4. Select the checkbox next to the IAM role.

  5. Click Add.
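
If you prefer to script this step rather than use the console, the following is a minimal sketch using boto3 that appends a key-user statement to the key policy. It assumes the caller has kms:GetKeyPolicy and kms:PutKeyPolicy permissions on the key; <key-id>, <aws-account-id>, and <role-name> are placeholders, and the statement mirrors the usage permissions the console grants to key users.

# A sketch only: grants the instance profile's IAM role use of the KMS key.
import json
import boto3

kms = boto3.client("kms")
key_id = "<key-id>"
role_arn = "arn:aws:iam::<aws-account-id>:role/<role-name>"

# Read the current key policy, append a key-user statement, and write it back.
policy = json.loads(kms.get_key_policy(KeyId=key_id, PolicyName="default")["Policy"])
policy["Statement"].append({
    "Sid": "Allow use of the key by the Databricks instance profile role",
    "Effect": "Allow",
    "Principal": {"AWS": role_arn},
    "Action": ["kms:Encrypt", "kms:Decrypt", "kms:ReEncrypt*",
               "kms:GenerateDataKey*", "kms:DescribeKey"],
    "Resource": "*",
})
kms.put_key_policy(KeyId=key_id, PolicyName="default", Policy=json.dumps(policy))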

Step 3: Set up encryption properties

Set up global KMS encryption properties using a Spark configuration setting or an init script. Set spark.hadoop.fs.s3a.server-side-encryption.key to your own key ARN.

Spark configuration

spark.hadoop.fs.s3a.server-side-encryption.key arn:aws:kms:<region>:<aws-account-id>:key/<bbbbbbbb-ddd-ffff-aaa-bdddddddddd>
spark.hadoop.fs.s3a.server-side-encryption-algorithm SSE-KMS
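
Once a cluster is running with these properties, one way to check that new objects are actually encrypted with your key is to write a small test dataset and inspect an object's metadata. This is a sketch only: <bucket-name> is a placeholder, and it assumes boto3 can use the cluster's instance profile credentials.

# Write a small test dataset through the s3a connector
spark.range(10).write.mode("overwrite").parquet("s3a://<bucket-name>/tmp/kms-test")

# Inspect one of the written objects. ServerSideEncryption should be "aws:kms"
# and SSEKMSKeyId should reference your key ARN.
import boto3

s3 = boto3.client("s3")
contents = s3.list_objects_v2(Bucket="<bucket-name>", Prefix="tmp/kms-test")["Contents"]
head = s3.head_object(Bucket="<bucket-name>", Key=contents[0]["Key"])
print(head.get("ServerSideEncryption"), head.get("SSEKMSKeyId"))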

You can also configure per-bucket KMS encryption. For example, you can configure each bucket individually using the following keys:

# Set up authentication and endpoint for a specific bucket
spark.hadoop.fs.s3a.bucket.<bucket-name>.aws.credentials.provider <aws-credentials-provider-class>
spark.hadoop.fs.s3a.bucket.<bucket-name>.endpoint <aws-endpoint>

# Configure a different KMS encryption key for a specific bucket
spark.hadoop.fs.s3a.bucket.<bucket-name>.server-side-encryption.key <aws-kms-encryption-key>

For more information, see Per-bucket configuration.
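
If you want to try per-bucket settings from a notebook before adding them to the cluster's Spark configuration, one option is to set the underlying Hadoop properties at runtime. This is a sketch only: it relies on the internal _jsc handle, which may not be available in all cluster access modes, and <bucket-name>, <region>, <aws-account-id>, and <key-id> are placeholders. Note that the spark.hadoop. prefix is dropped because the properties are set directly on the Hadoop configuration.

# Set per-bucket SSE-KMS properties directly on the active Hadoop configuration.
hadoop_conf = spark.sparkContext._jsc.hadoopConfiguration()
hadoop_conf.set(
    "fs.s3a.bucket.<bucket-name>.server-side-encryption.key",
    "arn:aws:kms:<region>:<aws-account-id>:key/<key-id>",
)
hadoop_conf.set(
    "fs.s3a.bucket.<bucket-name>.server-side-encryption-algorithm",
    "SSE-KMS",
)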

Init script

To apply the global encryption setting with an init script, run the following code in a notebook cell to create the init script set-kms.sh, and then configure a cluster to run the script.

dbutils.fs.put("/databricks/scripts/set-kms.sh", """
#!/bin/bash

cat >/databricks/driver/conf/aes-encrypt-custom-spark-conf.conf <<EOL
[driver] {
"spark.hadoop.fs.s3a.server-side-encryption.key" = "arn:aws:kms:<region>:<aws-account-id>:key/<bbbbbbbb-ddd-ffff-aaa-bdddddddddd>"
"spark.hadoop.fs.s3a.server-side-encryption-algorithm" = "SSE-KMS"
}
EOL
""", True)

After you verify that encryption is working, configure encryption on all clusters by adding a cluster-scoped init script to cluster policies.