Configure encryption for S3 with KMS
This article covers how to configure server-side encryption with a KMS key for writing files in s3a://
paths. To encrypt your workspace storage bucket, see Customer-managed keys for workspace storage.
To configure server-side encryption to allow external tables and volumes in Unity Catalog to access data in S3, see Configure an encryption algorithm on an external location.
Step 1: Configure an instance profile
In Databricks, create an instance profile.
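The AWS-side instance profile can also be created with a script before you register it in Databricks. The following is a minimal sketch using boto3; the profile and role names are hypothetical placeholders, and the IAM role (with its Databricks trust relationship) must already exist.

import boto3

iam = boto3.client("iam")

# Hypothetical names; substitute your own role and profile.
profile_name = "databricks-s3-kms-profile"
role_name = "databricks-s3-kms-role"

# Create the instance profile and attach the existing IAM role to it.
iam.create_instance_profile(InstanceProfileName=profile_name)
iam.add_role_to_instance_profile(
    InstanceProfileName=profile_name,
    RoleName=role_name,
)

# This ARN is what you register as the instance profile in Databricks.
response = iam.get_instance_profile(InstanceProfileName=profile_name)
print(response["InstanceProfile"]["Arn"])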
Step 2: Add the instance profile as a key user for the KMS key provided in the configuration
1. In AWS, go to the KMS service.
2. Click the key that you want to add permission to.
3. In the Key Users section, click Add.
4. Select the checkbox next to the IAM role.
5. Click Add.
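If you prefer to script this step, you can append the same key-user permissions to the key policy directly. Below is a minimal sketch with boto3, assuming placeholder values for the key ID and the instance profile's role ARN (the console also adds a grant-related statement for key users, which this sketch omits):

import json
import boto3

kms = boto3.client("kms")

key_id = "<key-id>"  # the KMS key you want to add permission to
role_arn = "arn:aws:iam::<aws-account-id>:role/<role-name>"  # the instance profile's role

# "default" is the only key policy name that KMS supports.
policy = json.loads(kms.get_key_policy(KeyId=key_id, PolicyName="default")["Policy"])

# Append the usage permissions that the console grants to a key user.
policy["Statement"].append({
    "Sid": "Allow use of the key",
    "Effect": "Allow",
    "Principal": {"AWS": role_arn},
    "Action": [
        "kms:Encrypt",
        "kms:Decrypt",
        "kms:ReEncrypt*",
        "kms:GenerateDataKey*",
        "kms:DescribeKey",
    ],
    "Resource": "*",
})

kms.put_key_policy(KeyId=key_id, PolicyName="default", Policy=json.dumps(policy))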
Step 3: Set up encryption properties
Set the global KMS encryption properties either in the cluster's Spark configuration or with an init script.
Set the spark.hadoop.fs.s3a.server-side-encryption.key property to your own key ARN.
Spark configuration
spark.hadoop.fs.s3a.server-side-encryption.key arn:aws:kms:<region>:<aws-account-id>:key/<bbbbbbbb-ddd-ffff-aaa-bdddddddddd>
spark.hadoop.fs.s3a.server-side-encryption-algorithm SSE-KMS
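To experiment before changing cluster configuration, you can usually apply the same settings to the current session's Hadoop configuration from a notebook. A minimal sketch, assuming the preconfigured sc SparkContext of a Databricks notebook and a placeholder key ARN; note that the spark.hadoop. prefix is dropped when setting Hadoop properties directly:

# Session-level equivalent of the cluster Spark configuration above.
# Cluster-level configuration remains the reliable way to set these;
# this session-level approach is a convenience for testing.
hadoop_conf = sc._jsc.hadoopConfiguration()
hadoop_conf.set(
    "fs.s3a.server-side-encryption.key",
    "arn:aws:kms:<region>:<aws-account-id>:key/<key-id>",
)
hadoop_conf.set("fs.s3a.server-side-encryption-algorithm", "SSE-KMS")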
You can also configure KMS encryption per bucket. For example, the following keys configure an individual bucket:
# Set up authentication and endpoint for a specific bucket
spark.hadoop.fs.s3a.bucket.<bucket-name>.aws.credentials.provider <aws-credentials-provider-class>
spark.hadoop.fs.s3a.bucket.<bucket-name>.endpoint <aws-endpoint>
# Configure a different KMS encryption key for a specific bucket
spark.hadoop.fs.s3a.bucket.<bucket-name>.server-side-encryption.key <aws-kms-encryption-key>
For more information, see Per-bucket configuration.
Init script
To configure the global encryption setting with an init script, run the following code in a notebook cell to create the init script set-kms.sh, and then configure a cluster to run the script.
# Create the init script, which writes the SSE-KMS settings into the
# driver's Spark configuration at cluster startup.
dbutils.fs.put("/databricks/scripts/set-kms.sh", """
#!/bin/bash
cat >/databricks/driver/conf/aes-encrypt-custom-spark-conf.conf <<EOL
[driver] {
"spark.hadoop.fs.s3a.server-side-encryption.key" = "arn:aws:kms:<region>:<aws-account-id>:key/<bbbbbbbb-ddd-ffff-aaa-bdddddddddd>"
"spark.hadoop.fs.s3a.server-side-encryption-algorithm" = "SSE-KMS"
}
EOL
""", True)
Once you verify that encryption is working, configure encryption on all clusters by adding the cluster-scoped init script to cluster policies.