Customer-managed keys for managed services
To use customer-managed keys (CMK) for managed services, the workspace must be on the E2 version of the Databricks platform or on a custom plan that has been enabled by Databricks for this feature. All new Databricks accounts and most existing accounts are now E2. If you are unsure which account type you have, contact your Databricks representative.
For a list of regions that support customer-managed keys, see Databricks clouds and regions.
For support with serverless compute resources, see Serverless compute and customer-managed keys.
Customer-managed key use cases
For additional control of your data, you can add your own key to protect and control access to some types of data. Databricks has two customer-managed key features. To compare them, see Customer-managed keys for encryption.
Managed services data in the Databricks control plane is encrypted at rest. You can add a customer-managed key for managed services to help protect and control access to the following types of encrypted data:
Notebook source in the Databricks control plane.
Notebook results for notebooks run interactively (not as jobs) that are stored in the control plane. By default, larger results are also stored in your workspace root bucket. You can configure Databricks to store all interactive notebook results in your cloud account.
Secrets stored by the secret manager APIs.
Databricks SQL queries and query history.
Personal access tokens (PAT) or other credentials used to set up Git integration with Databricks Repos.
After you add customer-managed key encryption for a workspace, Databricks uses your key to control access to the key that encrypts future write operations to your workspace’s managed services data. Existing data is not re-encrypted. The data encryption key is cached in memory for several read and write operations and evicted from memory at a regular interval. New requests for that data require another request to your cloud service’s key management system. If you delete or revoke your key, reading or writing the protected data fails at the end of the cache time interval.
You can rotate (update) the customer-managed key at a later time. See Add or update a customer-managed key on a running workspace.
You can optionally share a Databricks key configuration object (which references your key) between the two different encryption use cases: this feature (managed services) and Customer-managed keys for workspace storage. Note that in both cases you can add the key and its Databricks key configuration to your Databricks workspace during workspace creation or add it later, but only managed services supports rotating (updating) the key later.
Note
This feature does not encrypt data stored outside of the control plane. Separately, you can encrypt data in your root S3 bucket and cluster EBS volumes in the Classic data plane.
Add a customer-managed key to a new workspace
To add a customer-managed key for managed services, you must add the key when you create a workspace using the Account API. You can also use the Databricks Terraform provider and databricks_mws_customer_managed_keys.
Create a key
To configure your customer-managed key:
Create or select a symmetric key in AWS KMS, following the instructions in Creating symmetric CMKs or Viewing keys. (An AWS CLI sketch of these steps appears after this procedure.)
Copy these values. You will use them when you create the workspace:
Key ARN: Get the ARN from the console or the API (the Arn field in the JSON response).
Key alias: An alias specifies a display name for the customer-managed key in AWS KMS. Use an alias to identify a customer-managed key in cryptographic operations. For more information, see the AWS documentation: AWS::KMS::Alias and Working with aliases.
On the Key policy tab, switch to the policy view. Edit the key policy so that Databricks can use the key to perform encryption and decryption operations by adding the following statement to the key policy's "Statement" array:

```json
{
  "Sid": "Allow Databricks to use KMS key for managed services in the control plane",
  "Effect": "Allow",
  "Principal": {
    "AWS": "arn:aws:iam::414351767826:root"
  },
  "Action": [
    "kms:Encrypt",
    "kms:Decrypt"
  ],
  "Resource": "*",
  "Condition": {
    "StringEquals": {
      "aws:PrincipalTag/DatabricksAccountId": ["<databricks-account-id>"]
    }
  }
}
```
Note
To retrieve your Databricks account ID, follow Locate your account ID.
For more information, see the AWS article Editing keys.
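If you prefer to script the steps above, the following is a minimal sketch using the AWS CLI. The key description, alias name, and policy file name are illustrative placeholders, and databricks-cmk-policy.json is assumed to contain the complete key policy (the default statements plus the Databricks statement above).

```bash
# Create a symmetric KMS key (description is a placeholder).
KEY_ID=$(aws kms create-key \
  --description "Databricks managed services CMK" \
  --query KeyMetadata.KeyId --output text)

# Optionally attach a human-readable alias to the key.
aws kms create-alias \
  --alias-name alias/databricks-managed-services-cmk \
  --target-key-id "$KEY_ID"

# Apply the full key policy from a local JSON file.
# "default" is the only policy name that AWS KMS supports.
aws kms put-key-policy \
  --key-id "$KEY_ID" \
  --policy-name default \
  --policy file://databricks-cmk-policy.json

# Print the key ARN to use when you register the key with Databricks.
aws kms describe-key --key-id "$KEY_ID" --query KeyMetadata.Arn --output text
```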
Register a key for a new workspace
To register the key for a new workspace, follow the instructions in Create a workspace using the Account API, specifically Step 5: Configure customer-managed keys (optional).
Those instructions show how to optionally share this key for encrypting workspace storage.
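For orientation, the following is a minimal sketch of a workspace creation request that attaches a managed services key. All values are placeholders; the credentials and storage configuration IDs come from the earlier steps of that guide, and the key configuration ID comes from registering your key (described below).

```bash
# Sketch only: create a workspace that uses a customer-managed key
# for managed services. All IDs and names are placeholders.
curl -X POST -n \
  'https://accounts.cloud.databricks.com/api/2.0/accounts/<databricks-account-id>/workspaces' \
  -d '{
    "workspace_name": "my-workspace",
    "aws_region": "<region>",
    "credentials_id": "<credentials-id>",
    "storage_configuration_id": "<storage-configuration-id>",
    "managed_services_customer_managed_key_id": "<databricks-key-config-id>"
  }'
```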
Add or update a customer-managed key on a running workspace
You can add a customer-managed key or rotate (update) an existing customer-managed key for managed services on a running workspace by making a PATCH request to update the workspace with a new key configuration ID.
If you add a customer-managed key for managed services to a running workspace, Databricks uses it for future write operations to managed services data. Existing data is not re-encrypted; for example, to re-encrypt an existing notebook with the new key, export the notebook and re-import it.
There are two use cases for customer-managed encryption keys:
Encrypt managed services, which includes notebook and secret data in the control plane.
Encrypt workspace storage, which includes the workspace’s root S3 bucket and optionally cluster EBS volumes.
You can choose to configure neither, one, or both of these. If you implement encryption for both use cases, you can optionally share a key, and even the same key configuration object, between them.
There are important differences between these two use cases in when you can add the keys:
For a customer-managed key for managed services, you can configure it during workspace creation, add the key to a running workspace, or rotate (update) the key later.
For a customer-managed key for storage, you can configure it during workspace creation or add the key to a running workspace, but you cannot rotate (update) the key later.
You can share a customer-managed key or its key configuration object across workspaces. When creating a new workspace, a key configuration can represent both encryption use cases by setting its use_cases field to include both enumeration values.
To implement one encryption use case or both encryption use cases with the same key, perform the following procedure exactly once. To add encryption for both encryption use cases with different keys, perform the procedure two times, once for each use case.
Create the AWS KMS key. Follow the instructions in either of the key creation sections (managed services or workspace storage), which differ only in the human-readable description (Sid) field in the policy that identifies the use case. To share the key and configuration for both use cases, update the Sid field accordingly.
To register your KMS key with Databricks, call the create customer-managed key configuration API (POST /accounts/<account-id>/customer-managed-keys). Pass the following parameters:
use_cases: An array that specifies the use cases for which to use the key. Specify one or both of the following:
  MANAGED_SERVICES: This key encrypts managed services, which includes notebook, secret, and Databricks SQL query data in the control plane.
  STORAGE: This key encrypts workspace storage, which includes the workspace’s DBFS root and cluster EBS volumes.
aws_key_info: A JSON object with the following properties:
  key_arn: AWS KMS key ARN. Note that Databricks infers the AWS region from the key ARN.
  key_alias: (Optional) AWS KMS key alias.
  reuse_key_for_cluster_volumes: (Optional) Used only if the use_cases array contains STORAGE, this specifies whether to also use the key to encrypt cluster EBS volumes. The default value is true, which means Databricks also uses the key for cluster volumes. If you set this to false, Databricks does not encrypt the EBS volumes with your specified key. In that case, your Databricks EBS volumes are encrypted with default AWS SSE encryption, or, if you enabled AWS account-level EBS encryption by default, AWS enforces account-level EBS encryption using a separate key that you provided to them. Note that if reuse_key_for_cluster_volumes is true and you revoke the permission for the key, it does not affect running clusters but affects new and restarted clusters.
Example request:
```bash
curl -X POST -n \
  'https://accounts.cloud.databricks.com/api/2.0/accounts/<databricks-account-id>/customer-managed-keys' \
  -d '{
    "use_cases": ["MANAGED_SERVICES", "STORAGE"],
    "aws_key_info": {
      "key_arn": "arn:aws:kms:<region>:<aws-account-id>:key/<key-id>",
      "key_alias": "my-example-key",
      "reuse_key_for_cluster_volumes": true
    }
  }'
```
Example response:
{ "use_cases": ["MANAGED_SERVICES", "STORAGE"], "customer_managed_key_id": "<databricks-key-config-id>", "creation_time": 1586447506984, "account_id": "<databricks-account-id>", "aws_key_info": { "key_arn": "arn:aws:kms:<region>:<aws-account-id>:key/<key-id>", "key_alias": "my-example-key", "reuse_key_for_cluster_volumes": true, "key_region": "<region>" } }
From the response JSON, copy the customer_managed_key_id. You use that ID in the next step to set your workspace configuration object’s property managed_services_customer_managed_key_id, storage_customer_managed_key_id, or both, depending on which encryption use cases this object represents.
Note
If you are planning to add the key for both CMK use cases, note that workspace storage encryption cannot be rotated (updated) after you have already set a key. Do not attempt to rotate an existing key for workspace storage.
Terminate all running clusters, pools, and SQL warehouses.
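If you prefer to script this step, the following is a rough sketch that terminates running clusters through the Clusters API. It assumes jq is installed, <workspace-url> is a placeholder for your workspace URL, and it does not cover pools or SQL warehouses.

```bash
# Sketch: terminate every running cluster in the workspace (requires jq).
for CLUSTER_ID in $(curl -s -n 'https://<workspace-url>/api/2.0/clusters/list' \
    | jq -r '.clusters[]? | select(.state == "RUNNING") | .cluster_id'); do
  curl -s -n -X POST 'https://<workspace-url>/api/2.0/clusters/delete' \
    -d "{\"cluster_id\": \"$CLUSTER_ID\"}"
done
```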
Update the workspace with your key configuration by using the Account API.
Important
After you run the key rotation command, you must keep your old KMS key available to Databricks for 24 hours.
Call the Account API operation to update a workspace (PATCH /accounts/{account_id}/workspaces/{workspace_id}). To add the key for managed services, the only required field is managed_services_customer_managed_key_id. Set it to the customer_managed_key_id value from the JSON response that you received when you registered your key configuration.
For example:
```bash
curl -X PATCH -n \
  'https://accounts.cloud.databricks.com/api/2.0/accounts/<databricks-account-id>/workspaces/<workspace-id>' \
  -d '{
    "managed_services_customer_managed_key_id": "<databricks-key-config-id>"
  }'
```
Note
If you plan to add the key for both use cases, also set the storage_customer_managed_key_id field to the same value.
If you are adding keys for both the managed services and storage use cases, wait at least 20 minutes after your API update before proceeding. During this time, you must not start any clusters or use the DBFS API. If you are adding the key only for the managed services use case, you can omit the 20 minute wait.
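For the shared-key case, a minimal sketch of the same PATCH call with both fields pointing at one key configuration; this assumes the key configuration was registered with both values in its use_cases array and reuses the placeholder IDs from the examples above:

```bash
# Sketch: attach one key configuration to both encryption use cases.
curl -X PATCH -n \
  'https://accounts.cloud.databricks.com/api/2.0/accounts/<databricks-account-id>/workspaces/<workspace-id>' \
  -d '{
    "managed_services_customer_managed_key_id": "<databricks-key-config-id>",
    "storage_customer_managed_key_id": "<databricks-key-config-id>"
  }'
```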
Restart any clusters, pools, and SQL warehouses that you terminated in a previous step.