Customer-managed keys for managed services

To use customer-managed keys (CMK) for managed services, the workspace must be on the E2 version of the Databricks platform or on a custom plan that has been enabled by Databricks for this feature. All new Databricks accounts and most existing accounts are now E2. If you are unsure which account type you have, contact your Databricks representative.

Workspace data plane VPCs can be in AWS regions ap-northeast-1, ap-northeast-2, ap-south-1, ap-southeast-1, ap-southeast-2, ca-central-1, eu-west-1, eu-west-2, eu-central-1, us-east-1, us-east-2, us-west-1, and us-west-2. However, you cannot use a VPC in us-west-1 if you want to use customer-managed keys for encryption.
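
As a quick pre-flight check, the region rule above can be written down in a few lines. This is only a sketch: the names `SUPPORTED_VPC_REGIONS`, `CMK_EXCLUDED_REGIONS`, and `supports_cmk` are illustrative and not part of any Databricks API, and the region list is copied from the paragraph above, so it may go out of date.

```python
# Regions listed above for workspace data plane VPCs.
SUPPORTED_VPC_REGIONS = frozenset({
    "ap-northeast-1", "ap-northeast-2", "ap-south-1", "ap-southeast-1",
    "ap-southeast-2", "ca-central-1", "eu-west-1", "eu-west-2",
    "eu-central-1", "us-east-1", "us-east-2", "us-west-1", "us-west-2",
})

# Regions where a VPC is allowed but customer-managed keys are not.
CMK_EXCLUDED_REGIONS = frozenset({"us-west-1"})


def supports_cmk(region: str) -> bool:
    """Return True if a VPC in `region` can be used with customer-managed keys."""
    return region in SUPPORTED_VPC_REGIONS and region not in CMK_EXCLUDED_REGIONS
```

For example, `supports_cmk("us-west-1")` is false even though that region is valid for a workspace VPC.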

For support with serverless compute resources, see Serverless compute and customer-managed keys.

Customer-managed key use cases

For additional control of your data, you can add your own key to protect and control access to some types of data. Databricks has two customer-managed key features. To compare them, see Customer-managed keys for encryption.

Managed services data in the Databricks control plane is encrypted at rest. You can add a customer-managed key for managed services to help protect and control access to this encrypted data, which includes notebook, secret, and Databricks SQL query data.

After you add a customer-managed key for a workspace, Databricks uses your key to control access to the data encryption key that encrypts future write operations to your workspace’s managed services data. Existing data is not re-encrypted. The data encryption key is cached in memory for several read and write operations and evicted from memory at a regular interval. After eviction, new requests for that data require another request to your cloud service’s key management system. If you delete or revoke your key, reading or writing to the protected data fails at the end of the cache time interval.
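
The caching behavior described above can be pictured with a small time-based cache. The following is a toy model under stated assumptions, not Databricks' implementation: the class name `DataKeyCache`, the `unwrap` callable (standing in for a request to the cloud key management system), and the `ttl_seconds` value are all hypothetical.

```python
import time
from typing import Callable, Optional


class DataKeyCache:
    """Toy model of a data encryption key (DEK) cached in memory with a TTL."""

    def __init__(self, unwrap: Callable[[], bytes], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._unwrap = unwrap      # stands in for a call to the cloud KMS
        self._ttl = ttl_seconds    # hypothetical cache interval
        self._clock = clock
        self._key: Optional[bytes] = None
        self._expires_at = 0.0

    def get(self) -> bytes:
        """Return the cached DEK, asking the KMS again once the TTL has passed."""
        now = self._clock()
        if self._key is None or now >= self._expires_at:
            self._key = None               # evict before refreshing
            self._key = self._unwrap()     # raises if the key was revoked
            self._expires_at = now + self._ttl
        return self._key
```

In this model, revoking the key does not break reads immediately: `get()` keeps returning the cached key until the interval lapses, and the next call then fails because `unwrap` raises.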

You can rotate (update) the customer-managed key at a later time. See Add or update a customer-managed key on a running workspace.

You can optionally share a Databricks key configuration object (which references your key) between the two different encryption use cases: this feature (managed services) and Customer-managed keys for workspace storage. Note that in both cases you can add the key and its Databricks key configuration to your Databricks workspace during workspace creation or add it later, but only managed services supports rotating (updating) the key later.

Note

This feature does not encrypt data stored outside of the control plane. Separately, you can encrypt data in your root S3 bucket and cluster EBS volumes in the Classic data plane.

Add a customer-managed key to a new workspace

To add a customer-managed key for managed services to a new workspace, add the key when you create the workspace using the Account API. You can also use the Databricks Terraform provider and databricks_mws_customer_managed_keys.

Create a key

To configure your customer-managed key:

  1. Create or select a symmetric key in AWS KMS, following the instructions in Creating symmetric CMKs or Viewing keys.

  2. Copy these values. You will use them when you create the workspace:

    • Key ARN — Get the ARN from the console or the API (the Arn field in the JSON response).

    • Key alias — An alias specifies a display name for the customer-managed key in AWS KMS. Use an alias to identify a customer-managed key in cryptographic operations. For more information, see the AWS documentation: AWS::KMS::Alias and Working with aliases.

  3. On the Key policy tab, switch to the policy view. Edit the key policy so that Databricks can use the key to perform encryption and decryption operations. Add the following to the key policy "Statement":

    {
      "Sid": "Allow Databricks to use KMS key for managed services in the control plane",
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::414351767826:root"
      },
      "Action": [
        "kms:Encrypt",
        "kms:Decrypt"
      ],
      "Resource": "*",
      "Condition": {
        "StringEquals": {
          "aws:PrincipalTag/DatabricksAccountId": ["<databricks-account-id>(s)"]
        }
      }
    }
    

    Note

    To retrieve your Databricks account ID, go to the account console then click the user icon at the bottom of the sidebar. There you can see and copy the ID.

    For more information, see the AWS article Editing keys.
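
If you manage key policies from a script rather than the console, the statement above can be generated and merged into an existing policy document. The sketch below only manipulates JSON in memory and does not call AWS. The function names `managed_services_statement` and `add_statement` are illustrative, and it assumes the existing policy's "Statement" value is a list.

```python
import copy
import json

# AWS principal taken from the policy statement shown above.
DATABRICKS_PRINCIPAL = "arn:aws:iam::414351767826:root"


def managed_services_statement(databricks_account_id: str) -> dict:
    """Build the key policy statement for managed services shown above."""
    return {
        "Sid": "Allow Databricks to use KMS key for managed services in the control plane",
        "Effect": "Allow",
        "Principal": {"AWS": DATABRICKS_PRINCIPAL},
        "Action": ["kms:Encrypt", "kms:Decrypt"],
        "Resource": "*",
        "Condition": {
            "StringEquals": {
                "aws:PrincipalTag/DatabricksAccountId": [databricks_account_id]
            }
        },
    }


def add_statement(policy: dict, statement: dict) -> dict:
    """Return a copy of `policy` with `statement` appended, skipping duplicates by Sid."""
    updated = copy.deepcopy(policy)
    statements = updated.setdefault("Statement", [])
    if all(s.get("Sid") != statement["Sid"] for s in statements):
        statements.append(statement)
    return updated


if __name__ == "__main__":
    existing = {"Version": "2012-10-17", "Statement": []}
    merged = add_statement(existing, managed_services_statement("<databricks-account-id>"))
    print(json.dumps(merged, indent=2))
```

Matching on "Sid" makes the merge safe to re-run: applying it twice leaves a single copy of the statement in the policy.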

Register a key for a new workspace

To register the key for a new workspace, follow the instructions in Create a new workspace using the Account API, specifically Step 5: Configure customer-managed keys (optional).

Those instructions show how to optionally share this key for encrypting workspace storage.

Add or update a customer-managed key on a running workspace

You can add a customer-managed key or rotate (update) an existing customer-managed key for managed services on a running workspace by making a PATCH request to update the workspace with a new key configuration ID.

If you add a customer-managed key for managed services to a running workspace, Databricks uses it for future write operations to managed services data. Existing data is not re-encrypted. To update the storage of an existing notebook so that it uses the new key, export and re-import the notebook.

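There are two use cases for customer-managed encryption keys:

  • Managed services: notebook, secret, and Databricks SQL query data in the control plane.

  • Workspace storage: the workspace’s DBFS root and cluster EBS volumes.

You can choose to configure neither, one, or both of these. If you choose to implement encryption for both use cases, you can optionally share a key, and optionally even the same configuration object, between them.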

There are important differences between these two use cases in when you can add the keys:

  • For a customer-managed key for managed services, you can configure it during workspace creation, add the key to a running workspace, or rotate (update) the key later.

  • For a customer-managed key for storage, you can configure it during workspace creation or add the key to a running workspace, but you cannot rotate (update) the key later.

You can share a customer-managed key or its key configuration object across workspaces. When creating a new workspace, a key configuration can represent both encryption use cases by setting its use_cases field to include both enumeration values.

To implement one encryption use case or both encryption use cases with the same key, perform the following procedure exactly once. To add encryption for both encryption use cases with different keys, perform the procedure two times, once for each use case.

  1. Create the AWS KMS key. Follow the instructions in either of the following sections, which differ only in the human-readable description field (sid) in the policy to identify the use case. Create the key for managed services or workspace storage. To share the key and configuration for both use cases, update the sid field accordingly.

  2. To register your KMS key with Databricks, call the create customer-managed key configuration API (POST /accounts/<account-id>/customer-managed-keys).

    Pass the following parameters:

    • use_cases: An array that specifies the use cases for which to use the key. Specify one or both of the following:

      • MANAGED_SERVICES: This key encrypts managed services, which includes notebook, secret, and Databricks SQL query data in the control plane.

      • STORAGE: This key encrypts workspace storage, which includes the workspace’s DBFS root and cluster EBS volumes.

    • aws_key_info: A JSON object with the following properties:

      • key_arn: AWS KMS key ARN. Note that Databricks infers the AWS region from the key ARN.

      • key_alias: (Optional) AWS KMS key alias.

      • reuse_key_for_cluster_volumes: (Optional) Used only if the use_cases array contains STORAGE. Specifies whether to also use the key to encrypt cluster EBS volumes. The default value is true, which means Databricks also uses the key for cluster volumes. If you set this to false, Databricks does not encrypt the EBS volumes with your specified key. In that case, your Databricks EBS volumes are encrypted with default AWS SSE encryption or, if you enabled AWS account-level EBS encryption by default, with the separate key that you provided to AWS for that purpose. Note that if reuse_key_for_cluster_volumes is true and you revoke the permission for the key, running clusters are not affected, but new and restarted clusters are.

    Example request:

    curl -X POST -n \
      'https://accounts.cloud.databricks.com/api/2.0/accounts/<databricks-account-id>/customer-managed-keys' \
      -d '{
      "use_cases": ["MANAGED_SERVICES", "STORAGE"],
      "aws_key_info": {
        "key_arn": "arn:aws:kms:<region>:<aws-account-id>:key/<key-id>",
        "key_alias": "my-example-key",
        "reuse_key_for_cluster_volumes": true
      }
    }'
    

    Example response:

    {
      "use_cases": ["MANAGED_SERVICES", "STORAGE"],
      "customer_managed_key_id": "<databricks-key-config-id>",
      "creation_time": 1586447506984,
      "account_id": "<databricks-account-id>",
      "aws_key_info": {
        "key_arn": "arn:aws:kms:<region>:<aws-account-id>:key/<key-id>",
        "key_alias": "my-example-key",
        "reuse_key_for_cluster_volumes": true,
        "key_region": "<region>"
      }
    }
    
  3. From the response JSON, copy the customer_managed_key_id. You use that ID in a later step to set your workspace configuration object’s property managed_services_customer_managed_key_id, storage_customer_managed_key_id, or both, depending on which encryption use cases this object represents.

    Note

    If you are planning to add the key for both CMK use cases, note that workspace storage encryption cannot be rotated (updated) after you have already set a key. Do not attempt to rotate an existing key for workspace storage.

  4. Terminate all running clusters, pools, and SQL warehouses.

  5. Update the workspace with your key configuration using the Databricks Account API 2.0.

    Important

    After you run the key rotation command, you must keep your old KMS key available to Databricks for 24 hours.

    Call the Account API operation to update a workspace (PATCH /accounts/{account_id}/workspaces/{workspace_id}).

    To add the key for managed services, the only required argument is managed_services_customer_managed_key_id. Set it to the customer_managed_key_id value from the JSON response that you received when you registered your key configuration.

    For example:

    curl -X PATCH -n \
      'https://accounts.cloud.databricks.com/api/2.0/accounts/<databricks-account-id>/workspaces/<workspace-id>' \
      -d '{
      "managed_services_customer_managed_key_id": "<databricks-key-config-id>"
    }'
    

    Note

    If you plan to add the key for both use cases, also set the storage_customer_managed_key_id property to the same value.

  6. If you are adding keys for both the managed services and storage use cases, wait at least 20 minutes after your API update before proceeding. During this time, you must not start any clusters or use the DBFS API. If you are adding the key for the managed services use case only, you can omit the 20-minute wait.

  7. Restart any clusters, pools, and SQL warehouses that you terminated in a previous step.
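
The API calls in steps 2, 5, and 6 can be sketched as small helpers that build the requests without sending them. This is a hedged illustration, not an official client: the function names are hypothetical, authentication is left out, and the endpoint paths and field names are the ones shown in the examples above.

```python
import json
from typing import Optional, Sequence

ACCOUNTS_HOST = "https://accounts.cloud.databricks.com"
VALID_USE_CASES = ("MANAGED_SERVICES", "STORAGE")


def key_config_request(account_id: str, key_arn: str, use_cases: Sequence[str],
                       key_alias: Optional[str] = None) -> tuple:
    """Return (method, url, json_body) for registering a KMS key (step 2)."""
    unknown = set(use_cases) - set(VALID_USE_CASES)
    if not use_cases or unknown:
        raise ValueError(f"use_cases must be a non-empty subset of {VALID_USE_CASES}")
    key_info = {"key_arn": key_arn}
    if key_alias is not None:
        key_info["key_alias"] = key_alias
    body = {"use_cases": list(use_cases), "aws_key_info": key_info}
    url = f"{ACCOUNTS_HOST}/api/2.0/accounts/{account_id}/customer-managed-keys"
    return "POST", url, json.dumps(body)


def workspace_patch_request(account_id: str, workspace_id: str, key_config_id: str,
                            use_cases: Sequence[str]) -> tuple:
    """Return (method, url, json_body) for attaching the key configuration (step 5)."""
    body = {}
    if "MANAGED_SERVICES" in use_cases:
        body["managed_services_customer_managed_key_id"] = key_config_id
    if "STORAGE" in use_cases:
        body["storage_customer_managed_key_id"] = key_config_id
    url = f"{ACCOUNTS_HOST}/api/2.0/accounts/{account_id}/workspaces/{workspace_id}"
    return "PATCH", url, json.dumps(body)


def wait_minutes_before_restart(use_cases: Sequence[str]) -> int:
    """Step 6: the 20-minute wait applies only when both use cases are added."""
    return 20 if {"MANAGED_SERVICES", "STORAGE"} <= set(use_cases) else 0
```

Serializing the body with `json.dumps` avoids hand-written JSON mistakes such as trailing commas. The returned tuples can be passed to whichever HTTP client you already use, along with your account credentials.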