Customer-managed keys for workspace storage

Preview

This feature is in Public Preview.

To use customer-managed keys for workspace storage, the workspace must be on the E2 version of the Databricks platform. All new Databricks accounts and most existing accounts are now on E2, which means you can create new E2 workspaces. Some older workspaces in your account may not be on E2, and you cannot add customer-managed keys for workspace storage to those workspaces. If you are unsure which account type you have or which version a workspace is on, contact your Databricks representative. This feature also requires the Enterprise pricing tier.

Workspace data plane VPCs can be in AWS regions ap-northeast-1, ap-south-1, ap-southeast-1, ap-southeast-2, ca-central-1, eu-west-1, eu-west-2, eu-central-1, us-east-1, us-east-2, us-west-1, and us-west-2. However, you cannot use a VPC in us-west-1 if you want to use customer-managed keys to encrypt managed services or workspace storage.

Important

Support for customer-managed keys for EBS volumes applies only to VMs in the Classic data plane, not to VMs in the Serverless data plane.

Introduction

You can add your own encryption key to encrypt workspace storage in two ways:

  • Your workspace’s root S3 bucket: If you add a workspace storage encryption key, Databricks encrypts the data on the AWS S3 bucket in your AWS account that you specified when you set up your workspace, which is sometimes called the workspace’s root S3 bucket. This bucket contains your workspace’s DBFS root (for example, workspace libraries and the FileStore area) and your workspace’s system data (for example, job results and notebook revisions).
  • Your cluster’s EBS volumes (optional): If you choose, Databricks also uses your key to encrypt each Databricks Runtime cluster node’s remote EBS volumes.

Notable points about this feature:

  • Only new root S3 bucket writes are affected: After you add a storage key, the new key is used only for new data written to the root S3 bucket. Existing data in the root S3 bucket is not guaranteed to be re-encrypted with your key, although it may be if it is later rewritten or updated. Services that read from this bucket can read objects written both before and after you add your key.
  • DBFS mounts are unaffected: The only DBFS data this feature affects is your DBFS root in your root S3 bucket. It contains workspace libraries and the FileStore area. It’s not intended for production customer data. You may have created one or many DBFS mounts of other buckets or data sources, but data on those DBFS mounts is not encrypted using the customer-managed key for workspace storage. For S3 DBFS mounts, there are other approaches to writing encrypted data with your keys.
  • Control plane encryption is unaffected: This feature does not affect data stored in the Databricks control plane in the Databricks AWS account. To encrypt notebook commands and secrets in the control plane, see Customer-managed keys for managed services.
  • To help diagnose issues, enable S3 object-level logging: Databricks recommends that you enable CloudTrail S3 object-level logging on your root S3 bucket. A sketch of how to do this with the AWS CLI follows this list.
  • Key rotation: After you configure customer-managed keys for storage for a workspace, you cannot later rotate the key by setting a different key ARN for the workspace. However, AWS provides automatic CMK master key rotation, which rotates the underlying key without changing the key ARN as described in AWS docs. Automatic CMK master key rotation is compatible with Databricks customer-managed keys for storage.
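
To enable object-level logging, add an S3 data event selector for the root bucket to a CloudTrail trail. The following is a minimal sketch using the AWS CLI; the trail name and bucket name are placeholders for your own values:

# Record object-level (data) events for the workspace root bucket only.
aws cloudtrail put-event-selectors \
  --trail-name <trail-name> \
  --event-selectors '[{
    "ReadWriteType": "All",
    "IncludeManagementEvents": true,
    "DataResources": [{
      "Type": "AWS::S3::Object",
      "Values": ["arn:aws:s3:::<workspace-root-bucket>/"]
    }]
  }]'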

You can share an AWS KMS key or a Databricks key configuration for workspace storage encryption across workspaces.

The documentation and the Databricks APIs refer to storage encryption (in your AWS account) and managed services encryption (in the Databricks AWS account) as the two use cases of customer-managed keys in Databricks. To share one key and one key configuration across both use cases, you must do so at workspace creation time (including when updating a failed workspace), because customer-managed keys for managed services can be added only during workspace creation. If you have already enabled managed services encryption for an existing workspace, you can still share the key for workspace storage, but you must create and register a new key configuration with the same key ARN for the workspace storage use case.

To create a new workspace with workspace storage encryption, skip the rest of this article and see Create a new workspace using the Account API. That article describes how to optionally share a key configuration across the two encryption use cases: workspace storage and Customer-managed keys for managed services.

This article describes how to add an encryption key for workspace storage to a running workspace.

Step 1: Create or select a key

  1. Create or select a symmetric key in AWS KMS, following the instructions in Creating symmetric CMKs or Viewing keys.

    Important

    The KMS key must be in the same AWS region as your workspace.
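
    If you prefer the command line, the following is a minimal sketch for creating a symmetric key and an alias with the AWS CLI; the description and alias name are illustrative placeholders:

    # Create a symmetric KMS key (the default key spec) in the workspace's region.
    aws kms create-key \
      --description "Databricks workspace storage key" \
      --region <workspace-region>

    # Optionally attach an alias; alias names must start with "alias/".
    aws kms create-alias \
      --alias-name alias/databricks-storage-key \
      --target-key-id <key-id-from-create-key-output> \
      --region <workspace-region>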

  2. Copy these values, which you need in a later step:

    • Key ARN: Get the ARN from the console or the API (the Arn field in the JSON response).
    • Key alias: An alias specifies a display name for the CMK in AWS KMS. Use an alias to identify a CMK in cryptographic operations. For more information, see the AWS documentation: AWS::KMS::Alias and Working with aliases.
  3. On the Key policy tab, switch to the policy view. Edit the key policy so that Databricks can use the key to perform encryption and decryption operations.

    Two versions of the policy JSON follow. Use the first version if you also want to use this key to encrypt cluster EBS volumes; use the second version only if you do not. In the EBS statement of the first version, replace <aws-arn-for-your-credentials> with the ARN of the cross-account IAM role from your Databricks credentials configuration.

    Add the JSON to your key policy in the "Statement" section.

    {
      "Sid": "Allow Databricks to use KMS key for DBFS",
      "Effect": "Allow",
      "Principal":{
        "AWS":"arn:aws:iam::414351767826:root"
      },
      "Action": [
        "kms:Encrypt",
        "kms:Decrypt",
        "kms:ReEncrypt*",
        "kms:GenerateDataKey*",
        "kms:DescribeKey"
      ],
      "Resource": "*"
    },
    {
      "Sid": "Allow Databricks to use KMS key for DBFS (Grants)",
      "Effect": "Allow",
      "Principal":{
        "AWS":"arn:aws:iam::414351767826:root"
      },
      "Action": [
        "kms:CreateGrant",
        "kms:ListGrants",
        "kms:RevokeGrant"
      ],
      "Resource": "*",
      "Condition": {
        "Bool": {
          "kms:GrantIsForAWSResource": "true"
        }
      }
    },
    {
      "Sid": "Allow Databricks to use KMS key for EBS",
      "Effect": "Allow",
      "Principal": {
        "AWS": "<aws-arn-for-your-credentials>"
      },
      "Action": [
        "kms:Decrypt",
        "kms:GenerateDataKey*",
        "kms:CreateGrant",
        "kms:DescribeKey"
      ],
      "Resource": "*",
      "Condition": {
        "ForAnyValue:StringLike": {
          "kms:ViaService": "ec2.*.amazonaws.com"
        }
      }
    }

    If you do not want to use this key to encrypt cluster EBS volumes, add this JSON to your key policy instead:

    {
      "Sid": "Allow Databricks to use KMS key for DBFS",
      "Effect": "Allow",
      "Principal":{
        "AWS":"arn:aws:iam::414351767826:root"
      },
      "Action": [
        "kms:Encrypt",
        "kms:Decrypt",
        "kms:ReEncrypt*",
        "kms:GenerateDataKey*",
        "kms:DescribeKey"
      ],
      "Resource": "*"
    },
    {
      "Sid": "Allow Databricks to use KMS key for DBFS (Grants)",
      "Effect": "Allow",
      "Principal":{
        "AWS":"arn:aws:iam::414351767826:root"
      },
      "Action": [
        "kms:CreateGrant",
        "kms:ListGrants",
        "kms:RevokeGrant"
      ],
      "Resource": "*",
      "Condition": {
        "Bool": {
          "kms:GrantIsForAWSResource": "true"
        }
      }
    }
    

    For more information, see the AWS article Editing keys.
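
    If you edit the policy from the command line instead of the console, you can apply the full policy document with the AWS CLI. The following is a minimal sketch, assuming the complete policy (including the statements above) is saved locally as key-policy.json:

    # KMS keys have a single key policy, and its name is always "default".
    aws kms put-key-policy \
      --key-id <key-id> \
      --policy-name default \
      --policy file://key-policy.json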

Step 2: Create a new key configuration using the Account API

  1. To register your KMS key with Databricks, call the create customer-managed key configuration API (POST /accounts/<account-id>/customer-managed-keys), which creates a Databricks key configuration.

    Pass the following parameters:

    • use_cases: Set this to a JSON array with one element: ["STORAGE"]. Note that if you are creating a new workspace, you can create a single key configuration for both use cases (see Create a new workspace using the Account API).

    • aws_key_info: A JSON object with the following properties:

      • key_arn: AWS KMS key ARN. Note that Databricks infers the AWS region from the key ARN.
      • key_alias: (Optional) AWS KMS key alias.
      • reuse_key_for_cluster_volumes: (Optional) If the use_cases array contains STORAGE, this specifies whether to also use the key to encrypt cluster EBS volumes. The default value is true, which means Databricks also uses the key for cluster volumes. If you set this to false, Databricks does not encrypt the EBS volumes with your specified key. In that case, your Databricks EBS volumes are encrypted either with default AWS SSE encryption or, if you enabled account-level EBS encryption by default, AWS enforces account-level EBS encryption using the separate key that you provided for it.

      Example request:

      curl -X POST -n \
        'https://accounts.cloud.databricks.com/api/2.0/accounts/<databricks-account-id>/customer-managed-keys' \
        -d '{
        "use_case": ["STORAGE"]
        "aws_key_info": {
          "key_arn": "arn:aws:kms:us-west-2:<aws-account-id>:key/<key-id>",
          "key_alias": "my-example-key",
          "reuse_key_for_cluster_volumes": true
        }
      }'
      

      Example response:

      {
        "use_case": ["STORAGE"],
        "customer_managed_key_id": "<aws-kms-key-id>",
        "creation_time": 1586447506984,
        "account_id": "<databricks-account-id>",
        "aws_key_info": {
           "key_arn": "arn:aws:kms:us-west-2:<aws-account-id>:key/<key-id>",
           "key_alias": "my-example-key",
           "reuse_key_for_cluster_volumes": true,
           "key_region": "us-west-2"
        }
      }
      
  2. From the response, copy the customer_managed_key_id for use in the next step.
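
    If you script this step, you can capture the ID directly from the response. The following is a minimal sketch that repeats the request above and assumes jq is installed:

    KEY_CONFIG_ID=$(curl -s -X POST -n \
      'https://accounts.cloud.databricks.com/api/2.0/accounts/<databricks-account-id>/customer-managed-keys' \
      -d '{
      "use_cases": ["STORAGE"],
      "aws_key_info": {
        "key_arn": "arn:aws:kms:us-west-2:<aws-account-id>:key/<key-id>",
        "key_alias": "my-example-key",
        "reuse_key_for_cluster_volumes": true
      }
    }' | jq -r '.customer_managed_key_id')
    echo "$KEY_CONFIG_ID"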

Step 3: Shut down all clusters

Terminate all running clusters, pools, and SQL endpoints.
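
If you have many clusters, you can script the terminations with the workspace-level Clusters API; pools and SQL endpoints must be stopped separately. The following is a minimal sketch, assuming jq is installed; <databricks-instance> and the personal access token are placeholders for your own values:

# List running clusters and terminate each one. The clusters/delete
# endpoint terminates a cluster; it does not permanently remove it.
export DATABRICKS_HOST='https://<databricks-instance>'
export DATABRICKS_TOKEN='<personal-access-token>'

curl -s "$DATABRICKS_HOST/api/2.0/clusters/list" \
  -H "Authorization: Bearer $DATABRICKS_TOKEN" |
  jq -r '.clusters[]? | select(.state == "RUNNING") | .cluster_id' |
  tee terminated-cluster-ids.txt |
  while read -r cluster_id; do
    curl -s -X POST "$DATABRICKS_HOST/api/2.0/clusters/delete" \
      -H "Authorization: Bearer $DATABRICKS_TOKEN" \
      -d "{\"cluster_id\": \"$cluster_id\"}"
  done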

Step 4: Update a workspace with your key configuration using the Account API

Use the Databricks Account API to update your workspace.

Call the Account API operation to update a workspace (PATCH /accounts/{account_id}/workspaces/{workspace_id}). The only request parameter you need to pass is the storage_customer_managed_key_id property. Set it to the customer_managed_key_id value from the JSON response that you received when you registered your key configuration.

For example:

curl -X PATCH -n \
  'https://accounts.cloud.databricks.com/api/2.0/accounts/<databricks-account-id>/workspaces/<workspace-id>' \
  -d '{
  "storage_customer_managed_key_id": "<aws-kms-key-config-id>"
}'
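
After the update, you can confirm that the key configuration is attached by fetching the workspace, whose JSON representation includes the storage_customer_managed_key_id field. A minimal sketch, assuming jq is installed:

curl -s -X GET -n \
  'https://accounts.cloud.databricks.com/api/2.0/accounts/<databricks-account-id>/workspaces/<workspace-id>' |
  jq '.storage_customer_managed_key_id'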

Step 5: Wait for the key information to propagate

Wait at least 20 minutes after the API update before proceeding.

Important

During this time, you must not start any clusters.

Step 6: Restart your clusters

Restart any clusters, pools, and SQL endpoints that you terminated in a previous step.
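
If you scripted the terminations in Step 3, the following is a matching sketch for restarting those clusters, assuming you saved the cluster IDs to terminated-cluster-ids.txt as in that sketch; <databricks-instance> and the token are placeholders:

# Start each cluster whose ID was recorded before termination.
while read -r cluster_id; do
  curl -s -X POST "https://<databricks-instance>/api/2.0/clusters/start" \
    -H "Authorization: Bearer <personal-access-token>" \
    -d "{\"cluster_id\": \"$cluster_id\"}"
done < terminated-cluster-ids.txt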