Customer-managed keys for notebooks

Preview

This feature is in Private Preview. It is not available on all Databricks deployment types and subscriptions. Contact your Databricks representative to request access.

Note

This feature requires the Enterprise plan or higher.

Introduction

Security-conscious organizations have risk management processes that evaluate the risks of public cloud use, SaaS applications, and third-party services. Reducing risk from third-party service providers helps build a strong case for using external services. Some regulated industries may require that certain types of data be encrypted with keys that the organization itself manages. These considerations are especially important for sectors that regularly handle personal data or other confidential information.

Workspace notebooks are stored primarily in a database in the Databricks control plane. Databricks lets you encrypt these notebooks with your own key. You must provide the key when you create the workspace.

Note

This feature does not encrypt data stored outside of the control plane. For example, it does not encrypt data in your root S3 bucket.

Important

To use customer-managed keys to encrypt notebooks, you must deploy your workspace in the region us-east-1, us-east-2, or us-west-2 for your data plane (VPC).

How it works

A customer-managed key encrypts the workspace’s notebooks in the control plane. You provide a secret, revocable key called a customer-managed key (CMK), which you specify by its ID in your cloud service’s key management system. On AWS, customer keys are managed by AWS KMS.

Additionally, Databricks creates a Databricks-managed key (DMK) for each workspace. The DMK is wrapped by the CMK to generate the combined encryption key, called the data encryption key (DEK). Databricks uses the DEK to encrypt the workspace’s notebooks.

The DEK is cached in memory for several read and write operations and is evicted from memory at a regular interval. After eviction, the next request requires another call to your cloud service’s key management system. If you delete or revoke your key, reading from or writing to notebooks fails at the end of the cache time interval.
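
The caching behavior described above can be illustrated with a minimal Python sketch. This is not the Databricks implementation: `FakeKms`, `DekCache`, the byte-reversal "unwrap" step, and the time-to-live value are hypothetical stand-ins chosen for illustration. The sketch shows only the observable effect: a cached DEK keeps working until the cache interval ends, after which a revoked key causes the next request to fail.

```python
import time
from typing import Callable, Optional


class KeyRevokedError(Exception):
    """Raised by the stand-in KMS when the customer-managed key is unavailable."""


class FakeKms:
    """Hypothetical stand-in for a cloud key management system (not a real client)."""

    def __init__(self) -> None:
        self.revoked = False
        self.calls = 0

    def unwrap(self, wrapped_key: bytes) -> bytes:
        self.calls += 1
        if self.revoked:
            raise KeyRevokedError("customer-managed key is revoked or deleted")
        # Toy transformation standing in for a real KMS decrypt operation.
        return wrapped_key[::-1]


class DekCache:
    """Keeps an unwrapped key in memory and evicts it after ttl_seconds."""

    def __init__(
        self,
        kms: FakeKms,
        wrapped_key: bytes,
        ttl_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._kms = kms
        self._wrapped_key = wrapped_key
        self._ttl = ttl_seconds
        self._clock = clock
        self._dek: Optional[bytes] = None
        self._expires_at = 0.0

    def get(self) -> bytes:
        now = self._clock()
        if self._dek is None or now >= self._expires_at:
            # Evict first, so a failed KMS call never leaves a stale key behind.
            self._dek = None
            self._dek = self._kms.unwrap(self._wrapped_key)
            self._expires_at = now + self._ttl
        return self._dek
```

With a 60-second interval (an arbitrary value for this sketch), revoking the key at second 10 leaves `get()` working until second 60. The first call after that raises `KeyRevokedError`, which mirrors how notebook reads and writes fail only at the end of the cache time interval.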

You add the CMK to your Databricks workspace configuration during workspace creation.

Figure: How customer-managed keys work for notebooks

Adding a customer-managed key for notebooks

To add a customer-managed key for notebooks, you must add the CMK when you create a workspace using the Multi-workspace API.

To configure your CMK:

  1. Create or select a KMS key in AWS, following the instructions in Creating CMKs in a custom key store.

  2. Copy these values. You will use them when you create the workspace:

    • Key ARN: Get the ARN from the console or the API (the Arn field in the JSON response).
    • Key alias: An alias specifies a display name for the CMK in AWS Key Management Service (AWS KMS). Use an alias to identify a CMK in cryptographic operations. For more information, see the AWS documents AWS::KMS::Alias and Working with aliases.
    • Key region: The AWS region for the key.
  3. Using Policy View, edit the key policy so that Databricks can use the KMS key to perform encryption and decryption operations. Set it to the following:

    {
      "Version": "2012-10-17",
      "Id": "key-policy-databricks",
      "Statement": [
        {
          "Sid": "Allow Databricks to use KMS key for Notebooks",
          "Effect": "Allow",
          "Principal": {
            "AWS": "arn:aws:iam::414351767826:root"
          },
          "Action": [
            "kms:Encrypt",
            "kms:Decrypt"
          ],
          "Resource": "*"
        }
      ]
    }
    
  4. To register the key, follow the instructions in Create a new workspace using the Multi-workspace API, specifically Step 4: Configure customer-managed key for notebooks (optional).

    Important

    You must set up your keys during workspace creation. You cannot add these keys after you have created the workspace.
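
As a rough aid for steps 2 and 3, the following Python sketch collects the three key values and reproduces the key policy shown above as a dictionary. The names `KeyInfo`, `collect_key_info`, and `build_key_policy` are hypothetical helpers, not part of any Databricks or AWS library. The sketch assumes the standard KMS key ARN layout (`arn:aws:kms:<region>:<account-id>:key/<key-id>`) and derives the key region from it.

```python
import json
import re
from dataclasses import dataclass

# Principal copied from the key policy in step 3.
DATABRICKS_PRINCIPAL = "arn:aws:iam::414351767826:root"

# Assumed ARN layout: arn:aws:kms:<region>:<12-digit account>:key/<key-id>
_KEY_ARN = re.compile(
    r"^arn:aws:kms:(?P<region>[a-z0-9-]+):(?P<account>\d{12}):key/(?P<key_id>[A-Za-z0-9-]+)$"
)


@dataclass(frozen=True)
class KeyInfo:
    """The three values that step 2 asks you to copy (hypothetical container)."""

    key_arn: str
    key_alias: str
    key_region: str


def collect_key_info(key_arn: str, key_alias: str) -> KeyInfo:
    """Validate the ARN shape and read the key region out of it."""
    match = _KEY_ARN.match(key_arn)
    if match is None:
        raise ValueError(f"not a KMS key ARN: {key_arn!r}")
    return KeyInfo(key_arn=key_arn, key_alias=key_alias, key_region=match.group("region"))


def build_key_policy() -> dict:
    """Return the key policy from step 3 as a Python dictionary."""
    return {
        "Version": "2012-10-17",
        "Id": "key-policy-databricks",
        "Statement": [
            {
                "Sid": "Allow Databricks to use KMS key for Notebooks",
                "Effect": "Allow",
                "Principal": {"AWS": DATABRICKS_PRINCIPAL},
                "Action": ["kms:Encrypt", "kms:Decrypt"],
                "Resource": "*",
            }
        ],
    }


def policy_as_json() -> str:
    """Serialize the policy so it can be pasted into Policy View."""
    return json.dumps(build_key_policy(), indent=2)
```

Calling `collect_key_info` with a placeholder ARN such as `arn:aws:kms:us-west-2:111122223333:key/1234abcd-12ab-34cd-56ef-1234567890ab` yields a `key_region` of `us-west-2`. The ARN and alias here are illustrative values only; use the ones from your own AWS account.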