Customer-managed keys for encryption

note

This feature requires the Enterprise tier.

This page provides an overview of customer-managed keys for encryption. Some services and data support adding a customer-managed key to help protect and control access to encrypted data. You can use AWS Key Management Service (KMS) to maintain a customer-managed encryption key.

For instructions on how to configure keys, see Configure customer-managed keys for encryption.

Customer-managed key use cases

Databricks has two customer-managed key use cases that involve different types of data and locations:

  • Managed services: Data in the Databricks control plane, including notebooks, secrets, SQL query data, and data stored in default storage.
  • Workspace storage: Your workspace storage bucket (which contains DBFS root) and the EBS volumes of compute resources in the classic compute plane. Does not apply to default storage.

Unity Catalog also supports reading from and writing to S3 buckets with KMS encryption enabled. See Create a storage credential for connecting to AWS S3.
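The two use cases above are registered separately when you configure a key. The sketch below illustrates the idea as a small mapping; the enum-style names `MANAGED_SERVICES` and `STORAGE` reflect how the Databricks Account API identifies the use cases, but treat the exact values as assumptions to verify against the configuration guide.

```python
# Hypothetical mapping of the two use cases to the values a key
# configuration declares; a single key configuration may cover one
# or both use cases.
USE_CASES = {
    "managed services": "MANAGED_SERVICES",  # control plane data: notebooks, secrets, SQL queries
    "workspace storage": "STORAGE",          # workspace storage bucket (DBFS root) and EBS volumes
}

def use_cases_for(scopes):
    """Translate human-readable scope names into a use_cases list."""
    return [USE_CASES[s] for s in scopes]

# A key configuration covering both use cases:
print(use_cases_for(["managed services", "workspace storage"]))
```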

note

For serverless workspaces (Public Preview), you only need to configure a key for managed services; that key also applies to workspace storage and root storage.

Customer-managed keys for managed services

Managed services data in the Databricks control plane is encrypted at rest. You can add a customer-managed key for managed services to help protect and control access to this data, such as notebook source and metadata, secrets, Databricks SQL queries and query history, and AI/BI dashboards.

note

Because serverless workspaces use default storage, the managed services use case also applies to the workspace storage and root storage, which includes job results, Databricks SQL results, notebook revisions, and other workspace data.

To configure customer-managed keys for managed services, see Configure customer-managed keys for encryption.
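Before Databricks can use your KMS key, the key's policy must grant the Databricks principal permission to use it. The snippet below only illustrates the general shape of such a key policy; the principal ARN is a placeholder, and the exact statements and principal to authorize are given in the configuration guide.

```python
import json

# Placeholder principal -- the configuration guide lists the actual
# Databricks AWS account principal to authorize.
DATABRICKS_PRINCIPAL = "arn:aws:iam::999999999999:root"

# Sketch of a KMS key policy statement allowing Databricks to use
# the key for encrypting and decrypting managed services data.
key_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AllowDatabricksToUseKeyForManagedServices",
            "Effect": "Allow",
            "Principal": {"AWS": DATABRICKS_PRINCIPAL},
            "Action": [
                "kms:Encrypt",
                "kms:Decrypt",
                "kms:ReEncrypt*",
                "kms:GenerateDataKey*",
                "kms:DescribeKey",
            ],
            "Resource": "*",
        }
    ],
}

print(json.dumps(key_policy, indent=2))
```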

important

Only AI/BI dashboards created after November 1, 2024 are encrypted and compatible with customer-managed keys.

Customer-managed keys for workspace storage

You can add a customer-managed key for workspace storage to protect and control access to the following types of encrypted data:

  • Your workspace storage bucket: If you add a workspace storage encryption key, Databricks encrypts the data on the Amazon S3 bucket in your AWS account that you specified when you set up your workspace, which is known as the workspace storage bucket. This bucket contains DBFS root, which includes the FileStore area, MLflow Models, and Lakeflow Declarative Pipelines data in your DBFS root (not DBFS mounts). The bucket also includes workspace system data, which includes job results, Databricks SQL results, notebook revisions, and other workspace data. For more information, see Create an S3 bucket for workspace deployment.
  • Your cluster's EBS volumes (optional): For Databricks Runtime cluster nodes and other compute resources in the classic compute plane, you can optionally use the key to encrypt the VM's remote EBS volumes.
note

This feature affects your DBFS root but is not used for encrypting data on any additional DBFS mounts. For S3 DBFS mounts, you can use other approaches to writing encrypted data with your keys. For more information, see Encrypt data in S3 buckets. Mounts are a legacy access pattern. Databricks recommends using Unity Catalog for managing all data access. See Connect to cloud object storage using Unity Catalog.
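A workspace storage key configuration can optionally cover cluster EBS volumes as well as the workspace storage bucket. The payload sketch below shows one plausible shape for such a configuration; the field names (`use_cases`, `aws_key_info`, `reuse_key_for_cluster_volumes`) are based on the Databricks Account API but should be verified against the configuration guide, and the key ARN is a made-up example.

```python
import json

# Example ARN only -- replace with your own KMS key.
KEY_ARN = "arn:aws:kms:us-west-2:123456789012:key/1234abcd-12ab-34cd-56ef-1234567890ab"

# Sketch of a key configuration for the workspace storage use case.
storage_key_config = {
    "use_cases": ["STORAGE"],
    "aws_key_info": {
        "key_arn": KEY_ARN,
        # Opt in to also encrypting cluster EBS volumes with the same
        # key (the optional second bullet above).
        "reuse_key_for_cluster_volumes": True,
    },
}

print(json.dumps(storage_key_config, indent=2))
```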

Compare customer-managed keys use cases

The following table lists which customer-managed key features are used for which types of data.

note

The encryption use cases can vary based on the type of workspace. Serverless workspaces only use the managed services use case.

| Type of data | Location | Which customer-managed key feature to use |
| --- | --- | --- |
| AI/BI dashboards | Control plane | Managed services |
| Notebook source and metadata | Control plane | Managed services |
| Personal access tokens (PAT) or other credentials used for Git integration with Databricks Git folders | Control plane | Managed services |
| Secrets stored by the secret manager APIs | Control plane | Managed services |
| Databricks SQL queries and query history | Control plane | Managed services |
| Vector Search indexes and metadata | Serverless compute plane | Managed services |
| Remote EBS volumes for Databricks Runtime cluster nodes and other compute resources | Traditional workspaces: Classic compute plane in your AWS account.<br>Serverless workspaces: Not applicable (storage is ephemeral on serverless compute). | Traditional workspaces: Workspace storage.<br>Serverless workspaces: Not applicable. |
| Root storage data | Traditional workspaces: DBFS root in your workspace storage bucket in your AWS account, including the FileStore area.<br>Serverless workspaces: The workspace's default storage. | Traditional workspaces: Workspace storage.<br>Serverless workspaces: Managed services. |
| Job results | Traditional workspaces: Workspace storage bucket in your AWS account.<br>Serverless workspaces: The workspace's default storage. | Traditional workspaces: Workspace storage.<br>Serverless workspaces: Managed services. |
| Databricks SQL query results | Traditional workspaces: Workspace storage bucket in your AWS account.<br>Serverless workspaces: The workspace's default storage. | Traditional workspaces: Workspace storage.<br>Serverless workspaces: Managed services. |
| MLflow Models | Traditional workspaces: Workspace storage bucket in your AWS account.<br>Serverless workspaces: The workspace's default storage. | Traditional workspaces: Workspace storage.<br>Serverless workspaces: Managed services. |
| Lakeflow Declarative Pipelines | Traditional workspaces: If you use a DBFS path in your DBFS root, data is stored in the workspace storage bucket in your AWS account. This does not apply to DBFS paths that represent mount points to other data sources.<br>Serverless workspaces: The workspace's default storage. | Traditional workspaces: Workspace storage.<br>Serverless workspaces: Managed services. |
| Interactive notebook results | Traditional workspaces: By default, when you run a notebook interactively (rather than as a job), results are stored in the control plane for performance, with some large results stored in your workspace storage bucket in your AWS account. You can configure Databricks to store all interactive notebook results in your AWS account.<br>Serverless workspaces: The workspace's default storage. | Traditional workspaces: For partial results in the control plane, use a customer-managed key for managed services. For results in the workspace storage bucket, which you can configure to hold all result storage, use a customer-managed key for workspace storage.<br>Serverless workspaces: Managed services. |

Serverless compute and customer-managed keys

Databricks SQL Serverless and serverless compute support:

  • Keys for managed services like Databricks SQL queries, query history, notebook source and metadata, and vector search indexes and metadata.
  • Workspace storage keys, including root storage for Databricks SQL and notebook results.

Encryption for remote EBS volumes does not apply to serverless compute because disks for serverless compute resources are short-lived and tied to the lifecycle of the serverless workload. When serverless compute resources are stopped or scaled down, the VMs and their storage are destroyed.

Model Serving

Resources for Model Serving, a serverless compute feature, fall into two categories:

  • Resources that you create for the model are stored in your workspace's root storage. This includes the model's artifacts and version metadata. Both the workspace model registry and MLflow use this storage. You can configure this storage to use customer-managed keys.
  • Resources that Databricks creates directly on your behalf include the model image and ephemeral serverless compute storage. These are encrypted with Databricks-managed keys and do not support customer-managed keys.