Customer-managed keys for encryption
Note
Customer-managed key features require your workspace to be on the E2 version of the Databricks platform. These features require the Enterprise pricing tier.
Some services and data support adding a customer-managed key to help protect and control access to encrypted data. Databricks has two customer-managed key features that involve different types of data and locations:

- Customer-managed keys for managed services: data in the Databricks control plane, such as notebook source and metadata, secrets, and Databricks SQL queries.
- Customer-managed keys for workspace storage: your workspace root S3 bucket in your AWS account and, optionally, the EBS volumes of classic compute resources.
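To put both features in context: each starts from an AWS KMS key that you register with the Databricks Account API as a key configuration and then reference from a workspace. Below is a minimal sketch of that registration, assuming the Account API's `customer-managed-keys` endpoint; the account ID, token, key ARN, and alias are placeholders.

```python
# Minimal sketch: register an AWS KMS key with the Databricks Account API
# as a customer-managed key configuration. The account ID, token, key ARN,
# and alias below are placeholders, not real values.
import requests

ACCOUNT_ID = "<databricks-account-id>"   # placeholder
TOKEN = "<account-admin-oauth-token>"    # placeholder

resp = requests.post(
    f"https://accounts.cloud.databricks.com/api/2.0/accounts/{ACCOUNT_ID}"
    "/customer-managed-keys",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={
        "aws_key_info": {
            "key_arn": "arn:aws:kms:us-west-2:111122223333:key/<key-id>",  # placeholder
            "key_alias": "alias/databricks-cmk",                           # placeholder
        },
        # A single key configuration can serve either or both features.
        "use_cases": ["MANAGED_SERVICES", "STORAGE"],
    },
)
resp.raise_for_status()
# Response field name assumed from the Account API's key configuration shape.
print("Key configuration ID:", resp.json()["customer_managed_key_id"])
```

The returned configuration ID is what you later reference when creating or updating a workspace.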
The following table lists which customer-managed key features are used for which types of data.
| Type of data | Location | Which customer-managed key feature to use |
|---|---|---|
| Notebook source and metadata | Control plane | Customer-managed keys for managed services |
| Personal access tokens (PAT) or other credentials used for Git integration with Databricks Repos | Control plane | Customer-managed keys for managed services |
| Secrets stored by the secret manager APIs | Control plane | Customer-managed keys for managed services |
| Databricks SQL queries and query history | Control plane | Customer-managed keys for managed services |
| The remote EBS volumes for Databricks Runtime cluster nodes and other compute resources | Compute plane in your AWS account. Customer-managed keys for remote EBS volumes apply only to compute resources in the classic compute plane in your AWS account. See Serverless compute and customer-managed keys. | Customer-managed keys for workspace storage |
| DBFS root | Your workspace's DBFS root in your workspace root S3 bucket in your AWS account. This also includes workspace libraries and the FileStore area. | Customer-managed keys for workspace storage |
| Job results | Workspace root S3 bucket in your AWS account | Customer-managed keys for workspace storage |
| Databricks SQL query results | Workspace root S3 bucket in your AWS account | Customer-managed keys for workspace storage |
| MLflow models and experiment data | Workspace root S3 bucket in your AWS account | Customer-managed keys for workspace storage |
| Delta Live Tables storage location | If you use a DBFS path in your DBFS root, this is stored in the workspace root S3 bucket in your AWS account. This does not apply to DBFS paths that represent mount points to other data sources. | Customer-managed keys for workspace storage |
| Interactive notebook results | By default, when you run a notebook interactively (rather than as a job), results are stored in the control plane for performance, with some large results stored in your workspace root S3 bucket in your AWS account. You can configure Databricks to store all interactive notebook results in your AWS account, as shown in the sketch after this table. | For partial results in the control plane, use a customer-managed key for managed services. For results in the root S3 bucket, which you can configure for all result storage, use a customer-managed key for workspace storage. |
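For the last row of the table, storing all interactive notebook results in your own account is a workspace-level setting. A minimal sketch, assuming the `workspace-conf` API and its `storeInteractiveNotebookResultsInCustomerAccount` key; the workspace URL and token are placeholders:

```python
# Minimal sketch: store all interactive notebook results in the workspace
# root S3 bucket instead of partially in the control plane. The workspace
# URL and token are placeholders.
import requests

WORKSPACE_URL = "https://<workspace>.cloud.databricks.com"  # placeholder
TOKEN = "<personal-access-token>"                           # placeholder

resp = requests.patch(
    f"{WORKSPACE_URL}/api/2.0/workspace-conf",
    headers={"Authorization": f"Bearer {TOKEN}"},
    # The workspace-conf API expects string values rather than booleans.
    json={"storeInteractiveNotebookResultsInCustomerAccount": "true"},
)
resp.raise_for_status()
```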
Serverless compute and customer-managed keys
Databricks SQL Serverless supports:

- Customer-managed keys for managed services, for Databricks SQL queries and query history.
- Customer-managed keys for your workspace's S3 bucket, including root DBFS storage for Databricks SQL results.
Serverless SQL warehouses do not use customer-managed keys for EBS storage encryption on compute nodes, which is an optional part of configuring customer-managed keys for workspace storage. Disks for serverless compute resources are short-lived and tied to the lifecycle of the serverless workload. When compute resources are stopped or scaled down, the VMs and their storage are destroyed.
Model Serving
Resources for Model Serving, a serverless compute feature, fall into two categories:

- Resources that you create for the model are stored in your workspace's DBFS root in your workspace's S3 bucket. This includes the model's artifacts and version metadata. Both the workspace model registry and MLflow use this storage. You can configure this storage to use customer-managed keys. See Customer-managed keys for workspace storage.
- Resources that Databricks creates directly on your behalf include the model image and ephemeral serverless compute storage. These are encrypted with Databricks-managed keys and do not support customer-managed keys.
Customer-managed keys for EBS storage, which is an optional part of the customer-managed workspace storage feature, do not apply to serverless compute resources. As noted above, disks for serverless compute resources are short-lived and tied to the lifecycle of the serverless workload. When compute resources are stopped or scaled down, the VMs and their storage are destroyed.
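Because those model artifacts live in the workspace's DBFS root, they are covered by whatever storage key configuration is attached to the workspace. A minimal sketch of attaching one to an existing workspace, assuming the Account API's workspace update endpoint; all IDs and the token are placeholders:

```python
# Minimal sketch: attach an existing storage key configuration to a
# workspace so that workspace storage (including the DBFS root that holds
# model artifacts) uses the customer-managed key. All IDs and the token
# are placeholders.
import requests

ACCOUNT_ID = "<databricks-account-id>"        # placeholder
WORKSPACE_ID = "<workspace-id>"               # placeholder
KEY_CONFIG_ID = "<customer-managed-key-id>"   # placeholder, from registration
TOKEN = "<account-admin-oauth-token>"         # placeholder

resp = requests.patch(
    f"https://accounts.cloud.databricks.com/api/2.0/accounts/{ACCOUNT_ID}"
    f"/workspaces/{WORKSPACE_ID}",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={"storage_customer_managed_key_id": KEY_CONFIG_ID},
)
resp.raise_for_status()
```

Adding a storage key to an existing workspace has additional prerequisites (such as terminating compute first), so treat this as an outline of the API shape rather than a complete procedure.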