Data security and encryption
This article introduces data security configurations to help protect your data.
For information about securing access to your data, see Data governance with Unity Catalog.
Overview of data security and encryption
Databricks provides encryption features to help protect your data. Not all security features are available on all pricing tiers. The following table lists each feature and the pricing tier it requires.
| Feature | Pricing tier |
|---|---|
| Customer-managed keys for encryption | Enterprise |
| Encrypt traffic between cluster worker nodes | Enterprise |
| Encrypt queries, query history, and query results | Enterprise |
Enable customer-managed keys for encryption
Databricks supports adding a customer-managed key to help protect and control access to data. There are two customer-managed key features for different types of data:
Customer-managed keys for managed services: Managed services data in the Databricks control plane is encrypted at rest. You can add a customer-managed key for managed services to help protect and control access to the following types of encrypted data:
Notebook source files that are stored in the control plane.
Notebook results for notebooks that are stored in the control plane.
Secrets stored by the secret manager APIs.
Databricks SQL queries and query history.
Personal access tokens or other credentials used to set up Git integration with Databricks Git folders.
Customer-managed keys for workspace storage: You can configure your own key to encrypt the data on the Amazon S3 bucket in your AWS account that you specified when you created your workspace. You can optionally use the same key to encrypt your cluster’s EBS volumes.
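To illustrate how the two key features and the optional EBS setting relate, the following Python sketch assembles a request body for registering a customer-managed key with Databricks. It assumes the Account API's encryption key configuration shape as we understand it: an `aws_key_info` object with `key_arn`, an optional `key_alias`, and `reuse_key_for_cluster_volumes`, plus a `use_cases` list containing `MANAGED_SERVICES` and/or `STORAGE`. Verify these field names against the current Account API reference before relying on them. The function name `build_key_config` and the example ARN are hypothetical, and the sketch only builds the payload; it does not call any API.

```python
import json
from typing import Optional

# Use-case values as we understand the Databricks Account API (assumption; verify before use).
MANAGED_SERVICES = "MANAGED_SERVICES"  # notebooks, secrets, SQL queries, Git credentials
STORAGE = "STORAGE"                    # workspace S3 bucket (and optionally EBS volumes)


def build_key_config(
    key_arn: str,
    key_alias: Optional[str] = None,
    managed_services: bool = True,
    workspace_storage: bool = False,
    reuse_for_cluster_volumes: bool = False,
) -> dict:
    """Build a hypothetical request body for registering a customer-managed key."""
    if not key_arn.startswith("arn:aws:kms:"):
        raise ValueError("expected an AWS KMS key ARN")
    if reuse_for_cluster_volumes and not workspace_storage:
        # EBS volume encryption is an option of the workspace storage key only.
        raise ValueError("reusing the key for cluster volumes requires workspace storage")

    use_cases = []
    if managed_services:
        use_cases.append(MANAGED_SERVICES)
    if workspace_storage:
        use_cases.append(STORAGE)
    if not use_cases:
        raise ValueError("select at least one use case")

    aws_key_info = {"key_arn": key_arn}
    if key_alias:
        aws_key_info["key_alias"] = key_alias
    if workspace_storage:
        # Only meaningful when the key also encrypts workspace storage.
        aws_key_info["reuse_key_for_cluster_volumes"] = reuse_for_cluster_volumes
    return {"aws_key_info": aws_key_info, "use_cases": use_cases}


if __name__ == "__main__":
    body = build_key_config(
        "arn:aws:kms:us-west-2:111122223333:key/example",  # placeholder ARN
        workspace_storage=True,
        reuse_for_cluster_volumes=True,
    )
    print(json.dumps(body, indent=2))
```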
For more details about which customer-managed key features in Databricks protect different kinds of data, see Customer-managed keys for encryption.
Encrypt queries, query history, and query results
You can use your own key from AWS KMS to encrypt Databricks SQL queries and query history stored in the Databricks control plane. For more details, see Encrypt queries, query history, and query results.
Encrypt S3 buckets at rest
Databricks supports encrypting data in S3 using server-side encryption. You can encrypt writes to S3 with a key from AWS KMS. This helps protect your data if the underlying storage is lost or stolen, because objects are stored encrypted and reading them requires permission to use the KMS key. See Configure encryption for S3 with KMS. To encrypt your workspace storage bucket, see Customer-managed keys for encryption.
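Server-side encryption with KMS for cluster writes is typically requested through Hadoop S3A properties set in the cluster's Spark config. The following Python sketch assumes the `fs.s3a.server-side-encryption-algorithm` and `fs.s3a.server-side-encryption.key` properties, prefixed with `spark.hadoop.` so Spark passes them to Hadoop. Newer Hadoop releases may use renamed properties, so check Configure encryption for S3 with KMS for the exact settings your runtime expects. The helper names and the key ARN are hypothetical.

```python
def sse_kms_spark_conf(key_arn: str) -> dict[str, str]:
    """Return hypothetical Spark config entries that request SSE-KMS for S3A writes."""
    if not key_arn.startswith("arn:aws:kms:"):
        raise ValueError("expected an AWS KMS key ARN")
    return {
        # Ask S3 to encrypt each written object server-side with a KMS key.
        "spark.hadoop.fs.s3a.server-side-encryption-algorithm": "SSE-KMS",
        # Which KMS key to use for those objects.
        "spark.hadoop.fs.s3a.server-side-encryption.key": key_arn,
    }


def render_spark_conf(conf: dict[str, str]) -> str:
    """Format entries as 'key value' lines, one property per line."""
    return "\n".join(f"{key} {value}" for key, value in conf.items())


if __name__ == "__main__":
    # Placeholder ARN; substitute your own KMS key.
    print(render_spark_conf(sse_kms_spark_conf("arn:aws:kms:us-west-2:111122223333:key/example")))
```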
To configure server-side encryption so that Unity Catalog external tables and volumes can access encrypted data in S3, see Configure an encryption algorithm on an external location.
Encrypt traffic between cluster worker nodes
User queries and transformations are typically sent to your clusters over an encrypted channel. By default, however, the data exchanged between worker nodes in a cluster is not encrypted. If your environment requires that data be encrypted at all times, whether at rest or in transit, you can create an init script that configures your clusters to encrypt traffic between worker nodes, using AES 128-bit encryption over a TLS 1.2 connection. For more information, see Encrypt traffic between cluster worker nodes.
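The init script's job is to give every node in the cluster the same Spark security settings before Spark starts. The following Python sketch shows one plausible set of standard Apache Spark properties for authenticated, AES-based encryption of traffic between nodes and renders them in `spark-defaults.conf` format. It is an illustration under assumptions, not the Databricks-provided init script: the property selection, the helper names, and how the shared secret is generated and distributed are hypothetical, and it does not reproduce the TLS 1.2 setup described above.

```python
import secrets


def internode_encryption_conf(shared_secret: str) -> dict[str, str]:
    """Return hypothetical Apache Spark properties for encrypting inter-node traffic.

    The same shared_secret must be used on every node in the cluster;
    otherwise the nodes cannot authenticate to each other.
    """
    if len(shared_secret) < 32:
        raise ValueError("use a long, random shared secret")
    return {
        "spark.authenticate": "true",                  # require authentication between nodes
        "spark.authenticate.secret": shared_secret,    # secret shared by all nodes
        "spark.network.crypto.enabled": "true",        # AES-based encryption of RPC traffic
        "spark.network.crypto.keyLength": "128",       # 128-bit AES keys
        "spark.network.crypto.saslFallback": "false",  # do not fall back to SASL encryption
        "spark.io.encryption.enabled": "true",         # encrypt shuffle and spill files on local disk
    }


def to_spark_defaults(conf: dict[str, str]) -> str:
    """Render properties as 'key value' lines for a spark-defaults.conf file."""
    return "\n".join(f"{key} {value}" for key, value in conf.items()) + "\n"


if __name__ == "__main__":
    # Generate the secret once and distribute it to all nodes; never print it.
    conf = internode_encryption_conf(secrets.token_hex(32))
    print(sorted(conf))  # show only the property names
```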
Manage workspace settings
Databricks workspace administrators can manage their workspace's security settings, such as whether users can download notebooks and whether the user isolation cluster access mode is enforced. For more information, see Manage your workspace.