Data security and encryption

This article introduces data security configurations to help protect your data.

For information about securing access to your data, see Data governance with Unity Catalog.

Overview of data security and encryption

Databricks provides encryption features to help protect your data. Not all security features are available on all pricing tiers. The following table contains an overview of the features and how they align to pricing plans.

Feature                                              Pricing tier
Customer-managed keys for encryption                 Enterprise
Encrypt traffic between cluster worker nodes         Enterprise
Encrypt queries, query history, and query results    Enterprise

Enable customer-managed keys for encryption

Databricks supports adding a customer-managed key to help protect and control access to data. There are two customer-managed key features for different types of data:

  • Customer-managed keys for managed services: Managed services data in the Databricks control plane is encrypted at rest. You can add a customer-managed key for managed services to help protect and control access to the following types of encrypted data:

    • Notebook source files that are stored in the control plane.

    • Notebook results for notebooks that are stored in the control plane.

    • Secrets stored by the secret manager APIs.

    • Databricks SQL queries and query history.

    • Personal access tokens or other credentials used to set up Git integration with Databricks Git folders.

  • Customer-managed keys for workspace storage: You can configure your own key to encrypt the data in the Amazon S3 bucket in your AWS account that you specified when you created your workspace. You can optionally use the same key to encrypt your cluster’s EBS volumes.

For more details about which customer-managed key features in Databricks protect which kinds of data, see Customer-managed keys for encryption.
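As a rough illustration of how the two key features differ, the following Python sketch builds a request body for registering a key for either use case. The field names (use_cases, aws_key_info, key_arn, key_alias, reuse_key_for_cluster_volumes) and the use-case labels are assumptions modeled on the Databricks Account API and should be checked against the current API reference. The helper build_key_config and the key ARN in the usage example are hypothetical.

```python
# Use-case labels (assumed to match the Databricks Account API; verify before use).
MANAGED_SERVICES = "MANAGED_SERVICES"
STORAGE = "STORAGE"


def build_key_config(key_arn, use_cases, key_alias=None, reuse_for_cluster_volumes=False):
    """Build a request body describing a customer-managed key configuration.

    This only assembles a dictionary; it does not call any API.
    """
    if not (key_arn.startswith("arn:") and ":kms:" in key_arn):
        raise ValueError("expected an AWS KMS key ARN")
    unknown = set(use_cases) - {MANAGED_SERVICES, STORAGE}
    if unknown:
        raise ValueError(f"unknown use cases: {sorted(unknown)}")
    # Reusing the key for EBS volumes only makes sense for the storage use case.
    if reuse_for_cluster_volumes and STORAGE not in use_cases:
        raise ValueError("cluster volume encryption requires the storage use case")

    key_info = {"key_arn": key_arn}
    if key_alias:
        key_info["key_alias"] = key_alias
    if STORAGE in use_cases:
        key_info["reuse_key_for_cluster_volumes"] = reuse_for_cluster_volumes
    return {"use_cases": list(use_cases), "aws_key_info": key_info}
```

For example, build_key_config("arn:aws:kms:us-west-2:111122223333:key/example", [STORAGE], reuse_for_cluster_volumes=True) describes a workspace storage key that is also used for cluster EBS volumes, while passing [MANAGED_SERVICES] describes a key for control plane data such as notebooks and secrets.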

Encrypt queries, query history, and query results

You can use your own key from AWS KMS to encrypt the Databricks SQL queries and your query history stored in the Databricks control plane. For more details, see Encrypt queries, query history, and query results.

Encrypt S3 buckets at rest

Databricks supports encrypting data in S3 using server-side encryption. You can encrypt writes to S3 with a key from KMS. This helps keep your data unreadable to anyone without access to the key if the stored data is lost or stolen. See Configure encryption for S3 with KMS. To encrypt your workspace’s root S3 bucket, see Customer-managed keys for encryption.
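A minimal sketch of what such a configuration might look like, assuming the cluster writes to S3 through the Hadoop S3A connector and that encryption is set through Spark configuration. The two property names are S3A settings that newer Hadoop releases may deprecate in favor of other names. The helper functions and the key ARN in the usage example are hypothetical, so treat the linked article as the authoritative procedure.

```python
def s3_sse_kms_spark_conf(kms_key_arn):
    """Return Spark properties that request SSE-KMS for S3A writes."""
    if not (kms_key_arn.startswith("arn:") and ":kms:" in kms_key_arn):
        raise ValueError("expected an AWS KMS key ARN")
    return {
        # Ask S3 to encrypt objects server-side with a KMS key.
        "spark.hadoop.fs.s3a.server-side-encryption-algorithm": "SSE-KMS",
        # The KMS key that S3 should use for that encryption.
        "spark.hadoop.fs.s3a.server-side-encryption.key": kms_key_arn,
    }


def render_spark_conf(conf):
    """Render properties as 'key value' lines, one per line, sorted by key."""
    return "\n".join(f"{key} {value}" for key, value in sorted(conf.items()))
```

Calling render_spark_conf(s3_sse_kms_spark_conf("arn:aws:kms:us-west-2:111122223333:key/example")) produces two lines that could be pasted into a cluster's Spark configuration.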

Encrypt traffic between cluster worker nodes

User queries and transformations are typically sent to your clusters over an encrypted channel. By default, however, the data exchanged between worker nodes in a cluster is not encrypted. If your environment requires that data be encrypted at all times, whether at rest or in transit, you can create an init script that configures your clusters to encrypt traffic between worker nodes, using AES 128-bit encryption over a TLS 1.2 connection. For more information, see Encrypt traffic between cluster worker nodes.
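The following Python sketch shows one way to audit whether a cluster's Spark configuration requests encrypted traffic between nodes. It is not the Databricks init script. The property names are standard Apache Spark security settings, but the particular set checked here and the TLSv1.2 value are assumptions based on the description above. A working setup also needs a shared authentication secret and a keystore, which this sketch does not cover.

```python
# Settings assumed to indicate encrypted inter-node traffic (illustrative, not exhaustive).
REQUIRED_SETTINGS = {
    "spark.authenticate": "true",            # nodes must authenticate to each other
    "spark.network.crypto.enabled": "true",  # AES-based encryption for RPC traffic
    "spark.ssl.enabled": "true",             # TLS for Spark's SSL-capable endpoints
    "spark.ssl.protocol": "TLSv1.2",         # TLS version named in this article
}


def missing_encryption_settings(cluster_conf):
    """Return the required settings that are absent or set to a different value."""
    return {
        key: expected
        for key, expected in REQUIRED_SETTINGS.items()
        if str(cluster_conf.get(key, "")).strip() != expected
    }


def is_worker_traffic_encrypted(cluster_conf):
    """True when every required setting is present with the expected value."""
    return not missing_encryption_settings(cluster_conf)
```

Passing an empty dictionary to missing_encryption_settings returns all four settings, which matches the default behavior described above, where traffic between worker nodes is not encrypted.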

Manage workspace settings

Databricks workspace administrators can manage their workspace’s security settings, such as whether users can download notebooks and whether the user isolation cluster access mode is enforced. For more information, see Manage your workspace.