Security overview

This article provides an overview of security controls and configurations for deployment and management of Databricks accounts and workspaces. For information about securing your data, see Data governance best practices.

Not all security features are available on all pricing tiers. See the Databricks AWS pricing page to learn how features align to pricing plans.

Note

This article focuses on the most recent (E2) version of the Databricks platform. Some of the features described here may not be supported on legacy deployments that have not migrated to the E2 platform.

Accounts and workspaces

In Databricks, a workspace is a Databricks deployment in the cloud that functions as a unified environment in which a specified set of users access all of their Databricks assets. Your organization can choose to have multiple workspaces or just one, depending on your needs.

A Databricks account represents a single entity for purposes of billing and support. An account can include multiple workspaces.

Account admins handle general account management and workspace admins manage the settings and features of individual workspaces in the account. To learn more about Databricks admins, see Databricks administration guide. Admins can deploy workspaces with security configurations including:

Deploy a workspace in your own VPC

An AWS Virtual Private Cloud (VPC) lets you provision a logically isolated section of the AWS Cloud where you can launch AWS resources in a virtual network. The VPC is the network location for your Databricks clusters. By default, Databricks creates and manages a VPC for the Databricks workspace.

You can instead provide your own VPC to host your Databricks clusters, enabling you to maintain more control of your own AWS account and limit outgoing connections. To take advantage of a customer-managed VPC, you must specify a VPC when you first create the Databricks workspace. You can share VPCs across workspaces, but you cannot share subnets across workspaces. For more information, see Customer-managed VPC.
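
If you create the workspace through the Account API, you first register the VPC as a network configuration and then reference it when creating the workspace. The following is a minimal sketch in Python, assuming the requests library, basic authentication, and placeholder values for the account ID, AWS resource IDs, and the credentials and storage configuration IDs:

    import requests

    ACCOUNT_ID = "<databricks-account-id>"  # placeholder
    BASE = f"https://accounts.cloud.databricks.com/api/2.0/accounts/{ACCOUNT_ID}"
    AUTH = ("<account-admin-email>", "<password>")  # placeholder credentials

    # Register the customer-managed VPC as a network configuration.
    network = requests.post(
        f"{BASE}/networks",
        auth=AUTH,
        json={
            "network_name": "my-network",
            "vpc_id": "vpc-0123456789abcdef0",
            "subnet_ids": ["subnet-aaaa1111", "subnet-bbbb2222"],
            "security_group_ids": ["sg-cccc3333"],
        },
    ).json()

    # Reference the network configuration when creating the workspace.
    workspace = requests.post(
        f"{BASE}/workspaces",
        auth=AUTH,
        json={
            "workspace_name": "secure-workspace",
            "aws_region": "us-east-1",
            "credentials_id": "<credentials-id>",
            "storage_configuration_id": "<storage-configuration-id>",
            "network_id": network["network_id"],
        },
    ).json()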

Enable customer-managed keys for encryption

Databricks supports adding a customer-managed key to help protect and control access to data. There are two customer-managed key features for different types of data:

  • Customer-managed keys for managed services: Managed services data in the Databricks control plane is encrypted at rest. You can add a customer-managed key for managed services to help protect and control access to the following types of encrypted data:

    • Notebook source files that are stored in the control plane.

    • Notebook results that are stored in the control plane.

    • Secrets stored by the secret manager APIs.

    • Databricks SQL queries and query history.

    • Personal access tokens or other credentials used to set up Git integration with Databricks Repos.

For more information, see Customer-managed keys for managed services.

  • Customer-managed keys for workspace storage: You can configure your own key to encrypt the data in the Amazon S3 bucket in your AWS account that you specified when you created your workspace. You can optionally use the same key to encrypt your cluster’s EBS volumes. For more information, see Customer-managed keys for workspace storage.

For more details about which customer-managed key features in Databricks protect different kinds of data, see Customer-managed keys for encryption.
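
As an illustration of how a key is registered, the sketch below posts a key configuration to the account-level customer-managed keys endpoint using Python; the requests library, basic authentication, and the KMS key ARN and alias are placeholders and assumptions:

    import requests

    ACCOUNT_ID = "<databricks-account-id>"  # placeholder
    BASE = f"https://accounts.cloud.databricks.com/api/2.0/accounts/{ACCOUNT_ID}"
    AUTH = ("<account-admin-email>", "<password>")  # placeholder credentials

    # Register a KMS key for managed services encryption.
    key_config = requests.post(
        f"{BASE}/customer-managed-keys",
        auth=AUTH,
        json={
            "use_cases": ["MANAGED_SERVICES"],  # or ["STORAGE"] for workspace storage
            "aws_key_info": {
                "key_arn": "arn:aws:kms:us-east-1:123456789012:key/<key-id>",
                "key_alias": "alias/databricks-cmk",  # hypothetical alias
            },
        },
    ).json()

The response includes a key configuration ID that you then reference when creating a workspace or adding the key to an existing one.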

Identity

Users, groups, and service principals are configured in the Databricks account and workspaces by administrators. For information on how to securely configure identity in Databricks, see Identity best practices.

Secure API access

For REST API authentication, you can use built-in revocable Databricks personal access tokens. You can create personal access tokens in the web application user interface or using the Tokens API.

Workspace admins can use the Token Management API to review current Databricks personal access tokens, delete tokens, and set the maximum lifetime of new tokens for their workspace. You can use the related Permissions API to control which users can create and use tokens to access workspace REST APIs.
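
For example, a user can mint a short-lived token with the Token API, and a workspace admin can cap token lifetimes and review outstanding tokens. The sketch below uses Python with the requests library; the workspace host and admin token are placeholders:

    import requests

    HOST = "https://<workspace-instance>.cloud.databricks.com"  # placeholder
    HEADERS = {"Authorization": "Bearer <admin-personal-access-token>"}

    # Create a personal access token with a bounded lifetime.
    token = requests.post(
        f"{HOST}/api/2.0/token/create",
        headers=HEADERS,
        json={"lifetime_seconds": 3600, "comment": "short-lived CI token"},
    ).json()

    # Cap the lifetime of new tokens for the workspace (admin only).
    requests.patch(
        f"{HOST}/api/2.0/workspace-conf",
        headers=HEADERS,
        json={"maxTokenLifetimeDays": "90"},
    )

    # Review all tokens in the workspace with the Token Management API.
    all_tokens = requests.get(
        f"{HOST}/api/2.0/token-management/tokens",
        headers=HEADERS,
    ).json()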

Note

While Databricks strongly recommends using tokens, Databricks users on AWS can also access REST APIs using their Databricks username and password (native authentication). You can grant and revoke the ability of specific users to use native authentication with password access control.

IP access lists

Authentication proves user identity, but it does not enforce the network location of the users. Accessing a cloud service from an unsecured network poses security risks, especially when the user may have authorized access to sensitive or personal data. With IP access lists, you can configure Databricks workspaces so that users connect to the service only through existing networks with a secure perimeter.

Workspace admins can specify the IP addresses (or CIDR ranges) on the public network that are allowed access. These IP addresses could belong to egress gateways or specific user environments. You can also specify IP addresses or subnets to block. For details, see IP access lists.
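
As a sketch, enforcement is switched on through the workspace configuration endpoint, and the list itself is created with the IP Access Lists API. The Python example below assumes the requests library and a placeholder workspace host and CIDR range:

    import requests

    HOST = "https://<workspace-instance>.cloud.databricks.com"  # placeholder
    HEADERS = {"Authorization": "Bearer <admin-personal-access-token>"}

    # Enable IP access list enforcement for the workspace.
    requests.patch(
        f"{HOST}/api/2.0/workspace-conf",
        headers=HEADERS,
        json={"enableIpAccessLists": "true"},
    )

    # Allow connections only from the corporate egress range.
    requests.post(
        f"{HOST}/api/2.0/ip-access-lists",
        headers=HEADERS,
        json={
            "label": "corp-vpn",
            "list_type": "ALLOW",
            "ip_addresses": ["203.0.113.0/24"],
        },
    )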

You can also use PrivateLink to block all public internet access to a Databricks workspace.

Audit and usage logs

Databricks provides access to audit logs of activities performed by Databricks users, allowing you to monitor detailed usage patterns. You can configure two types of audit and usage logging:

  • Audit log delivery: Databricks delivers audit logs of workspace activity to an S3 bucket that you specify. See Configure audit log delivery.

  • Billable usage log delivery: Databricks delivers billable usage logs in CSV format to an S3 bucket that you specify. See Deliver and access billable usage logs.
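
Both delivery types are configured at the account level through the Log Delivery API. The sketch below, in Python with the requests library, assumes basic authentication and placeholder credentials and storage configuration IDs:

    import requests

    ACCOUNT_ID = "<databricks-account-id>"  # placeholder
    BASE = f"https://accounts.cloud.databricks.com/api/2.0/accounts/{ACCOUNT_ID}"
    AUTH = ("<account-admin-email>", "<password>")  # placeholder credentials

    # Deliver audit logs to an S3 bucket that you own.
    requests.post(
        f"{BASE}/log-delivery",
        auth=AUTH,
        json={
            "log_delivery_configuration": {
                "log_type": "AUDIT_LOGS",  # or BILLABLE_USAGE
                "output_format": "JSON",   # billable usage logs use CSV
                "credentials_id": "<credentials-id>",
                "storage_configuration_id": "<storage-configuration-id>",
            }
        },
    )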

Cluster policies

You can use cluster policies to enforce particular cluster settings, such as instance types, number of nodes, attached libraries, and compute cost, and to display different cluster-creation interfaces for different user levels. Managing cluster configurations using policies can help you enforce universal governance controls and manage the costs of your compute infrastructure. For more information, see Manage cluster policies.
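
For illustration, the sketch below defines a policy that pins instance types, caps cluster size, and forces auto-termination, then creates it through the Cluster Policies API. It uses Python with the requests library; the host, token, and limits are placeholders:

    import json
    import requests

    HOST = "https://<workspace-instance>.cloud.databricks.com"  # placeholder
    HEADERS = {"Authorization": "Bearer <admin-personal-access-token>"}

    # Policy definition: restrict instance types, cap workers, and
    # force auto-termination to control cost.
    definition = {
        "node_type_id": {"type": "allowlist", "values": ["m5.large", "m5.xlarge"]},
        "num_workers": {"type": "range", "maxValue": 10},
        "autotermination_minutes": {"type": "fixed", "value": 60, "hidden": True},
    }

    # The API expects the definition as a JSON string.
    requests.post(
        f"{HOST}/api/2.0/policies/clusters/create",
        headers=HEADERS,
        json={"name": "cost-controlled", "definition": json.dumps(definition)},
    )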

Access control lists

In Databricks, you can use access control lists (ACLs) to configure permission to access objects such as notebooks, experiments, models, clusters, jobs, dashboards, queries, and SQL warehouses. All admin users can manage access control lists, as can users who have been given delegated permissions to manage them. See Access control.
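
For example, object permissions can be set programmatically with the Permissions API. The Python sketch below grants a hypothetical group restart-only access to a placeholder cluster ID, assuming the requests library:

    import requests

    HOST = "https://<workspace-instance>.cloud.databricks.com"  # placeholder
    HEADERS = {"Authorization": "Bearer <admin-personal-access-token>"}

    # Grant a group restart-only access to a specific cluster.
    requests.patch(
        f"{HOST}/api/2.0/permissions/clusters/<cluster-id>",
        headers=HEADERS,
        json={
            "access_control_list": [
                {"group_name": "data-engineers", "permission_level": "CAN_RESTART"}
            ]
        },
    )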

For information about managing access to your organization’s data, see Data governance guide.

Secrets

You can use Databricks secrets to store credentials and reference them in notebooks and jobs. A secret is a key-value pair that stores secret material, such as a credential for an external data source, with a key name that is unique within a secret scope. You should never hard-code secrets or store them in plain text.

You create secrets using either the REST API or CLI, but you must use the Secrets utility (dbutils.secrets) in a notebook or job to read your secrets.
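
The sketch below shows this flow in Python: a scope and secret are created through the Secrets API (using the requests library and placeholder values), and the secret is then read inside a notebook or job with the Secrets utility:

    import requests

    HOST = "https://<workspace-instance>.cloud.databricks.com"  # placeholder
    HEADERS = {"Authorization": "Bearer <personal-access-token>"}

    # Create a secret scope, then store a credential in it.
    requests.post(
        f"{HOST}/api/2.0/secrets/scopes/create",
        headers=HEADERS,
        json={"scope": "jdbc"},
    )
    requests.post(
        f"{HOST}/api/2.0/secrets/put",
        headers=HEADERS,
        json={"scope": "jdbc", "key": "password", "string_value": "<secret-value>"},
    )

    # In a Databricks notebook or job (not in this script), read the
    # secret with the Secrets utility; the value is redacted in output:
    #
    #     password = dbutils.secrets.get(scope="jdbc", key="password")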

Alternatively, you can store secrets in AWS Secrets Manager, create an IAM role to access the secrets, and then add that role to the cluster IAM roles. See Configure S3 access with instance profiles.

For information on how to use Databricks secrets, see Secret management.

Automation template options

You can automate some of your security configuration tasks by using Terraform or AWS Quick Start (CloudFormation) templates that call the Databricks REST APIs. These templates can configure and deploy new workspaces as well as update administrative configurations for existing workspaces. Particularly for large organizations with dozens of workspaces, templates enable fast and consistent automated configuration.

See Automate workspace creation with Account API templates.

Learn more

Here are some resources to help you build a comprehensive security solution that meets your organization’s needs: