Enable the compliance security profile

If a Databricks workspace has the compliance security profile enabled, the workspace has additional features and controls. The profile enables additional monitoring, enforced instance types for inter-node encryption, a hardened compute image, and other features. For details, see Features and technical controls.

The compliance security profile includes controls that help meet certain security requirements in some compliance standards. However, you can choose to enable the compliance security profile for its enhanced security features without the need to conform to any compliance standard.

Enabling the compliance security profile is required to use Databricks to process data that is regulated under the following compliance standards:

Choose how you want to enable the compliance security profile:

  • Account level: You can choose to apply the compliance security profile to your account, in which case all existing and future workspaces in the account use the security profile.

  • Workspace level: You can specify the workspaces for which the compliance security profile is enabled.

Requirements

  • Your Databricks account must include the Enhanced Security and Compliance add-on. For details, see the pricing page.

  • Your Databricks workspace is on the E2 version of the platform.

  • Your Databricks workspace is on the Enterprise tier.

  • Single sign-on (SSO) authentication is configured for the workspace.

Enable the compliance security profile

  1. Prepare any existing workspaces that will use the security profile. See Prepare a workspace for the compliance security profile.

  2. Contact your Databricks representative and request that the compliance security profile be enabled at the account level or just for some workspaces.

    If you want to enable it just for some workspaces, send the list of workspace IDs for the workspaces where you would like to enable the profile. You can get a workspace ID from the URL when you are using the workspace. Look for o= in the URL; the number after o= is the Databricks workspace ID. For example, if the URL is https://<databricks-instance>/?o=6280049833385130, the workspace ID is 6280049833385130. A short sketch for extracting this value programmatically appears after these steps.

  3. Wait for confirmation that the profile is now enabled.

  4. If any clusters or SQL warehouses were running, restart them. If you have many clusters running and only want to restart the ones that were started before enablement, you can use a script that Databricks provides to determine which clusters were started before the enablement date.

    Setup is complete. Create or use Databricks compute resources as desired.
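
If you are collecting workspace IDs for step 2, the following minimal sketch shows one way to extract the o= value from a workspace URL programmatically. The URL in the example is a placeholder; substitute the URL of your own workspace.

from urllib.parse import urlparse, parse_qs

def workspace_id_from_url(url):
    """Return the value of the o= query parameter from a workspace URL."""
    query = parse_qs(urlparse(url).query)
    return query["o"][0]

# Placeholder URL; substitute your own workspace URL.
print(workspace_id_from_url("https://<databricks-instance>/?o=6280049833385130"))
# Prints: 6280049833385130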

If you enable the compliance security profile for your account or your workspace, long-running clusters automatically restart after 25 days. Databricks recommends that admins regularly restart clusters before they have run for 25 days, and that they do so during a scheduled maintenance window. This reduces the risk of an auto-restart disrupting a scheduled job. To determine how long your clusters have been running, and optionally restart them, you can use a script that Databricks provides. See Restart a cluster and Run a script that determines how many days your clusters have been running, and optionally restart them. A minimal sketch of such a check also appears after the note below.

Note

If your workspace is part of the public preview of automatic cluster update, the behavior is different. Compute resources automatically restart only if updates are needed. You can choose a regular monthly or twice-monthly schedule, in which case the 25-day limit does not apply and the legacy workspace setting Automatic Restart of Long Running Clusters is ignored.
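
The following is a minimal sketch of such a check, not the script that Databricks provides. It lists running clusters through the Clusters API 2.0 (the same API that the script at the end of this page uses) and reports how many days each has been up. WORKSPACE_URL, TOKEN, and MAX_DAYS are placeholders that you supply.

import time
import requests

WORKSPACE_URL = "https://<your-workspace-url>"  # placeholder
TOKEN = "<personal-access-token>"               # store in a secret scope; do not hardcode
MAX_DAYS = 20                                   # flag clusters running longer than this

response = requests.get(
    WORKSPACE_URL + "/api/2.0/clusters/list",
    headers={"Authorization": "Bearer " + TOKEN},
)
now_millis = time.time() * 1000
for c in response.json().get("clusters", []):
    if c.get("state") != "RUNNING":
        continue
    # Use the more recent of the original start time and the last restart time.
    last_start = max(c.get("start_time", 0), c.get("last_restarted_time", 0))
    days_up = (now_millis - last_start) / (1000 * 60 * 60 * 24)
    if days_up > MAX_DAYS:
        print("{} ({}) has been running for {:.1f} days".format(
            c["cluster_id"], c.get("cluster_name", ""), days_up))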

Prepare a workspace for the compliance security profile

Some steps are necessary to prepare a workspace for the compliance security profile. If you have not yet enabled the security profile, do these steps before requesting to enable the security profile.

If the security profile is already enabled at the account level and you create any new workspaces, you must do these steps after you create each new workspace.

  1. If you enable the compliance security profile for your account or your workspace, long-running clusters are automatically restarted after 25 days. Any cluster that has already been running 25 days or longer when the profile is enabled restarts immediately, which causes running jobs to fail. To reduce the risk of an auto-restart disrupting a scheduled job, check for long-running clusters before you enable the security profile and restart any that have been running longer than 20 days, which leaves a buffer before the 25-day auto-restart takes effect. See Restart a cluster.

    Note

    If your workspace is part of the public preview of automatic cluster update, the behavior is different. Compute resources automatically restart only if updates are needed. You can choose a regular monthly or twice-monthly schedule, in which case the 25-day limit does not apply and the legacy workspace setting Automatic Restart of Long Running Clusters is ignored.

  2. Configure Single sign-on (SSO) authentication.

  3. Add required network ports.

    • For workspaces with PrivateLink back-end connectivity: You must make a change to support FIPS encryption if the workspace uses a PrivateLink back-end connection for private connectivity between the Classic data plane in your AWS account and the Databricks control plane in the Databricks account.

      One of the networking requirements for PrivateLink back-end connections is to create a separate security group for the endpoint that allows HTTPS/443 and TCP/6666 with bidirectional access (from and to) for both the workspace subnets and the endpoint subnet itself. This configuration allows access for both REST APIs (port 443) and secure cluster connectivity (6666). You can then use the security group for both purposes.

      To support the upcoming changes for FIPS encryption, update your network security group to additionally allow bidirectional access to port 2443 for FIPS connections. The total set of ports that require bidirectional access is 443, 2443, and 6666.

    • For workspaces with no PrivateLink back-end connectivity: If the workspace does not use a PrivateLink back-end connection for private connectivity but the workspace is configured to restrict outbound network access, you need to allow traffic to additional endpoints to support FIPS encryption.

      To support the upcoming changes for FIPS encryption, update your network security group (or firewall) to allow outbound access from the data plane to the control plane on port 2443 for FIPS connections. This is in addition to the outgoing port 443 access that you are already required to allow. For related information about security group and firewall configuration for customer-managed VPCs, see Security groups and Configure a firewall and outbound access (Optional).

  4. If any workspace is in the US East Region, the US West Region, or the Canada (Central) Region and is configured to restrict outbound network access, you need to allow traffic to additional endpoints to support FIPS encryption. If you use those regions and do not restrict outgoing access now, remember to revisit this step if you restrict outgoing access in the future.

    For the S3 service only, ensure that your Classic data plane network in your AWS account allows outgoing traffic to both the standard AWS endpoint for S3 and the FIPS variant with the prefix s3-fips. This requirement applies to S3 but not to the STS and Kinesis endpoints.

    • For S3, allow outgoing traffic to the endpoints s3.<region>.amazonaws.com and s3-fips.<region>.amazonaws.com. For example, s3.us-east-1.amazonaws.com and s3-fips.us-east-1.amazonaws.com.

    • For STS, allow outgoing traffic to the endpoint sts.<region>.amazonaws.com.

    • For Kinesis, allow outgoing traffic to the endpoint kinesis.<region>.amazonaws.com.

  5. For every workspace that uses the profile, run the following tests to verify that the changes were correctly applied:

    1. Launch a Databricks cluster with 1 driver and 1 worker, any DBR version, and any instance type.

    2. Create a notebook attached to the cluster. Use this cluster for the following tests.

    3. In the notebook, validate DBFS connectivity by running:

      %fs ls /
      %sh ls /dbfs
      

      Confirm that a file listing appears without errors.

    4. In the notebook, confirm access to the control plane instance for your region. Get the address from the table in this section and look for the Webapp endpoint for your VPC region.

      %sh nc -zv <webapp-domain-name> 443
      

      For example, for VPC region us-west-2:

      %sh nc -zv oregon.cloud.databricks.com 443
      

      Confirm the result says it succeeded.

    5. In the notebook, confirm access to the SCC relay for your region. Get the address from the table in this section and look for the SCC relay endpoint for your VPC region.

      %sh nc -zv <scc-relay-domain-name> 2443
      

      For example, for VPC region us-west-1:

      %sh nc -zv tunnel.cloud.databricks.com 2443
      

      Confirm that the result says it succeeded.

    6. In the notebook, confirm access to the S3, STS, and Kinesis FIPS endpoints for your region.

      Note

      For this step, FIPS endpoints for Canada apply only to the S3 service. AWS does not yet provide FIPS endpoints for STS and Kinesis.

      %sh nc -zv <bucket-name>.s3-fips.<region>.amazonaws.com 443
      %sh nc -zv sts.<region>.amazonaws.com 443
      %sh nc -zv kinesis.<region>.amazonaws.com 443
      

      For example, for VPC region us-west-1:

      %sh nc -zv acme-company-bucket.s3-fips.us-west-1.amazonaws.com 443
      %sh nc -zv sts.us-west-1.amazonaws.com 443
      %sh nc -zv kinesis.us-west-1.amazonaws.com 443
      

      Confirm the results for all three commands indicate success.

    7. In the same notebook, validate that the cluster Spark config points to the desired endpoints. For example:

      >>> spark.conf.get("fs.s3a.stsAssumeRole.stsEndpoint")
      "sts.us-west-1.amazonaws.com"
      
      >>> spark.conf.get("fs.s3a.endpoint")
      "s3-fips.us-west-2.amazonaws.com"
      
  6. Confirm that all existing clusters and jobs in all affected workspaces use only the instance types that are supported by the compliance security profile. Confirm or change all clusters and jobs so that the instance types are one of the following: C5a, C5ad, C5n, C6i, C6id, C6in, D3, D3en, G4dn, G5, I3en, I4i, M5dn, M5n, M5zn, M6i, M6id, M6idn, M6in, P3dn, R-fleet, R5dn, R5n, R6i, R6id, R6idn, R6in, and Databricks fleet instance types M-fleet, MD-fleet, and RD-fleet. A minimal sketch for checking existing clusters programmatically appears after this list.

    Any workload that uses an instance type outside of the list above fails to start with an invalid_parameter_exception.
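
The following minimal sketch is one way to perform the check in step 6 against existing clusters by using the Clusters API 2.0. WORKSPACE_URL and TOKEN are placeholders, and the mapping from node_type_id values to instance families (the part before the first ".", for example "m5dn.large" becomes "m5dn") is an assumption that you should verify against your own node type names, especially for fleet instance types.

import requests

WORKSPACE_URL = "https://<your-workspace-url>"  # placeholder
TOKEN = "<personal-access-token>"               # store in a secret scope; do not hardcode

# Instance families supported by the compliance security profile, per the list above.
ALLOWED_FAMILIES = {
    "c5a", "c5ad", "c5n", "c6i", "c6id", "c6in", "d3", "d3en", "g4dn", "g5",
    "i3en", "i4i", "m5dn", "m5n", "m5zn", "m6i", "m6id", "m6idn", "m6in",
    "p3dn", "r5dn", "r5n", "r6i", "r6id", "r6idn", "r6in",
    "r-fleet", "m-fleet", "md-fleet", "rd-fleet",
}

def family(node_type_id):
    # Assumed mapping: everything before the first "." is the instance family.
    return node_type_id.split(".")[0].lower()

response = requests.get(
    WORKSPACE_URL + "/api/2.0/clusters/list",
    headers={"Authorization": "Bearer " + TOKEN},
)
for c in response.json().get("clusters", []):
    node_types = {c.get("node_type_id"), c.get("driver_node_type_id")} - {None}
    unsupported = [n for n in node_types if family(n) not in ALLOWED_FAMILIES]
    if unsupported:
        print("{} ({}) uses unsupported instance types: {}".format(
            c["cluster_id"], c.get("cluster_name", ""), unsupported))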

Features and technical controls

The main enhancements of the compliance security profile affect the Databricks compute resources in your AWS account, also known as the Classic data plane. These enhancements include:

  • An enhanced disk image (a CIS-hardened Ubuntu Advantage worker image).

  • Clusters automatically restart after 25 days and get the latest AMI with the latest security updates. Databricks recommends that admins restart clusters during a scheduled maintenance window before they reach 25 days of uptime; this reduces the risk of an auto-restart disrupting a scheduled job. You can use a script that Databricks provides to determine how long your clusters have been running and optionally restart them. See Restart a cluster.

    Note

    If your workspace is part of the public preview of automatic cluster update, the behavior is different. Compute resources automatically restart only if updates are needed. You can choose a regular monthly or twice-monthly schedule, in which case the 25-day limit does not apply and the legacy workspace setting Automatic Restart of Long Running Clusters is ignored.

  • Security monitoring agents that generate logs that you can review. Two monitoring agents run on compute resources (cluster workers) in your workspace’s Classic data plane in your AWS account. This applies to clusters for notebooks and jobs, as well as the disk images that are used for pro or classic SQL warehouses.

  • Enforced use of AWS Nitro instance types in clusters and Databricks SQL warehouses. Instance types are limited to those that provide both hardware-implemented network encryption between cluster nodes and encryption at rest for local disks. This applies to clusters for notebooks and jobs as well as pro or classic SQL warehouses for use with Databricks SQL. The supported instance types are C5a, C5ad, C5n, C6i, C6id, C6in, D3, D3en, G4dn, G5, I3en, I4i, M5dn, M5n, M5zn, M6i, M6id, M6idn, M6in, P3dn, R-fleet, R5dn, R5n, R6i, R6id, R6idn, R6in, and Databricks fleet instance types M-fleet, MD-fleet, and RD-fleet.

  • Communications within the cluster and for egress use TLS 1.2 or higher encryption, including connections to the metastore.

  • Clusters are limited to the Databricks Runtime versions that the compliance security profile supports. Databricks limits the Databricks Runtime versions in the UI and does not allow API requests for unsupported Databricks Runtime versions. Supported versions are Databricks Runtime 7.3 LTS and above. A minimal sketch for checking the runtime versions of existing clusters appears after this list.

  • A shield logo appears in the navigation bar on the user icon in the lower-left of the page. See Confirm that the compliance security profile is enabled for a workspace.
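
As one way to check the Databricks Runtime versions of existing clusters, the following minimal sketch lists clusters pinned to a runtime below 7.3 by using the Clusters API 2.0. WORKSPACE_URL and TOKEN are placeholders, and the parsing assumes spark_version strings that start with a numeric major.minor prefix (for example, 7.3.x-scala2.12); other formats are skipped.

import requests

WORKSPACE_URL = "https://<your-workspace-url>"  # placeholder
TOKEN = "<personal-access-token>"               # store in a secret scope; do not hardcode

response = requests.get(
    WORKSPACE_URL + "/api/2.0/clusters/list",
    headers={"Authorization": "Bearer " + TOKEN},
)
for c in response.json().get("clusters", []):
    parts = c.get("spark_version", "").split(".")
    try:
        major, minor = int(parts[0]), int(parts[1])
    except (ValueError, IndexError):
        continue  # skip versions that do not follow the numeric major.minor pattern
    if (major, minor) < (7, 3):
        print("{} ({}) uses Databricks Runtime {}".format(
            c["cluster_id"], c.get("cluster_name", ""), c["spark_version"]))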

The data plane enhancements that are discussed in this document apply only to the Classic data plane in your AWS account.

When a compliance security profile is enabled, Databricks does not allow use of serverless SQL warehouses, which run in the serverless data plane in the Databricks account.

Databricks runs two monitoring agents in the control plane in the Databricks AWS account:

  • Antivirus

  • File integrity monitoring

See Monitoring agents in Databricks compute images.

Disk image with enhanced hardening

While a compliance security profile is enabled, Databricks compute resources (cluster worker images) in your Classic data plane use an enhanced hardened operating system image based on Ubuntu Advantage. Ubuntu Advantage is a package of enterprise security and support for open source infrastructure and applications that includes the following:

Monitoring agents in Databricks compute images

While a compliance security profile is enabled, there are additional security monitoring agents, including two agents that are pre-installed in the images that are used for Databricks compute resource VMs. You cannot disable the monitoring agents that are in the enhanced disk image.

The following monitoring agents are included, along with how to get the output of each:

  • File integrity monitoring: Monitors for file integrity and security boundary violations. This monitoring agent runs on the worker VMs in your cluster. To get its output, configure audit log delivery and review the logs for new rows.

  • Antivirus and malware detection: Scans the filesystem for viruses, including daily on-host virus scanning. This monitoring agent runs on the VMs in your compute resources, such as clusters and pro or classic SQL warehouses. It scans the entire host OS filesystem and the Databricks Runtime container filesystem; anything outside the cluster VMs is outside of its scanning scope. To get its output, configure audit log delivery and review the logs for new rows.

  • Vulnerability scanning: Scans the container host (VM) for certain known vulnerabilities and CVEs. The scanning happens in representative images in the Databricks environments. To get its output, request scan reports on the image from your Databricks representative.

File integrity monitoring

The data plane image includes a file integrity monitoring service that provides runtime visibility and threat detection for compute resources (cluster workers) in the classic data plane in your account.

The file integrity monitor output is generated within audit logs. To access these logs, an admin must set up audit log delivery to an Amazon S3 bucket. For the JSON schema for new auditable events that are specific to file integrity monitoring, see Audit log schemas for security monitoring.

Important

It is your responsibility to review file integrity monitor logs. At the sole discretion of Databricks, Databricks may review these logs but does not make a commitment to do so. If the agent detects a malicious activity, it is your responsibility to triage these events and open a support ticket with Databricks if the resolution or remediation requires an action by Databricks. Databricks may take action on the basis of these logs, including suspending or terminating the resources, but does not make any commitment to do so.
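
As a starting point for reviewing these logs, the following minimal sketch reads delivered audit logs with Spark in a Databricks notebook and filters them by service name. The S3 path and the service name are placeholders; see Audit log schemas for security monitoring for the exact service and field names that the file integrity monitor emits.

# Read audit logs that were delivered as JSON files to your S3 bucket.
# The path and the service name below are placeholders; substitute your own values.
audit_logs = spark.read.json("s3://<your-audit-log-bucket>/<delivery-path>/")

fim_events = (
    audit_logs
    .filter(audit_logs.serviceName == "<file-integrity-monitoring-service-name>")
    .select("timestamp", "serviceName", "actionName", "requestParams")
)
display(fim_events)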

Antivirus and malware detection

The enhanced data plane image includes an antivirus engine for detecting trojans, viruses, malware, and other malicious threats. The antivirus monitor scans the entire host OS filesystem and the Databricks Runtime container filesystem. Anything outside the cluster VMs is outside of its scanning scope.

The antivirus monitor output is generated within audit logs. To access these logs, an admin must set up audit log delivery to an Amazon S3 bucket. For the JSON schema for new auditable events that are specific to antivirus monitoring, see Schema for antivirus monitoring.

Important

It is your responsibility to review antivirus monitor logs. At the sole discretion of Databricks, Databricks may review these logs but does not make a commitment to do so. If the agent detects a malicious activity, it is your responsibility to triage these events and open a support ticket with Databricks if the resolution or remediation requires an action by Databricks. Databricks may take action on the basis of these logs, including suspending or terminating the resources, but does not make any commitment to do so.

When a new AMI is built, updated signature files are included within the new AMI.

Vulnerability scanning

A vulnerability monitor agent performs vulnerability scans of the container host (VM) for certain known CVEs.

Important

The scanning happens in representative images in the Databricks environments.

You can request the vulnerability scan reports from your Databricks representative.

When vulnerabilities are found with this agent, Databricks tracks them against its Vulnerability Management SLA and releases an updated image when available. It is your responsibility to restart all compute resources regularly to keep the image up-to-date with the latest image version.

Management and upgrade of monitoring agents

The additional monitoring agents that are on the disk images used for the compute resources in the Classic data plane are part of the standard Databricks process for upgrading systems:

  • The Classic data plane base disk image (AMI) is owned, managed, and patched by Databricks.

  • Databricks delivers and applies security patches by releasing new disk images (AMIs). The delivery schedule depends on new functionality and the SLA for discovered vulnerabilities. Typical delivery is every 2-4 weeks.

  • The base operating system for the data plane is Ubuntu Advantage 18.04 LTS.

  • Databricks clusters and pro or classic SQL warehouses are ephemeral by default. Upon launch, clusters and pro or classic SQL warehouses use the latest available base image. Older versions that may have security vulnerabilities are unavailable for new clusters.

    • You are responsible for ensuring that you do not have long-running clusters.

    • You are responsible for restarting clusters (using the UI or API) regularly to ensure they use the latest patched host VM images.

    • Upon request, Databricks can share a notebook that lists your workspace’s running clusters, identifies hosts older than a specified number of days, and optionally restarts them.

Monitor agent termination

If a monitoring agent on the worker VM stops running due to a crash or other termination, the system attempts to restart the agent.

Data retention policy for monitor agent data

Monitoring logs are sent to your own Amazon S3 bucket as part of audit log delivery. Retention, ingestion, and analysis of these logs is your responsibility.

Vulnerability scanning reports and logs are retained for at least one year by Databricks. You can request the vulnerability reports from your Databricks representative.

Confirm that the compliance security profile is enabled for a workspace

To confirm that a workspace is using the compliance security profile, check that it has the yellow shield logo displayed in the user interface. A shield logo appears in the navigation bar on the user icon in the lower-left of the page.

  • Initially, when the navigation bar is collapsed, the icon appears as a small shield logo.

  • If you hover the mouse over the icon, the navigation bar expands and the shield icon appears along with the message “<workspace-name> Compliance security profile”.

Important

If the shield icons are missing for a workspace, contact your Databricks representative.

Check whether any existing clusters need to be restarted after enablement

After the security profile is enabled for a workspace, you need to restart any clusters that were created before the time of enablement to ensure that they use the security profile enhancements and controls.

Note

If your workspace is part of the public preview of automatic cluster update, you might not need this script. Clusters restart automatically if needed during the scheduled maintenance windows.

If you have many clusters running and only want to restart the ones that were started before enablement, you can use this script to determine whether the start time was before the enablement date. Given a workspace URL, a personal access token for accessing REST APIs on that workspace, and an enablement date/time, this script returns a list of clusters that were started or restarted before the enablement timestamp. The script prints the cluster ID and the cluster name.

import requests
import json
from datetime import datetime

# This notebook requires a user-level Personal Access Token. This should be stored
# in the Databricks Secrets API (or similar) and shouldn't be hardcoded in a notebook.
# Add a secret using the Databricks CLI or API. CLI example:
# $ databricks secrets create-scope --scope YOUR_SCOPE_NAME
# $ databricks secrets put --scope YOUR_SCOPE_NAME --key YOUR_KEY_NAME
# Configure your scope and key name below.

#====== UPDATE THE FOLLOWING BELOW

WORKSPACE_URL="<WORKSPACE_URL_HERE>"
TOKEN=dbutils.secrets.get(scope="YOUR_SCOPE_NAME", key="YOUR_KEY_NAME")
# One of the below should be configured:
WORKSPACE_ENABLEMENT_TIME_UTC_MILLIS=<TIME_IN_UTC> # note millis, e.g. 1651366230000
WORKSPACE_ENABLEMENT_TIME_FORMATTED=None # Format YYYY-MM-DD HH:MM:SS -0000
                                         # Example "2022-06-01 15:01:01 -0700"
#====== UPDATE THE ABOVE

if WORKSPACE_ENABLEMENT_TIME_FORMATTED is not None:
  WORKSPACE_ENABLEMENT_TIME_UTC_MILLIS=datetime.strptime(
              WORKSPACE_ENABLEMENT_TIME_FORMATTED,
              "%Y-%m-%d %H:%M:%S %z").timestamp()*1000
headers = {
  'Authorization': 'Bearer ' + TOKEN
}
url = WORKSPACE_URL + "/api/2.0/clusters/list"

response = requests.request("GET", url, headers=headers, data={})

clusters = json.loads(response.text)["clusters"]
need_restart = []
for c in clusters:
  start_time = c["start_time"]
  last_start = start_time
  if "last_restarted_time" in c:
    last_start = max(start_time, c["last_restarted_time"])
  if last_start <= WORKSPACE_ENABLEMENT_TIME_UTC_MILLIS:
    need_restart.append((c["cluster_id"], c["cluster_name"]))

if (len(need_restart) == 0):
  print("All clusters have been restarted since {}".format(WORKSPACE_ENABLEMENT_TIME_UTC_MILLIS))
else:
  print("The following clusters still need to be restarted to remain in compliance")
  for (cluster_id, name) in need_restart:
    print("Cluster {}, {}".format(cluster_id, name))