HIPAA-compliant deployment

Databricks supports HIPAA-compliant deployment to process PHI data, as long as you have a mutually signed order form and Business Associate Agreement (BAA) in place with Databricks prior to processing PHI. Follow the steps in this article to set up your deployment in a HIPAA-compliant manner.

Sign a Business Associate Agreement (BAA) with Databricks and AWS

Contact your Databricks account manager or email sales@databricks.com to sign a Business Associate Agreement (BAA) with both Databricks and AWS. This agreement is required under HIPAA before you can process PHI within Databricks.

Create and verify a HIPAA-compliant cluster

These steps describe how to create a HIPAA-compliant cluster to process PHI data.

Step 1: Create a cluster

Follow the instructions in Create a cluster. As part of the configuration step you must choose a Databricks runtime.
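If you prefer to create the cluster programmatically, the same configuration can be submitted to the Databricks Clusters API (`POST /api/2.0/clusters/create`). The sketch below only builds the request payload; the cluster name, runtime version, and node type are illustrative placeholders, not a prescribed HIPAA configuration, and should match what you would otherwise select in the UI.

```python
import json

# Sketch of a cluster-create payload for the Databricks Clusters API
# (POST /api/2.0/clusters/create). All values are illustrative
# placeholders; pick the runtime you chose in the configuration step.
payload = {
    "cluster_name": "hipaa-phi-cluster",  # hypothetical name
    "spark_version": "13.3.x-scala2.12",  # placeholder runtime version
    "node_type_id": "i3.xlarge",          # placeholder instance type
    "num_workers": 2,
}

print(json.dumps(payload, indent=2))
```

Submitting this payload requires a workspace URL and access token, which are omitted here.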


Databricks Runtime for Machine Learning includes high-performance distributed machine learning packages that use MPI (Message Passing Interface) and other low-level communication protocols. Because these protocols do not natively support encryption over the wire, these ML packages can potentially send unencrypted sensitive data across the network. If your workflow does not use these packages, they have no effect on data encryption over the wire.

What are the risks?

Messages sent across the network by these ML packages are typically either ML model parameters or summary statistics about training data, so sensitive data such as protected health information is not usually sent over the wire unencrypted. However, certain configurations or uses of these packages (such as specific model designs) could cause messages containing such information to be sent across the network.


Step 2: Configure the cluster with an EBS volume

Provision an EBS volume: Databricks EBS volumes are encrypted, while the default local instance storage is not.

Provision EBS volume
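When creating the cluster through the Clusters API instead of the UI, EBS storage is requested through the `aws_attributes` field of the create payload. A minimal sketch, assuming general-purpose SSD volumes; the volume count and size below are placeholders, not recommendations:

```python
# Sketch: requesting EBS storage in a cluster-create payload via
# aws_attributes. ebs_volume_type, ebs_volume_count, and
# ebs_volume_size are Clusters API fields; the values are placeholders.
aws_attributes = {
    "ebs_volume_type": "GENERAL_PURPOSE_SSD",
    "ebs_volume_count": 1,    # EBS volumes attached per node
    "ebs_volume_size": 100,   # size of each volume, in GB
}

payload = {
    "cluster_name": "hipaa-phi-cluster",  # hypothetical name
    "aws_attributes": aws_attributes,
}

print(payload["aws_attributes"]["ebs_volume_type"])
```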

Step 3: Verify that encryption is enabled

  1. Create a notebook in the workspace and attach the notebook to the cluster that was created in the previous step.

  2. Run the following command in the notebook:

    %scala
    spark.conf.get("spark.ssl.enabled")

    If the returned value is true, you have successfully created a cluster with encryption turned on. If not, contact help@databricks.com.
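The check above can also be scripted so a notebook fails fast when encryption is off. A minimal Python sketch; inside a Databricks notebook you would pass `spark.conf.get` directly, while the plain dictionary below is only a stand-in so the sketch runs anywhere:

```python
def verify_ssl_enabled(get_conf):
    """Raise if spark.ssl.enabled does not resolve to 'true'.

    get_conf is any callable like spark.conf.get; a dict's .get
    method is used below only as a stand-in outside Databricks.
    """
    value = get_conf("spark.ssl.enabled")
    if str(value).lower() != "true":
        raise RuntimeError(
            "Cluster encryption is not enabled; contact help@databricks.com"
        )
    return True

# Stand-in for spark.conf on a correctly configured cluster:
fake_conf = {"spark.ssl.enabled": "true"}
verify_ssl_enabled(fake_conf.get)
```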


spark-submit is not supported on HIPAA-compliant clusters.