Databricks supports HIPAA-compliant deployments for processing protected health information (PHI), provided that you have a mutually signed order form and Business Associate Agreement (BAA) in place with Databricks before you process any PHI. Follow the steps in this article to ensure that your deployment is HIPAA compliant.
Contact your account manager or send an email to email@example.com, and sign a Business Associate Agreement (BAA) with both Databricks and AWS to maintain compliance with HIPAA regulations. HIPAA requires this agreement before you can process PHI within Databricks.
The following steps describe how to create a HIPAA-compliant cluster for processing PHI.
Databricks Runtime for Machine Learning includes high-performance distributed machine learning packages that use MPI (Message Passing Interface) and other low-level communication protocols. Because these protocols do not natively support encryption over the wire, these ML packages can potentially send unencrypted sensitive data across the network. If your workflow does not depend on these packages, they have no effect on the encryption of data over the wire.
Messages sent across the network by these ML packages are typically either ML model parameters or summary statistics about training data. Sensitive data, such as protected health information, is therefore not typically expected to be sent over the wire unencrypted. However, certain configurations or uses of these packages (such as specific model designs) could result in messages containing such information being sent across the network.
Provision an EBS volume for the cluster, because Databricks EBS volumes are encrypted whereas the default local storage is not.
Create a notebook in the workspace and attach the notebook to the cluster that was created in the previous step.
Run the following command in the notebook:
If the returned value is true, you have successfully created a cluster with encryption turned on. If not, contact firstname.lastname@example.org.
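One way to script this kind of check from a Python notebook is sketched below. It assumes that the encryption flag is exposed through a Spark configuration key; the key name `spark.ssl.enabled` is an assumption made for illustration and is not confirmed by this article, and the helper `encryption_enabled` is hypothetical. The function accepts any lookup callable, so it can be tried outside a cluster with a stand-in dictionary.

```python
def encryption_enabled(conf_get, key="spark.ssl.enabled"):
    """Return True only if the configuration lookup yields the string "true".

    conf_get: any callable that maps a configuration key to its value,
              for example `spark.conf.get` in a notebook attached to the cluster.
    key:      ASSUMED configuration key (not taken from this article);
              replace it with the key used by the documented command.
    """
    try:
        value = conf_get(key)
    except Exception:
        # A missing key or any other lookup failure is treated as "not enabled".
        return False
    return str(value).strip().lower() == "true"


# Stand-in lookup so the sketch runs outside a cluster (hypothetical values).
fake_conf = {"spark.ssl.enabled": "true"}
print(encryption_enabled(fake_conf.__getitem__))
```

In a notebook attached to the cluster, the call would be `encryption_enabled(spark.conf.get)`, where `spark` is the session object that the notebook provides. Treating a failed lookup as "not enabled" is a deliberately conservative choice: an unset flag should never be read as confirmation of encryption.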