HIPAA
This page describes HIPAA compliance controls in Databricks.
HIPAA overview
HIPAA is a US healthcare law that establishes national standards for protecting the privacy and security of protected health information (PHI).
Key points:
- Applies to healthcare providers, insurers, and vendors that handle PHI.
- Includes rules for privacy, security, and breach notification.
- Requires administrative, technical, and physical safeguards for PHI.
- Applies to cloud service providers that store or process PHI.
Business Associate Agreement (BAA) requirement for processing PHI
HIPAA and related regulations require organizations that handle protected health information (PHI) to meet specific safeguards. When a covered entity or business associate uses a cloud service provider (CSP) like Databricks, the CSP is also considered a business associate.
As a result, Databricks permits the processing of PHI data only if you have an active Business Associate Agreement (BAA) with Databricks. You must have this agreement in place before processing any PHI data. Contact your Databricks account team for more information.
Enable HIPAA compliance controls
HIPAA compliance features are enabled at the account level. Contact your Databricks account team to upgrade your account to include HIPAA compliance features. Note that enabling HIPAA compliance features for an account is permanent.
After your Databricks account is enabled for HIPAA, all workspaces in the account have HIPAA compliance features. To deploy a workspace without HIPAA compliance features, you must create a separate Databricks account.
Databricks relies on the built-in features of Google Compute Engine (GCE) to enforce encryption at rest and encryption in transit within a cluster.
Shared responsibility of HIPAA compliance
HIPAA compliance is a shared effort among three parties with different responsibilities: Google, Databricks, and you. Each party has numerous responsibilities; the key ones are listed below.
This section uses the terms control plane and compute plane, which are the two main parts of the Databricks architecture:
- The Databricks control plane includes the backend services that Databricks manages in its own Google Cloud account.
- The compute plane is where your data lake is processed. The classic compute plane includes a VPC in your Google Cloud account, and clusters of compute resources to process your notebooks, jobs, and pro or classic SQL warehouses.
For more information, see Databricks architecture overview.
Ensure that sensitive information is never entered in customer-defined input fields, such as workspace names, cluster names, and job names.
- You are wholly responsible for ensuring your own compliance with all applicable laws and regulations. Information provided in Databricks online documentation does not constitute legal advice, and you should consult your legal advisor for any questions regarding regulatory compliance.
- Databricks does not support the use of preview features for the processing of PHI on the HIPAA on Google Cloud platform, with the exception of the features listed in Supported preview features.
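One way to reduce the risk of sensitive values reaching customer-defined input fields is to screen proposed names before you submit them. The following is a minimal sketch, not a Databricks feature: the helpers `find_sensitive_patterns` and `check_resource_name` and the patterns they check (a US Social Security number, an email address, and a long digit run that might be a record number) are illustrative assumptions. Pattern matching cannot catch every form of PHI, such as patient names, so treat it as a supplement to review, not a replacement.

```python
import re

# Hypothetical, illustrative patterns; extend them for your own data.
SENSITIVE_PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "long_digit_run": re.compile(r"\d{9,}"),
}


def find_sensitive_patterns(name: str) -> list[str]:
    """Return the labels of all patterns that match the proposed name."""
    return [
        label
        for label, pattern in SENSITIVE_PATTERNS.items()
        if pattern.search(name)
    ]


def check_resource_name(name: str) -> str:
    """Return the name unchanged, or raise ValueError if it looks sensitive."""
    matches = find_sensitive_patterns(name)
    if matches:
        raise ValueError(f"Name rejected; matched patterns: {', '.join(matches)}")
    return name
```

For example, you might call `check_resource_name` on a workspace, cluster, or job name in your own provisioning scripts before passing the name to Databricks.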
Key responsibilities of Google include:
- Perform its obligations as a business associate under your BAA with Google.
- Provide you virtual machines under your contract with Google Cloud that support HIPAA compliance.
- Provide encryption at rest and in-transit encryption within GCE clusters that is adequate under HIPAA.
- Delete encryption keys and data when Databricks releases the VM instances.
Key responsibilities of Databricks include:
- Encrypt in-transit PHI data that is transmitted to or from the control plane.
- Encrypt PHI data at rest in the control plane.
- Deprovision VM instances when you indicate in Databricks that they should be terminated (for example, through auto-termination or manual termination) so that Google Cloud can wipe them.
Your key responsibilities include:
- Do not use preview features within Databricks to process PHI without written permission from Databricks. However, you can use the preview features listed in Supported preview features.
- Follow security best practices, such as disabling unnecessary egress from the compute plane and using the Databricks secrets feature (or other similar functionality) to store access keys that provide access to PHI.
- Enter into a business associate agreement with Google Cloud to cover all data processed within the VPC where the VM instances are deployed.
- Do not do anything within a virtual machine that would violate HIPAA. For example, do not direct Databricks to send unencrypted PHI to an endpoint.
- Ensure that all data that may contain PHI is encrypted at rest when you store it in locations that the Databricks platform may interact with. You are responsible for encrypting and backing up the buckets that Databricks creates in your account for each workspace, as well as all other data sources.
- Ensure that all data that may contain PHI is encrypted in transit between Databricks and any of your data storage locations or external locations that you access from a compute plane machine. For example, any API that you use in a notebook and that might connect to an external data source must use appropriate encryption on all outgoing connections.
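The in-transit requirement for outgoing connections can be partly enforced in notebook code. The sketch below is an illustration that uses only the Python standard library. The helper names `require_https` and `strict_tls_context` are hypothetical, and the choice of TLS 1.2 as the minimum version is an example policy, not a value stated by Databricks or by HIPAA.

```python
import ssl
from urllib.parse import urlparse


def require_https(url: str) -> str:
    """Return the URL unchanged if it uses HTTPS; otherwise raise ValueError."""
    scheme = urlparse(url).scheme.lower()
    if scheme != "https":
        raise ValueError(
            f"Refusing unencrypted connection (scheme: {scheme or 'none'})"
        )
    return url


def strict_tls_context() -> ssl.SSLContext:
    """Build a client context that verifies certificates and requires TLS 1.2+."""
    # create_default_context() enables certificate and hostname verification.
    context = ssl.create_default_context()
    # Example minimum version (an assumption; choose per your own policy).
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context
```

A notebook could then open a connection with `urllib.request.urlopen(require_https(url), context=strict_tls_context())`, so that a plain `http://` endpoint fails before any data leaves the compute plane.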