HIPAA
This page describes HIPAA compliance controls in Databricks.
HIPAA overview
HIPAA is a US healthcare law that establishes national standards for protecting the privacy and security of protected health information (PHI).
Key points:
- Applies to healthcare providers, insurers, and vendors that handle PHI.
- Includes rules for privacy, security, and breach notification.
- Requires administrative, technical, and physical safeguards for PHI.
- Applies to cloud service providers that store or process PHI.
Business Associate Agreement (BAA) requirement for processing PHI
HIPAA and related regulations require organizations that handle protected health information (PHI) to meet specific safeguards. When a covered entity or business associate uses a cloud service provider (CSP) like Databricks, the CSP is also considered a business associate.
As a result, Databricks permits the processing of PHI data only if you have an active Business Associate Agreement (BAA) with Databricks. You must have this agreement in place before processing any PHI data. Contact your Databricks account team for more information.
Enable HIPAA compliance controls
HIPAA compliance controls require enabling the compliance security profile, which adds monitoring agents, provides a hardened compute image, and more. Only specific preview features are supported for processing regulated data. For details on the compliance security profile and supported preview features, see Compliance security profile.
To enable HIPAA compliance controls, see Configure enhanced security and compliance settings.
It is your responsibility to confirm that each workspace has the compliance security profile enabled. You must also have an active BAA with Databricks before processing any PHI data.
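Because this check is a per-workspace responsibility, it can help to audit an inventory of workspaces rather than rely on memory. The following is a minimal sketch, not a Databricks API: the record shape (`name`, `compliance_security_profile`, `is_enabled`, `compliance_standards`) and the `workspaces_missing_profile` helper are assumptions made for illustration. Populate the inventory from whatever source of truth you use for workspace settings.

```python
from typing import Iterable


def workspaces_missing_profile(workspaces: Iterable[dict]) -> list[str]:
    """Return names of workspaces whose record does not show the
    compliance security profile enabled with the HIPAA standard.

    The record layout is hypothetical; adapt the keys to your own inventory.
    """
    missing = []
    for ws in workspaces:
        profile = ws.get("compliance_security_profile", {})
        enabled = profile.get("is_enabled", False)
        standards = profile.get("compliance_standards", [])
        # A workspace with no profile record is treated as non-compliant.
        if not (enabled and "HIPAA" in standards):
            missing.append(ws["name"])
    return missing


# Hypothetical inventory used only to demonstrate the check.
inventory = [
    {
        "name": "clinical-analytics",
        "compliance_security_profile": {
            "is_enabled": True,
            "compliance_standards": ["HIPAA"],
        },
    },
    {
        "name": "sandbox",
        "compliance_security_profile": {
            "is_enabled": False,
            "compliance_standards": [],
        },
    },
    {"name": "legacy"},  # no profile information recorded
]

print(workspaces_missing_profile(inventory))  # ['sandbox', 'legacy']
```

Any workspace the sketch reports should be reviewed before PHI is processed in it.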
Shared responsibility of HIPAA compliance
Responsibility for HIPAA compliance is shared among three parties: AWS, Databricks, and you. Each party has numerous responsibilities. This section lists the key responsibilities of each.
This section uses the terms control plane and compute plane, which are the two main parts of the Databricks architecture:
- The Databricks control plane includes the backend services that Databricks manages in its own AWS account.
- The compute plane is where your data lake is processed. The classic compute plane includes a VPC in your AWS account, and clusters of compute resources to process your notebooks, jobs, and pro or classic SQL warehouses.
For more information, see Databricks architecture overview.
Ensure that sensitive information is never entered in customer-defined input fields, such as workspace names, cluster names, and job names.
- You are wholly responsible for ensuring your own compliance with all applicable laws and regulations. Information provided in Databricks online documentation does not constitute legal advice, and you should consult your legal advisor for any questions regarding regulatory compliance.
- Databricks does not support the use of preview features for the processing of PHI on the HIPAA on AWS platform, with the exception of the features listed in Supported preview features.
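One way to reduce the risk of sensitive information ending up in customer-defined fields such as workspace, cluster, and job names is to screen proposed names before submitting them. The following is a minimal sketch under stated assumptions: the two patterns (a US Social Security number shape and an email address shape) and the `find_sensitive_patterns` helper are illustrative only. Pattern matching cannot detect every form of PHI and does not replace a naming policy or review.

```python
import re

# Illustrative patterns only; extend these to match your own policy.
SENSITIVE_PATTERNS = {
    "ssn_like": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email_like": re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+"),
}


def find_sensitive_patterns(name: str) -> list[str]:
    """Return the labels of all patterns found in a proposed resource name."""
    return [
        label
        for label, pattern in SENSITIVE_PATTERNS.items()
        if pattern.search(name)
    ]


# Fabricated example values, not real identifiers.
print(find_sensitive_patterns("etl-nightly"))                # []
print(find_sensitive_patterns("job-123-45-6789"))            # ['ssn_like']
print(find_sensitive_patterns("cluster-jane@example.com"))   # ['email_like']
```

A name that returns a non-empty list should be rejected or renamed before it is used.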
Key responsibilities of AWS include:
- Perform its obligations as a business associate under your BAA with AWS.
- Provide you the EC2 machines under your contract with AWS that support HIPAA compliance.
- Provide hardware-accelerated encryption at rest and in transit within AWS Nitro instances that is adequate under HIPAA.
- Delete encryption keys and data when Databricks releases the EC2 instances.
Key responsibilities of Databricks include:
- Encrypt PHI data in transit to or from the control plane.
- Encrypt PHI data at rest in the control plane.
- Use only AWS Nitro instance types, which enforce encryption in transit and at rest. Databricks enforces this in both the workspace and API.
- Deprovision EC2 instances when indicated in Databricks (for example, through auto-termination or manual termination) so AWS can wipe them.
Your key responsibilities include:
- Configure your workspace to use either customer-managed keys for managed services or the Store interactive notebook results in customer account feature.
- Do not use preview features in Databricks to process PHI, except those listed in Supported preview features.
- Follow security best practices, such as disabling unnecessary egress from the compute plane and using Databricks secrets to store access keys for PHI.
- Enter into a business associate agreement with AWS to cover all data processed within the VPC where EC2 instances are deployed.
- Do not perform actions within a virtual machine that would violate HIPAA. For example, do not direct Databricks to send unencrypted PHI to an endpoint.
- Ensure all data that may contain PHI is encrypted at rest in any storage location the Databricks platform interacts with. This includes setting encryption on workspace storage accounts during workspace creation. You are responsible for encryption and backups of this storage and all other data sources.
- Ensure all data that may contain PHI is encrypted in transit between Databricks and any connected data storage or external systems. For example, APIs used in notebooks must use encryption for all outbound connections.
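For the in-transit requirement above, notebook code that calls external APIs can refuse to send data to endpoints that are not addressed over an encrypted scheme. The following is a minimal sketch, assuming HTTPS is the only scheme you allow. The `ENCRYPTED_SCHEMES` allowlist and the helper names are hypothetical. A scheme check does not verify certificate validation or TLS configuration, so treat it as one guard among several.

```python
from urllib.parse import urlparse

# Assumed allowlist; add other encrypted schemes only if your policy permits them.
ENCRYPTED_SCHEMES = {"https"}


def uses_encrypted_transport(url: str) -> bool:
    """Return True if the URL's scheme is on the encrypted allowlist."""
    return urlparse(url).scheme.lower() in ENCRYPTED_SCHEMES


def require_encrypted_transport(url: str) -> str:
    """Return the URL unchanged, or raise ValueError if it is not encrypted."""
    if not uses_encrypted_transport(url):
        raise ValueError(f"Refusing unencrypted outbound connection: {url}")
    return url


# Placeholder endpoints for demonstration.
print(uses_encrypted_transport("https://api.example.com/v1/claims"))  # True
print(uses_encrypted_transport("http://api.example.com/v1/claims"))   # False
```

Calling `require_encrypted_transport` on each endpoint before any request is made keeps the check in one place instead of scattered across notebooks.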