Best practices for security, compliance, and privacy
The Databricks Security Best Practices guide, including a downloadable PDF, is available on the Databricks Security & Trust Center. The sections in this article list the best practices from this guide, organized according to the principles of this pillar.
1. Manage identity and access using least privilege
Account setup and identity configuration
During deployment, configure Databricks account administration, SSO, and user provisioning to establish a secure foundation:
- Assign account admin roles to 2-3 trusted individuals only
- Configure SSO with OIDC or SAML for centralized authentication
- Enable SCIM provisioning to automate user and group synchronization from your identity provider
- Set up identity federation to link corporate identities across workspaces
- Configure multi-factor authentication at the identity provider level
- Define emergency access procedures for account recovery
For step-by-step account setup procedures, see Phase 1: Design account and identity strategy.
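Limiting the number of account admins is easier to sustain when it is checked automatically. The sketch below, a minimal illustration rather than a complete audit tool, filters a SCIM-style user payload (`roles` as a list of `{"value": ...}` entries, as in the Databricks SCIM API) for the `account_admin` role and enforces a small admin budget; the sample users are fabricated for the example.

```python
# Sketch: flag when the number of account admins exceeds a least-privilege budget.
# The payload shape mimics SCIM user records; the sample users are illustrative.
MAX_ACCOUNT_ADMINS = 3

def account_admins(users):
    """Return the user names that hold the account_admin role."""
    return [
        u["userName"]
        for u in users
        if any(r.get("value") == "account_admin" for r in u.get("roles", []))
    ]

users = [
    {"userName": "alice@example.com", "roles": [{"value": "account_admin"}]},
    {"userName": "bob@example.com", "roles": [{"value": "account_admin"}]},
    {"userName": "carol@example.com", "roles": []},
]

admins = account_admins(users)
assert len(admins) <= MAX_ACCOUNT_ADMINS, "too many account admins"
```

In practice the user list would come from your identity provider or the account-level SCIM endpoint; the check itself stays the same.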
Identity and access management best practices
- Authenticate via single sign-on (SSO) at the account level
- Leverage multi-factor authentication
- Enable unified login and configure emergency access
- Use SCIM to synchronize users and groups
- Limit the number of admin users
- Enforce segregation of duties between administrative accounts
- Restrict workspace admins
- Manage access according to the principle of least privilege
- Use OAuth token authentication
- Enforce token management
- Restrict cluster creation rights
- Use compute policies
- Use service principals to run administrative tasks and production workloads
- Use compute that supports user isolation
- Store and use secrets securely
- Use a restricted cross-account IAM role
Details are in the PDF referenced at the beginning of this article.
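Compute policies from the list above are defined as JSON documents that fix or constrain cluster attributes. The sketch below builds one such policy definition; the attribute names and the `fixed`/`range`/`allowlist` types follow the Databricks cluster policy format, while the specific runtime version and instance profile ARN are illustrative placeholders.

```python
import json

# Sketch of a compute (cluster) policy that enforces auto-termination, pins a
# runtime version, and allowlists an instance profile. Values are illustrative.
policy = {
    "autotermination_minutes": {"type": "range", "minValue": 10, "maxValue": 120},
    "spark_version": {"type": "fixed", "value": "15.4.x-scala2.12"},
    "aws_attributes.instance_profile_arn": {
        "type": "allowlist",
        "values": ["arn:aws:iam::123456789012:instance-profile/databricks-restricted"],
    },
}

policy_json = json.dumps(policy, indent=2)
print(policy_json)
```

A policy like this doubles as a least-privilege control (restricting cluster creation rights) and a cost control (bounding auto-termination).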
2. Protect data in transit and at rest
- Centralize data governance with Unity Catalog
- Plan your data isolation model
- Avoid storing production data in DBFS
- Encrypt your S3 buckets and prevent public access
- Apply bucket policies
- Use S3 versioning
- Back up your S3 data
- Configure customer-managed keys for managed services
- Configure customer-managed keys for storage
- Use Delta Sharing
- Configure a Delta Sharing recipient token lifetime
- Additionally encrypt sensitive data at rest using Advanced Encryption Standard (AES)
- Leverage data exfiltration prevention settings within the workspace
- Use Clean Rooms to collaborate in a privacy-safe environment
Details are in the PDF referenced at the beginning of this article.
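Two of the S3 items above (encrypt buckets and prevent public access, apply bucket policies) are commonly combined into a policy that denies any request made without TLS, plus a public-access-block configuration. The sketch below builds both documents locally; the bucket name is a hypothetical placeholder, and applying them would be done with boto3 or the AWS console.

```python
import json

# Sketch: a bucket policy statement denying non-TLS requests, using the standard
# aws:SecureTransport condition key, plus a public-access-block configuration.
BUCKET = "my-databricks-root-bucket"  # hypothetical bucket name

deny_insecure_transport = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "DenyInsecureTransport",
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:*",
            "Resource": [
                f"arn:aws:s3:::{BUCKET}",
                f"arn:aws:s3:::{BUCKET}/*",
            ],
            "Condition": {"Bool": {"aws:SecureTransport": "false"}},
        }
    ],
}

# All four settings enabled blocks both public ACLs and public bucket policies.
public_access_block = {
    "BlockPublicAcls": True,
    "IgnorePublicAcls": True,
    "BlockPublicPolicy": True,
    "RestrictPublicBuckets": True,
}

print(json.dumps(deny_insecure_transport, indent=2))
```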
3. Secure your network and protect endpoints
Network deployment considerations for AWS
Deploy secure network infrastructure for Databricks workspaces on AWS. The following steps establish secure connectivity:
- Create a VPC with minimum /18 CIDR block for workspace deployments
- Provision private subnets in multiple Availability Zones for high availability
- Configure NAT gateway for outbound internet access from private subnets
- Set up security groups to control traffic to and from Databricks clusters
- Deploy AWS PrivateLink for private connectivity to Databricks control plane
- Enable Secure Cluster Connectivity (SCC) to eliminate inbound open ports
- Configure VPN or Direct Connect for on-premises connectivity (if required)
- Implement network segmentation to isolate production and non-production environments
For step-by-step AWS network configuration, see AWS network architecture.
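The VPC and subnet sizing in the steps above can be sketched with the standard `ipaddress` module: a /18 VPC CIDR is carved into per-AZ private subnets. The /23 subnet size and the AZ names here are illustrative choices, not requirements; size subnets to your expected cluster node counts.

```python
import ipaddress

# Sketch: carve a /18 VPC CIDR into /23 private subnets (512 addresses each)
# and assign one per Availability Zone for a two-AZ deployment. The CIDR,
# prefix length, and AZ names are illustrative.
vpc = ipaddress.ip_network("10.0.0.0/18")
subnets = list(vpc.subnets(new_prefix=23))  # 2**(23-18) = 32 subnets

az_subnets = {"us-east-1a": subnets[0], "us-east-1b": subnets[1]}
for az, net in az_subnets.items():
    print(az, net, net.num_addresses)
```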
Network security best practices
- Use a customer-managed VPC
- Configure IP access lists
- Use AWS PrivateLink
- Implement network exfiltration protections
- Isolate sensitive workloads into different networks
- Configure a firewall for serverless compute access
- Restrict access to valuable codebases to only trusted networks
Details are in the PDF referenced at the beginning of this article.
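IP access lists from the list above are configured as a label, a list type, and a set of CIDR ranges; malformed entries are an easy source of lockouts, so it helps to validate them before submitting. The field names below mirror the Databricks IP access lists API, while the CIDR ranges are placeholders for your corporate egress IPs.

```python
import ipaddress

# Sketch: build and sanity-check an IP access list payload before sending it to
# the API. The CIDR ranges are documentation-reserved placeholders.
access_list = {
    "label": "corp-vpn",
    "list_type": "ALLOW",
    "ip_addresses": ["203.0.113.0/24", "198.51.100.17/32"],
}

def validate(entry):
    """Raise ValueError if any CIDR in the entry is malformed."""
    for cidr in entry["ip_addresses"]:
        ipaddress.ip_network(cidr)  # raises ValueError on bad input
    return True

assert validate(access_list)
```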
4. Meet compliance and data privacy requirements
- Restart compute on a regular schedule
- Isolate sensitive workloads into different workspaces
- Assign Unity Catalog securables to specific workspaces
- Implement fine-grained access controls
- Apply tags
- Use lineage
- Use AWS Nitro instances
- Use Enhanced Security Monitoring or Compliance Security Profile
- Control and monitor workspace access for Databricks personnel
- Implement and test a Disaster Recovery strategy
Details are in the PDF referenced at the beginning of this article.
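Restarting compute on a regular schedule ensures long-running clusters pick up the latest images and security patches. A scheduled job can flag clusters whose uptime exceeds a threshold; the sketch below shows the check itself with fabricated cluster records and a fixed clock, where in practice start times would come from the Clusters API.

```python
from datetime import datetime, timedelta, timezone

# Sketch: flag clusters whose uptime exceeds a patch-freshness threshold so they
# can be restarted. The cluster records and the fixed "now" are illustrative.
MAX_UPTIME = timedelta(days=7)
now = datetime(2024, 6, 8, tzinfo=timezone.utc)

clusters = [
    {"name": "etl-prod", "started_at": datetime(2024, 5, 25, tzinfo=timezone.utc)},
    {"name": "adhoc-dev", "started_at": datetime(2024, 6, 7, tzinfo=timezone.utc)},
]

stale = [c["name"] for c in clusters if now - c["started_at"] > MAX_UPTIME]
print(stale)  # etl-prod has been up 14 days, well past the 7-day threshold
```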
5. Monitor system security
- Leverage system tables
- Monitor system activities via AWS CloudTrail and other logs
- Enable verbose audit logging
- Manage code versions with Git folders
- Restrict usage to trusted code repositories
- Provision infrastructure via infrastructure-as-code
- Manage code via CI/CD
- Control library installation
- Use models and data from only trusted or reputable sources
- Implement DevSecOps processes
- Use data quality monitoring
- Use inference tables and AI Guardrails
- Use tagging as part of your cost monitoring and charge-back strategy
- Use budgets to monitor account spending
- Use AWS service quotas
Details are in the PDF referenced at the beginning of this article.
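Once verbose audit logging is enabled, the logs are only useful if something scans them. The sketch below filters audit records for denied requests; the field names (`serviceName`, `actionName`, `response.statusCode`) follow the Databricks audit log schema, and the sample events are fabricated for illustration. The same shape of check works against system tables or CloudTrail with the appropriate field names.

```python
# Sketch: surface denied requests from audit-log records. Sample events are
# fabricated; in practice they would come from system tables or log delivery.
events = [
    {"serviceName": "accounts", "actionName": "login", "response": {"statusCode": 200}},
    {"serviceName": "secrets", "actionName": "getSecret", "response": {"statusCode": 403}},
]

denied = [
    (e["serviceName"], e["actionName"])
    for e in events
    if e["response"]["statusCode"] == 403
]
print(denied)
```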
Additional resources
- Review the Security and Trust Center to understand how security is built into every layer of the Databricks Data Intelligence Platform, and the shared responsibility model we operate under.
- Download and review the Databricks AI Security Framework (DASF) to understand how to mitigate AI security threats based on real-world attack scenarios.