Data exfiltration protection
Data exfiltration protection is a defense-in-depth approach that combines network controls with data governance controls. It applies across all three network security architectures. This page describes how to combine network-level controls and Unity Catalog controls to prevent unauthorized data transfer in Databricks deployments.
For end-to-end reference architectures that implement these controls, see Data exfiltration protection architecture.
What is data exfiltration protection?
Data exfiltration is the unauthorized transfer of sensitive data out of your Databricks environment. Data exfiltration protection guards against the exploitation of open network paths, misconfigured storage, overly permissive egress rules, and compromised credentials. It also helps prevent users with legitimate access from downloading query results or writing to unapproved external destinations.
Network controls close off the unauthorized network paths; Unity Catalog controls govern what authorized users and compute can do with the data they're permitted to reach. You need both.
Network controls:
- Network isolation: Deploy workloads in private networks with no public internet access.
- Private connectivity: Use PrivateLink to access cloud services without internet exposure.
- Egress control: Control outbound access using firewall or proxy-based controls.
- Storage access policies: Restrict which storage accounts and services workloads can reach.
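To make the egress control item above concrete, the following minimal sketch models the default-deny decision that a firewall or proxy applies to outbound destinations. The allowlist entries and the `is_egress_allowed` helper are hypothetical and are not part of any Databricks or AWS API; in a real deployment this decision is enforced by the firewall or network policy, not by application code.

```python
# Hypothetical allowlist; real rules live in your firewall or proxy configuration.
ALLOWED_DOMAINS = {
    "example-approved-bucket.s3.amazonaws.com",  # exact host match
    ".internal.example.com",                     # leading dot: any subdomain
}


def is_egress_allowed(host: str, allowed=ALLOWED_DOMAINS) -> bool:
    """Return True only if the destination host matches an allowlist entry.

    Default-deny: anything not explicitly listed is blocked.
    """
    host = host.lower().rstrip(".")  # normalize case and a trailing root dot
    for entry in allowed:
        if entry.startswith("."):
            # Suffix rule: matches subdomains, but not the bare parent domain.
            if host.endswith(entry):
                return True
        elif host == entry:
            return True
    return False


for destination in ("api.internal.example.com", "files.example.org"):
    verdict = "allow" if is_egress_allowed(destination) else "deny"
    print(f"{destination}: {verdict}")
```

The suffix rule compares against the entry including its leading dot, so a lookalike host such as `evilinternal.example.com` does not match `.internal.example.com`.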
Unity Catalog controls:
- Standard access control: `GRANT` and `REVOKE` permissions on catalogs, schemas, tables, and volumes.
- Attribute-based access control (ABAC): Govern data access based on attributes (tags) attached to data objects, not just object identity.
- Row filters and column masks: Apply row-level and column-level security to restrict what users see within a table.
- Workspace catalog bindings: Isolate which workspaces can access which data.
- Audit logging: Capture all data access for monitoring and compliance.
How it relates to each network architecture
The depth of network controls scales with the architecture you choose. Unity Catalog controls apply identically across all three architectures: they govern what authorized users and compute can do with data, and they do not change based on your network posture.
| Architecture | Network controls |
|---|---|
| | Customer-managed VPC, SCC, backend classic compute plane PrivateLink |
| | Adds context-based ingress, VPC endpoints, serverless egress controls, and optional firewall |
| | Adds inbound PrivateLink and required firewall for full private connectivity |
Network controls alone don't prevent authorized users from misusing access. Combine them with Unity Catalog controls for complete data exfiltration protection.
When to implement
Implement data exfiltration protection when:
- You handle highly sensitive or regulated data (financial, healthcare, government).
- Compliance frameworks mandate egress controls (for example, SOC 2, HIPAA, PCI DSS, or FedRAMP).
- Your organization requires complete visibility into data movement.
- Industry regulations prohibit data transfer to specific regions or services.
Data exfiltration protection requires multiple security layers working together: both network controls and data governance controls. No single layer is sufficient on its own.
Security layers
Data exfiltration protection combines multiple security mechanisms. The following table summarizes each layer and its AWS implementation:
Security layer | Purpose | Implementation | Priority |
|---|---|---|---|
Network isolation | Eliminate public access | Customer-managed VPC, SCC | High |
Private connectivity | Secure cloud service access | PrivateLink, VPC endpoints | High |
Egress inspection | Monitor outbound traffic | Third-party firewall appliance (such as Palo Alto) integrated with Gateway Load Balancer | High |
Serverless controls | Govern serverless egress | Network policies | High |
Data governance | Access control and audit | Unity Catalog | High |
For the full reference architectures that implement these layers on AWS and Azure, see Data exfiltration protection architecture.
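Because no single layer is sufficient on its own, it can help to check a planned deployment against all of the layers in the table above. The following sketch assumes a hypothetical dictionary describing which layers a deployment has in place; the key names are derived from the table for illustration and do not correspond to any Databricks configuration format.

```python
# Layer names taken from the security layers table; the keys are illustrative.
REQUIRED_LAYERS = (
    "network_isolation",
    "private_connectivity",
    "egress_inspection",
    "serverless_controls",
    "data_governance",
)


def missing_layers(deployment: dict) -> list:
    """Return the required layers that are absent or disabled, in table order."""
    return [layer for layer in REQUIRED_LAYERS if not deployment.get(layer)]


# Hypothetical deployment that has network controls and governance,
# but no egress inspection or serverless egress policy yet.
planned = {
    "network_isolation": True,
    "private_connectivity": True,
    "data_governance": True,
}

gaps = missing_layers(planned)
if gaps:
    print("Missing layers:", ", ".join(gaps))
else:
    print("All layers are in place.")
```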
Cost considerations
Data exfiltration protection has higher networking costs than standard deployments due to the additional infrastructure required for private connectivity and traffic inspection.
Cost factor | Description |
|---|---|
PrivateLink | Data transfer charges per GB through interface VPC endpoints for the Databricks control plane and SCC relay. |
VPC interface endpoints | Hourly endpoint charges for STS, Kinesis, and any other non-gateway services. |
S3 gateway endpoints | No charge for the endpoint itself. |
External firewall | AWS Network Firewall (per-endpoint and per-GB processing charges) or a third-party appliance (licensing plus EC2 and Gateway Load Balancer compute). |
Data transfer | Additional charges for traffic routed through the firewall, NAT gateway, or cross-AZ paths. |
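Several of the cost factors above share the same shape: a fixed hourly charge per endpoint plus a per-GB charge on the traffic that passes through it. The sketch below shows that arithmetic under stated assumptions. The `monthly_endpoint_cost` helper is hypothetical, and the rates in the example are placeholders, not actual prices; check your cloud provider's current pricing for your region, including whether endpoints are billed per Availability Zone.

```python
HOURS_PER_MONTH = 730  # common approximation of the hours in an average month


def monthly_endpoint_cost(num_endpoints, hourly_rate, gb_processed, per_gb_rate,
                          hours=HOURS_PER_MONTH):
    """Estimate a monthly charge: fixed hourly component plus per-GB component."""
    if min(num_endpoints, hourly_rate, gb_processed, per_gb_rate, hours) < 0:
        raise ValueError("all inputs must be non-negative")
    fixed = num_endpoints * hourly_rate * hours  # billed whether or not traffic flows
    variable = gb_processed * per_gb_rate        # scales with data volume
    return fixed + variable


# Placeholder rates for illustration only.
estimate = monthly_endpoint_cost(num_endpoints=3, hourly_rate=0.01,
                                 gb_processed=1000, per_gb_rate=0.01)
print(f"Estimated monthly cost: {estimate:.2f}")
```

Separating the fixed and variable components makes it easier to see which costs grow with the number of endpoints and which grow with data volume.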