Data exfiltration protection

Data exfiltration protection is a defense-in-depth approach that combines network controls with data governance controls. It applies across all three network security architectures. This page describes how to combine network-level controls and Unity Catalog controls to prevent unauthorized data transfer in Databricks deployments.

For end-to-end reference architectures that implement these controls, see Data exfiltration protection architecture.

What is data exfiltration protection?

Data exfiltration is the unauthorized transfer of sensitive data out of your Databricks environment. Data exfiltration protection helps prevent attackers from exploiting open network paths, misconfigured storage, overly permissive egress rules, or compromised credentials. It also helps prevent users with legitimate access from downloading query results or writing data to unapproved external destinations.

Network controls close off the unauthorized network paths; Unity Catalog controls govern what authorized users and compute can do with the data they're permitted to reach. You need both.

Network controls:

  • Network isolation: Deploy workloads in private networks with no public internet access.
  • Private connectivity: Use PrivateLink to access cloud services without internet exposure.
  • Egress control: Control outbound access using firewall or proxy-based controls.
  • Storage access policies: Restrict which storage accounts and services workloads can reach.
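As a rough illustration of the egress control idea, the following sketch models a default-deny allowlist: a destination is reachable only if it matches an approved pattern. The hostnames, the `ALLOWED_DESTINATIONS` list, and the `is_egress_allowed` helper are hypothetical and exist only for this example; in a real deployment the rule set lives in your firewall, proxy, or network policy, not in application code.

```python
from fnmatch import fnmatchcase

# Hypothetical allowlist for illustration only.
ALLOWED_DESTINATIONS = [
    "*.s3.us-east-1.amazonaws.com",
    "sts.us-east-1.amazonaws.com",
]


def is_egress_allowed(hostname, allowlist=ALLOWED_DESTINATIONS):
    """Return True only if the hostname matches an allowlisted pattern.

    Anything that does not match is denied (default deny).
    """
    # Normalize case and drop a trailing dot from fully qualified names.
    host = hostname.lower().rstrip(".")
    return any(fnmatchcase(host, pattern) for pattern in allowlist)


if __name__ == "__main__":
    for candidate in ("my-bucket.s3.us-east-1.amazonaws.com", "example.com"):
        verdict = "allow" if is_egress_allowed(candidate) else "deny"
        print(f"{candidate}: {verdict}")
```

Because each pattern must match the whole hostname, a lookalike such as `s3.us-east-1.amazonaws.com.example.com` is denied.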

Unity Catalog controls:

  • Standard access control: GRANT and REVOKE permissions on catalogs, schemas, tables, and volumes.
  • Attribute-based access control (ABAC): Govern data access based on attributes (tags) attached to data objects, not just object identity.
  • Row filters and column masks: Apply row-level and column-level security to restrict what users see within a table.
  • Workspace catalog bindings: Isolate which workspaces can access which data.
  • Audit logging: Capture all data access for monitoring and compliance.
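To make the first three Unity Catalog controls more concrete, here is a minimal sketch that builds the corresponding SQL statements as strings. The catalog, schema, table, group, and function names are all hypothetical, and the sketch assumes the row filter and column mask functions already exist as SQL UDFs. It only generates statement text; you would run the statements yourself in a notebook or SQL editor.

```python
def quote(identifier):
    """Backtick-quote one identifier part, escaping embedded backticks."""
    return "`" + identifier.replace("`", "``") + "`"


def qualified(*parts):
    """Build a quoted, dot-separated name such as `catalog`.`schema`.`table`."""
    return ".".join(quote(part) for part in parts)


def grant_select(table, principal):
    return f"GRANT SELECT ON TABLE {table} TO {quote(principal)}"


def revoke_select(table, principal):
    return f"REVOKE SELECT ON TABLE {table} FROM {quote(principal)}"


def set_row_filter(table, function, column):
    return f"ALTER TABLE {table} SET ROW FILTER {function} ON ({quote(column)})"


def set_column_mask(table, column, function):
    return f"ALTER TABLE {table} ALTER COLUMN {quote(column)} SET MASK {function}"


if __name__ == "__main__":
    # All names below are hypothetical placeholders.
    table = qualified("main", "finance", "payments")
    statements = [
        grant_select(table, "analysts"),
        revoke_select(table, "contractors"),
        set_row_filter(table, qualified("main", "security", "region_filter"), "region"),
        set_column_mask(table, "card_number", qualified("main", "security", "mask_card")),
    ]
    for statement in statements:
        print(statement + ";")
```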

How it relates to each network architecture

The depth of network controls scales with the architecture you choose. Unity Catalog controls apply identically across all three architectures: they govern what authorized users and compute can do with data, and they do not change based on your network posture.

| Architecture | Network controls |
| --- | --- |
| Managed security | Customer-managed VPC, SCC, backend classic compute plane PrivateLink |
| Hardened connectivity | Adds context-based ingress, VPC endpoints, serverless egress controls, and optional firewall |
| Isolated environment | Adds inbound PrivateLink and required firewall for full private connectivity |
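The architecture comparison above can be read as cumulative, with each architecture adding controls to the previous one. The sketch below encodes that reading, which is an interpretation of the word "Adds" rather than something the page states outright. The `TIERS` structure, the `effective_controls` helper, and the "included", "optional", and "required" labels are hypothetical names chosen for this example.

```python
# Each tier lists only what it adds or changes relative to the tier before it.
TIERS = [
    ("Managed security", {
        "Customer-managed VPC": "included",
        "SCC": "included",
        "Backend classic compute plane PrivateLink": "included",
    }),
    ("Hardened connectivity", {
        "Context-based ingress": "included",
        "VPC endpoints": "included",
        "Serverless egress controls": "included",
        "Firewall": "optional",
    }),
    ("Isolated environment", {
        "Inbound PrivateLink": "included",
        "Firewall": "required",  # overrides the optional firewall above
    }),
]


def effective_controls(architecture):
    """Accumulate controls from the first tier up to the named architecture."""
    controls = {}
    for name, added in TIERS:
        controls.update(added)
        if name == architecture:
            return controls
    raise ValueError(f"Unknown architecture: {architecture}")


if __name__ == "__main__":
    for name, _ in TIERS:
        print(name)
        for control, status in effective_controls(name).items():
            print(f"  {control}: {status}")
```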

Network controls alone don't prevent authorized users from misusing access. Combine them with Unity Catalog controls for complete data exfiltration protection.

When to implement

Implement data exfiltration protection when:

  • Handling highly sensitive or regulated data (financial, healthcare, government).
  • Compliance frameworks mandate egress controls (for example, SOC 2, HIPAA, PCI DSS, and FedRAMP).
  • Your organization requires complete visibility into data movement.
  • Industry regulations prohibit data transfer to specific regions or services.

Important: Data exfiltration protection requires multiple security layers working together: both network controls and data governance controls. No single layer is sufficient on its own.

Security layers

Data exfiltration protection combines multiple security mechanisms. The following table summarizes each layer and its AWS implementation:

| Security layer | Purpose | Implementation | Priority |
| --- | --- | --- | --- |
| Network isolation | Eliminate public access | Customer-managed VPC, SCC | High |
| Private connectivity | Secure cloud service access | PrivateLink, VPC endpoints | High |
| Egress inspection | Monitor outbound traffic | Third-party firewall appliance (such as Palo Alto) integrated with Gateway Load Balancer | High |
| Serverless controls | Govern serverless egress | Network policies | High |
| Data governance | Access control and audit | Unity Catalog | High |

For the full reference architectures that implement these layers on AWS and Azure, see Data exfiltration protection architecture.
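Because every layer is rated high priority and no single layer is sufficient, one simple way to review a deployment is to list which layers are still missing. The following sketch assumes you track enabled layers as a set of names; the `REQUIRED_LAYERS` tuple and the `missing_layers` helper are hypothetical and not part of any Databricks tooling.

```python
# Layer names taken from the security layers summary above.
REQUIRED_LAYERS = (
    "Network isolation",
    "Private connectivity",
    "Egress inspection",
    "Serverless controls",
    "Data governance",
)


def missing_layers(enabled):
    """Return the required layers that are not in the enabled collection."""
    enabled = set(enabled)
    return [layer for layer in REQUIRED_LAYERS if layer not in enabled]


if __name__ == "__main__":
    # Hypothetical deployment with only the network basics in place.
    gaps = missing_layers({"Network isolation", "Private connectivity"})
    print("Missing layers:", ", ".join(gaps) if gaps else "none")
```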

Cost considerations

Data exfiltration protection has higher networking costs than standard deployments due to the additional infrastructure required for private connectivity and traffic inspection.

| Cost factor | Description |
| --- | --- |
| PrivateLink | Data transfer charges per GB through interface VPC endpoints for the Databricks control plane and SCC relay. |
| VPC interface endpoints | Hourly endpoint charges for STS, Kinesis, and any other non-gateway services. |
| S3 gateway endpoints | No charge for the endpoint itself. |
| External firewall | AWS Network Firewall (per-endpoint and per-GB processing) or third-party appliance licensing and EC2 and Gateway Load Balancer compute. |
| Data transfer | Additional charges for traffic routed through the firewall, NAT gateway, or cross-AZ paths. |
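As a back-of-the-envelope aid, the cost factors above can be combined into a simple monthly estimate. The sketch below is a rough model under stated assumptions: it follows the AWS Network Firewall pricing shape (per-endpoint and per-GB) for the firewall line, treats a month as 730 hours, and takes every rate as an input. The `Rates` class, the `estimate_monthly_cost` function, and all numbers in the example are placeholders, not AWS prices; substitute current rates for your region.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    """Unit prices supplied by the caller; no real prices are built in."""
    interface_endpoint_hourly: float  # per interface endpoint, per hour
    privatelink_per_gb: float         # per GB through interface endpoints
    firewall_endpoint_hourly: float   # per firewall endpoint, per hour
    firewall_per_gb: float            # per GB processed by the firewall
    transfer_per_gb: float            # per GB of NAT or cross-AZ transfer


def estimate_monthly_cost(rates, *, interface_endpoints, privatelink_gb,
                          firewall_endpoints, firewall_gb, transfer_gb,
                          hours=730):
    """Return a line-item estimate keyed by cost factor, plus a total."""
    items = {
        "VPC interface endpoints":
            interface_endpoints * hours * rates.interface_endpoint_hourly,
        "PrivateLink": privatelink_gb * rates.privatelink_per_gb,
        "S3 gateway endpoints": 0.0,  # no charge for the endpoint itself
        "External firewall":
            firewall_endpoints * hours * rates.firewall_endpoint_hourly
            + firewall_gb * rates.firewall_per_gb,
        "Data transfer": transfer_gb * rates.transfer_per_gb,
    }
    items["Total"] = sum(items.values())
    return items


# Placeholder rates and volumes for illustration only.
example = estimate_monthly_cost(
    Rates(0.01, 0.01, 0.5, 0.05, 0.02),
    interface_endpoints=3, privatelink_gb=1000,
    firewall_endpoints=2, firewall_gb=500, transfer_gb=500,
)

if __name__ == "__main__":
    for factor, cost in example.items():
        print(f"{factor}: {cost:.2f}")
```

Keeping the rates as explicit inputs makes it easy to compare scenarios, such as a deployment with and without a firewall, without hard-coding prices that change over time.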