PrivateLink concepts

This page provides a general overview of PrivateLink at Databricks. PrivateLink creates a private, secure connection between your Databricks resources and your AWS services and serverless resources, ensuring your network traffic isn't exposed to the public internet.

Databricks supports three types of PrivateLink connectivity:

  • Inbound (front-end): Secures connections from users to workspaces
  • Outbound (serverless): Secures connections from Databricks serverless compute to your customer resources
  • Classic (back-end): Secures connections from classic compute to the control plane

Private connectivity overview

PrivateLink enables secure, private connectivity from your AWS VPCs and on-premises networks to AWS services, ensuring that your traffic remains isolated from the public internet. This capability is designed to help organizations address security and compliance requirements by enabling end-to-end private networking and minimizing the risk of data exfiltration.

You can enable inbound (front-end), outbound (serverless), or classic compute (back-end) PrivateLink connections independently or in any combination, depending on your security and compliance requirements. You can also enforce private connectivity for the workspace, which causes Databricks to reject all public network connections automatically. Enabling all three types together with enforced private connectivity delivers comprehensive network isolation, reducing your attack surface and supporting compliance for sensitive workloads.

With PrivateLink, you can:

  • Block data access from unauthorized networks or the public internet when using the Databricks web application or APIs.
  • Lower the risk of data exfiltration by restricting network exposure to approved private endpoints only.

Use this guide to determine which implementation best fits your needs.

| Consideration | Inbound (front-end) only | Outbound (serverless) only | Classic compute plane (back-end) only | Complete private isolation |
| --- | --- | --- | --- | --- |
| Primary security goal | Only authorized individuals can access my Databricks resources. | Secure data access from serverless | Lock down classic compute plane | Maximum isolation (secure everything) |
| User connectivity | Private or public | Public (internet) | Public (internet) | Private only |
| Serverless data access | Public (internet) | Private (to customer resources) | Public (internet) | Private (to customer resources) |
| Cluster connectivity to control plane | Public (standard secure path) | Public (standard secure path) | Private (required) | Private (required) |
| Prerequisites | Enterprise plan, customer-managed VPC, SCC | Enterprise plan | Enterprise plan, customer-managed VPC, SCC | Enterprise plan, customer-managed VPC, SCC |
| Relative cost | Cost per endpoint and data transfer | Cost per endpoint and data processed | Cost per endpoint and data transfer | Can be a higher cost (all endpoints, including data transfer and processing) |
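
As a rough illustration of how the comparison can drive a decision, the following Python sketch maps three yes/no requirements to the PrivateLink types to enable and to the combined prerequisites. The `Requirements` class, the `recommend` function, and the short type names are hypothetical helpers invented for this example; they are not part of any Databricks API. The prerequisite values are copied from the comparison.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Requirements:
    """Hypothetical yes/no answers that mirror rows of the comparison."""
    private_user_access: bool               # users must reach the workspace privately
    private_serverless_access: bool         # serverless must reach customer resources privately
    private_cluster_to_control_plane: bool  # classic clusters must reach the control plane privately


# Prerequisites per connectivity type, copied from the comparison.
PREREQUISITES = {
    "inbound": {"Enterprise plan", "customer-managed VPC", "SCC"},
    "outbound": {"Enterprise plan"},
    "classic": {"Enterprise plan", "customer-managed VPC", "SCC"},
}


def recommend(req):
    """Return (connectivity types to enable, combined prerequisites)."""
    types = []
    if req.private_user_access:
        types.append("inbound")
    if req.private_serverless_access:
        types.append("outbound")
    if req.private_cluster_to_control_plane:
        types.append("classic")

    prereqs = set()
    for connectivity_type in types:
        prereqs |= PREREQUISITES[connectivity_type]
    return types, prereqs
```

Answering yes to all three requirements corresponds to the complete private isolation option; answering yes only to serverless access returns the outbound type with the Enterprise plan as its sole prerequisite.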

Inbound connectivity (front-end)

Inbound PrivateLink secures the connection from users to the Databricks workspace. Traffic routes through a VPC interface endpoint in your transit VPC instead of public IPs. Inbound PrivateLink provides secure access to:

  • Databricks web application
  • REST API
  • Databricks Connect API

See Configure front-end PrivateLink.

[Figure: Inbound private connectivity]

Inbound private connectivity to performance-intensive services

Beta

This feature is in Beta. Workspace admins can control access to this feature from the Previews page. See Manage Databricks previews.

Service-Direct PrivateLink provides private connectivity to performance-intensive services:

  • Zerobus Ingest
  • Lakebase Autoscaling

See Configure Service-Direct PrivateLink.

Outbound connectivity (serverless)

Outbound PrivateLink enables private connectivity from Databricks serverless compute resources to your AWS resources. Unlike inbound and classic compute plane PrivateLink, which secure connections to Databricks, outbound PrivateLink secures connections from serverless compute to your customer resources.

Serverless PrivateLink uses Network Connectivity Configurations (NCCs), which are account-level regional constructs that manage private endpoint creation at scale. NCCs can be attached to multiple workspaces in the same region.

Private connectivity to S3 buckets

Preview

Private connectivity to S3 buckets is in Public Preview. To join this preview, contact your Databricks account team.

This feature enables serverless compute to access in-region S3 buckets through AWS PrivateLink without traversing the public internet. Your data traffic remains entirely within the AWS network.

See Configure private connectivity to AWS S3 storage buckets.

[Figure: Private connectivity to AWS S3 storage buckets]

Private connectivity to VPC resources

This feature enables serverless compute to access resources in your VPC, such as databases and internal services, through private endpoints.

See Configure private connectivity to resources in your VPC.

[Figure: Private connectivity to resources in your VPC]

Key concepts for outbound connectivity

  • Network Connectivity Configuration (NCC): An account-level regional construct that manages private endpoints and controls how serverless compute accesses customer resources.
  • Private endpoint rules: Define the specific resources that serverless compute can access privately.
  • Workspace attachment model: NCCs can be attached to up to 50 workspaces in the same region.
  • Limits and quotas:
    • Up to 10 NCCs per region per account
    • 30 private endpoints per region for S3 (distributed across NCCs)
    • 100 private endpoints per region for VPC resources (distributed across NCCs)
    • Up to 50 workspaces per NCC
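
The limits above can be checked against a planned layout before you create anything. The following Python sketch is one way to do that, assuming you describe each planned NCC with a small record; `NccPlan` and `check_ncc_plan` are hypothetical names for this example, and only the four numeric limits come from the list above.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import List

# Limits copied from the list above.
MAX_NCCS_PER_REGION = 10
MAX_WORKSPACES_PER_NCC = 50
MAX_S3_ENDPOINTS_PER_REGION = 30
MAX_VPC_ENDPOINTS_PER_REGION = 100


@dataclass
class NccPlan:
    """Hypothetical description of one planned NCC."""
    name: str
    region: str
    workspaces: int     # workspaces attached to this NCC
    s3_endpoints: int   # S3 private endpoints managed by this NCC
    vpc_endpoints: int  # VPC-resource private endpoints managed by this NCC


def check_ncc_plan(plans: List[NccPlan]) -> List[str]:
    """Return a human-readable message for each limit the plan exceeds."""
    violations = []
    by_region = defaultdict(list)
    for plan in plans:
        by_region[plan.region].append(plan)
        if plan.workspaces > MAX_WORKSPACES_PER_NCC:
            violations.append(
                f"{plan.name}: {plan.workspaces} workspaces exceeds {MAX_WORKSPACES_PER_NCC}"
            )

    # Endpoint quotas are regional, so sum them across all NCCs in a region.
    for region, group in by_region.items():
        if len(group) > MAX_NCCS_PER_REGION:
            violations.append(f"{region}: {len(group)} NCCs exceeds {MAX_NCCS_PER_REGION}")
        s3_total = sum(p.s3_endpoints for p in group)
        if s3_total > MAX_S3_ENDPOINTS_PER_REGION:
            violations.append(
                f"{region}: {s3_total} S3 endpoints exceeds {MAX_S3_ENDPOINTS_PER_REGION}"
            )
        vpc_total = sum(p.vpc_endpoints for p in group)
        if vpc_total > MAX_VPC_ENDPOINTS_PER_REGION:
            violations.append(
                f"{region}: {vpc_total} VPC endpoints exceeds {MAX_VPC_ENDPOINTS_PER_REGION}"
            )
    return violations
```

An empty result means the plan fits within the documented limits. The sketch treats the endpoint quotas as per-region totals shared by all NCCs in the account, which is how the "distributed across NCCs" wording reads.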

Classic compute plane private connectivity

Classic compute plane PrivateLink secures the connection from Databricks clusters to the control plane. Clusters connect to the control plane for REST APIs and the secure cluster connectivity relay.

Classic compute plane PrivateLink addresses:

  • Compliance requirements: Helps meet strict regulatory and corporate compliance mandates that require all internal cloud traffic to remain on a private network.
  • Network perimeter hardening: Implementing classic compute plane PrivateLink alongside VPC endpoints for AWS services (such as S3, STS, and Kinesis) allows you to remove internet gateways from your VPC. This reduces data exfiltration risks by ensuring that data-processing clusters have no path to unauthorized services or destinations on the public internet.

See Configure back-end PrivateLink.

[Figure: Classic private connectivity]

Note

You can set up classic compute plane private connectivity independently; it doesn't require inbound or serverless connectivity.

Virtual private clouds for private connectivity

Private connectivity uses two distinct virtual private clouds (VPCs).

  • Transit VPC: This VPC functions as a central hub for user connectivity and contains the inbound VPC endpoint required for client access to workspaces. You can have more than one transit VPC.
  • Compute plane VPC: This is a VPC you create to host your Databricks workspace and classic VPC endpoints.

In some deployments, a single VPC can serve both functions by reusing the same REST API/WebApp endpoint for inbound, outbound, and classic compute plane PrivateLink.

Subnet allocation and sizing

Plan your subnets in each VPC to support private connectivity and deployments.

  • Transit VPC subnets:

    • Inbound VPC endpoint subnet: Allocates IP addresses for inbound VPC endpoints.
  • Compute plane VPC subnets:

    • Workspace subnets: Databricks workspace deployment requires at least two subnets in separate Availability Zones. For sizing information related to workspace subnets, see Subnets.
    • VPC endpoint subnet: An additional subnet is recommended to host the VPC endpoints for classic compute plane private connectivity.
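
The compute plane layout described above (two workspace subnets plus a separate VPC endpoint subnet) can be sketched with Python's standard `ipaddress` module. The function name, the example CIDR, and the default prefix lengths (/24 for workspace subnets, /27 for the endpoint subnet) are assumptions chosen for illustration only; they are not Databricks sizing guidance, so refer to Subnets for actual sizing.

```python
import ipaddress
from itertools import islice


def plan_compute_plane_subnets(vpc_cidr, workspace_prefix=24, endpoint_prefix=27):
    """Carve two workspace subnets and one VPC endpoint subnet out of a VPC CIDR.

    The prefix lengths are illustrative assumptions, not recommended sizes.
    """
    vpc = ipaddress.ip_network(vpc_cidr)

    # Take the first three blocks of workspace size: two become workspace
    # subnets, and the third is reserved for the VPC endpoint subnet.
    blocks = list(islice(vpc.subnets(new_prefix=workspace_prefix), 3))
    if len(blocks) < 3:
        raise ValueError("VPC CIDR is too small for this layout")

    # Use the first slice of the third block as the endpoint subnet.
    endpoint_subnet = next(blocks[2].subnets(new_prefix=endpoint_prefix))
    return {
        "workspace_subnets": blocks[:2],
        "vpc_endpoint_subnet": endpoint_subnet,
    }
```

For example, `plan_compute_plane_subnets("10.0.0.0/16")` yields workspace subnets `10.0.0.0/24` and `10.0.1.0/24` and an endpoint subnet `10.0.2.0/27`. The sketch only computes address ranges; placing the two workspace subnets in separate Availability Zones is still up to you when you create them.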

Databricks VPC endpoints

Databricks uses two distinct types of VPC endpoints to privatize traffic. Understand their different roles to implement them correctly.

  • Workspace endpoint: This is the primary VPC endpoint for securing traffic to and from your workspace. It handles REST API calls for both inbound and classic compute plane PrivateLink.
  • SCC relay endpoint: This VPC endpoint is specifically for secure cluster connectivity (SCC) between the compute plane and control plane. Classic compute plane PrivateLink requires this endpoint. See What is secure cluster connectivity?.

Tip

Within the same VPC, Databricks recommends at most one endpoint per endpoint type. Share one workspace endpoint across all workspace access and one SCC relay endpoint for all SCC relay access from that VPC.

For VPC endpoints, note the following:

  • Shared endpoints: VPC endpoints can be shared across multiple workspaces that use the same customer-managed VPC because they are VPC-level resources. A single set of VPC endpoints can serve all workspaces deployed in that VPC and region.
  • Region-specific: VPC endpoints are region-specific resources. Workspaces in different regions require separate VPC endpoint configurations.
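
To make the sharing rules concrete, the following Python sketch counts how many endpoint sets a group of workspaces needs: one workspace endpoint and one SCC relay endpoint for each distinct region and VPC pair. The `Workspace` record, the `endpoints_needed` function, and the endpoint type labels are hypothetical names for this example, and the sketch assumes every workspace uses classic compute plane PrivateLink (which requires the SCC relay endpoint).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workspace:
    """Hypothetical record of where a workspace is deployed."""
    name: str
    region: str
    vpc_id: str


# One endpoint of each type per VPC, following the recommendation above.
ENDPOINT_TYPES = ("workspace", "scc_relay")


def endpoints_needed(workspaces):
    """Map each (region, vpc_id) pair to the endpoint types it needs once."""
    needed = {}
    for ws in workspaces:
        # Workspaces that share a VPC share its endpoints, so a pair that
        # is already present is not counted again.
        needed.setdefault((ws.region, ws.vpc_id), ENDPOINT_TYPES)
    return needed
```

Two workspaces in the same VPC produce a single entry, while a workspace in another region always produces its own entry because VPC endpoints are region-specific.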

Key considerations

Before you configure private connectivity, keep the following in mind:

  • Network ACL requirements: Databricks requires that your subnet-level network ACLs allow 0.0.0.0/0. To control egress traffic, use an egress firewall or proxy appliance that blocks most traffic but allows the URLs that Databricks needs to connect to. See Configure a firewall and outbound access.
  • Port 3306 metastore connectivity (legacy HMS only): This port is only applicable for legacy workspaces using the deprecated Hive Metastore. New workspaces have the legacy Hive Metastore disabled by default. For legacy workspaces, port 3306 doesn't use PrivateLink for connectivity to the control plane. For metastore connectivity between compute and control planes, traffic traverses a publicly routable network space using an encrypted connection. The publicly resolvable FQDNs for the Databricks-hosted RDS instances that house the HMS are available at RDS addresses for legacy Hive metastore.
  • Security group best practices: Follow the principle of least privilege by creating dedicated security groups for your VPC endpoints:
    • Compute cluster security group: Must allow outbound TCP traffic to the VPC Endpoint security group on the required ports (443, 3306, 6666, etc.). See Security groups.
    • VPC Endpoint security group: Must allow inbound TCP traffic from the compute cluster security group on those same ports.
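
The two security groups mirror each other: every outbound rule on the compute cluster group has a matching inbound rule on the VPC endpoint group. The following Python sketch generates that mirrored pair as plain dictionaries. The function name, the dictionary keys, and the group IDs are hypothetical, the rules are not in the format of any AWS API, and the port list contains only the ports named above, which the text marks as non-exhaustive ("etc.").

```python
# Ports named in the text; the "etc." there means this list is not exhaustive.
REQUIRED_PORTS = (443, 3306, 6666)


def build_rule_pair(compute_sg, endpoint_sg, ports=REQUIRED_PORTS):
    """Return (egress rules for the compute group, ingress rules for the endpoint group)."""
    egress = [
        {
            "security_group": compute_sg,
            "direction": "egress",
            "protocol": "tcp",
            "port": port,
            "peer": endpoint_sg,  # destination is the endpoint security group
        }
        for port in ports
    ]
    ingress = [
        {
            "security_group": endpoint_sg,
            "direction": "ingress",
            "protocol": "tcp",
            "port": port,
            "peer": compute_sg,  # source is the compute cluster security group
        }
        for port in ports
    ]
    return egress, ingress
```

Referencing the peer security group rather than a CIDR range keeps each rule scoped to the other group, which is in line with the least-privilege guidance above.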