
Configure classic private connectivity to Databricks

This page provides configuration steps to enable classic compute plane (back-end) private connectivity for AWS PrivateLink.

Configuring a classic compute plane PrivateLink connection provides critical security and compliance benefits for your data processing environment:

  • Enhanced security: Prevents your Databricks clusters from communicating with the control plane over the public internet, isolating your data workloads from public networks.
  • Compliance requirements: Helps meet strict regulatory and corporate compliance mandates that require all internal cloud traffic to remain on a private network.
  • Data exfiltration control: By securing the connection from the compute plane that actively processes data, you add a powerful layer of protection against data exfiltration.

Architecture overview

Classic compute plane PrivateLink (classic compute to control plane) connects Databricks classic compute resources in a customer VPC to workspace core services. Classic compute connects to the control plane for two purposes: calls to the Databricks REST APIs and the secure cluster connectivity relay.

See Configure classic private connectivity to Databricks.

Requirements

  • Your Databricks account is on the Enterprise pricing tier.
  • Your Databricks workspace must use a customer-managed VPC. You can't convert an existing workspace from a Databricks-managed VPC to a customer-managed VPC. See Configure a customer-managed VPC.
  • Your Databricks workspace must use secure cluster connectivity. To add classic compute plane PrivateLink to an older workspace that doesn't use secure cluster connectivity, contact your Databricks account team.
  • You must have all necessary AWS permissions to set up a Databricks workspace and to create new VPC endpoints for your workspace.

Best practices

Databricks recommends the following for a resilient and manageable setup:

  • Share VPC endpoints: You can share classic VPC endpoints across multiple workspaces that use the same customer-managed VPC because they are VPC-level resources.
  • Separate subnet for VPC endpoints: Create a dedicated subnet for VPC endpoints to follow the principle of least privilege and simplify network management.
  • Separate security groups: Use distinct security groups for classic compute and VPC endpoints to enforce granular access controls.
  • Plan network sizing carefully: Verify that your VPC and subnets have adequate IP address space. See Configure a customer-managed VPC.

Step 1: Configure AWS network objects

You can use the AWS Management Console to create these objects or automate the process with tools such as the Terraform provider for networks.

Configure VPC settings

  1. If you haven't already, set up a VPC for your workspace. You can reuse a VPC from another workspace. To create a VPC, see Configure a customer-managed VPC. If updating a workspace for PrivateLink, verify that it already uses a customer-managed VPC.
  2. Verify that your VPC has both DNS Hostnames and DNS resolution enabled.
  3. Select an IPv4 CIDR block for your VPC with a netmask of at least /25.
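The CIDR requirement in step 3 can be checked before you create anything. The following is a minimal sketch using only the Python standard library. It assumes that "a netmask of at least /25" means the VPC block must be a /25 or larger (prefix length 25 or lower); the function name and the sample CIDR blocks are illustrative, not part of any Databricks or AWS tooling.

```python
import ipaddress

# Assumption: "at least /25" is read as "prefix length of 25 or lower",
# that is, the VPC block is no smaller than a /25.
MAX_PREFIX_LEN = 25


def vpc_cidr_is_large_enough(cidr: str, max_prefix_len: int = MAX_PREFIX_LEN) -> bool:
    """Return True if cidr is a valid IPv4 block no smaller than /max_prefix_len."""
    network = ipaddress.ip_network(cidr, strict=True)
    return network.version == 4 and network.prefixlen <= max_prefix_len


for candidate in ("10.0.0.0/16", "10.0.0.0/25", "10.0.0.0/26"):
    print(candidate, vpc_cidr_is_large_enough(candidate))
```

A check like this only validates the block size. It does not confirm that the subnets carved from the block leave enough addresses for your clusters.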

Configure network ACLs

Databricks requires that your subnet-level network ACLs allow 0.0.0.0/0. To control egress traffic, don't tighten the network ACLs. Instead, use an egress firewall or proxy appliance that blocks most traffic but allows the URLs that Databricks needs to connect to. See Configure a firewall and outbound access.

  1. Create and configure an extra VPC subnet (optional):
    • Create a dedicated subnet for your VPC endpoints, including compute plane PrivateLink VPC endpoints and also any optional VPC endpoints to other AWS services. This subnet is the one you select when creating VPC endpoints, ensuring they are isolated from your workspace subnets while maintaining network connectivity.
    • Attach a separate route table to your VPC endpoints subnet, distinct from the route table for your workspace subnets. This route table should have only a single default route for the local VPC.

Create security groups

To ensure secure connectivity, Databricks recommends using two primary security groups rather than individual groups for each endpoint.

  • Workspace security group: Applied to the workspace resources.

  • PrivateLink endpoint security group: Applied to all VPC endpoints. This group requires inbound rules on the specified ports, using the workspace security group as the source. Outbound rules are not required for the endpoint security group.

  • Workspace VPC endpoint security group: See Security groups

If your workspace uses the compliance security profile, you must also allow bidirectional (outbound and inbound) access to port 2443 to support FIPS endpoints for the secure cluster connectivity relay.

note

Each security group must allow bidirectional (inbound and outbound) access between the workspace subnets and the VPC endpoints subnet. However, using inbound rules only is more restrictive and still sufficient to support PrivateLink communication. This stricter configuration is the one implemented in the Security Reference Architectures (SRA) Terraform templates.
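The inbound-only pattern for the endpoint security group can be sketched as data. The example below builds rule dictionaries in the shape that the AWS EC2 API accepts for security group ingress (`IpPermissions` with `UserIdGroupPairs`), using the workspace security group as the source. It does not call AWS. The security group ID and the port list passed in are placeholders; take the real port list from the Security groups reference. Only port 2443 for the compliance security profile comes from this page.

```python
from typing import Dict, Iterable, List

# From the text: required when the workspace uses the compliance security profile.
FIPS_RELAY_PORT = 2443


def endpoint_ingress_rules(
    workspace_sg_id: str,
    ports: Iterable[int],
    compliance_profile: bool = False,
) -> List[Dict[str, object]]:
    """Build inbound TCP rules that admit traffic from the workspace security group."""
    wanted = set(ports)
    if compliance_profile:
        wanted.add(FIPS_RELAY_PORT)
    return [
        {
            "IpProtocol": "tcp",
            "FromPort": port,
            "ToPort": port,
            # Source is the workspace security group, not a CIDR range.
            "UserIdGroupPairs": [{"GroupId": workspace_sg_id}],
        }
        for port in sorted(wanted)
    ]


# Placeholder ID and port list, for illustration only.
rules = endpoint_ingress_rules("sg-0123456789abcdef0", ports=[443], compliance_profile=True)
for rule in rules:
    print(rule["FromPort"], rule["UserIdGroupPairs"])
```

Referencing the workspace security group as the source, rather than subnet CIDR ranges, keeps the rules valid if the workspace subnets change.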

Step 2: Create VPC endpoints

For classic compute plane PrivateLink, create VPC endpoints for the secure cluster connectivity relay and for the workspace, enabling compute plane calls to Databricks REST APIs. For guidance on managing VPC endpoints with the AWS Management Console, see the AWS article Create VPC endpoints in the AWS Management Console. You can share classic VPC endpoints across multiple workspaces that use the same customer-managed VPC.

Create the workspace VPC endpoint

To create classic VPC endpoints in the AWS Management Console:

  1. Go to the VPC endpoints section of the AWS Management Console.

  2. In the upper right, set the region to the same region as your workspace.

  3. Click Create Endpoint.

  4. Name the endpoint, incorporating the region and the word workspace, such as databricks-us-west-2-workspace-vpce, for the workspace VPC endpoint.

  5. Under Service Category, select Endpoint services that use Network Load Balancers (NLBs) and Gateway Load Balancers (GWLBs).

  6. In the service name field, paste in the service name. Get your region's VPC endpoint service domains from the table in PrivateLink VPC endpoint services.

    For the first VPC endpoint that you create, copy the regional service name for the workspace.

  7. Click Verify service and verify that the page shows Service name verified in a green box. If you encounter an error stating "Service name could not be verified", check that the regions of your VPC, subnets, and new VPC endpoint match.

  8. In the VPC field, select your workspace VPC.

  9. In the Subnets section, select the subnet where the VPC endpoint will be created. If you created a separate dedicated subnet for VPC endpoints, select that subnet. Otherwise, select one of your Databricks workspace subnets.

  10. In the Security groups section, select the security group you created for classic connections in Step 1: Configure AWS network objects.

  11. Under Additional settings, turn on the Enable private DNS name option.

  12. Click Create endpoint.
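If you automate these console steps, the same choices map onto the parameters of the EC2 `CreateVpcEndpoint` operation. The sketch below only assembles the parameter dictionary and does not call AWS, so it runs without credentials. The helper names are hypothetical, and every ID and the service name in the usage example are placeholders; copy the real regional service name from the table in PrivateLink VPC endpoint services.

```python
from typing import Dict


def endpoint_name(region: str, kind: str) -> str:
    """Follow the naming pattern from the steps, for example databricks-us-west-2-workspace-vpce."""
    return f"databricks-{region}-{kind}-vpce"


def interface_endpoint_params(
    *,
    region: str,
    kind: str,  # "workspace" or "scc"
    service_name: str,
    vpc_id: str,
    subnet_id: str,
    security_group_id: str,
) -> Dict[str, object]:
    """Assemble keyword arguments for an EC2 CreateVpcEndpoint request."""
    return {
        "VpcEndpointType": "Interface",
        "VpcId": vpc_id,
        "ServiceName": service_name,
        "SubnetIds": [subnet_id],
        "SecurityGroupIds": [security_group_id],
        "PrivateDnsEnabled": True,  # matches "Enable private DNS name" in step 11
        "TagSpecifications": [
            {
                "ResourceType": "vpc-endpoint",
                "Tags": [{"Key": "Name", "Value": endpoint_name(region, kind)}],
            }
        ],
    }


# All values below are placeholders, not real resources.
params = interface_endpoint_params(
    region="us-west-2",
    kind="workspace",
    service_name="com.amazonaws.vpce.us-west-2.vpce-svc-EXAMPLE",
    vpc_id="vpc-0123456789abcdef0",
    subnet_id="subnet-0123456789abcdef0",
    security_group_id="sg-0123456789abcdef0",
)
print(params["TagSpecifications"][0]["Tags"][0]["Value"])
```

With an AWS SDK client available, a dictionary like this could be passed as keyword arguments to the client's create-endpoint call. Passing `kind="scc"` with the relay service name produces the parameters for the second endpoint.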

Create the SCC relay VPC endpoint

  1. Repeat the previous steps to create the secure cluster connectivity relay endpoint. Use the table in PrivateLink VPC endpoint services to get the regional service name for the secure cluster connectivity relay. Databricks recommends including the region and the word scc in the endpoint name, such as databricks-us-west-2-scc-vpce.

note

Classic VPC endpoints automatically use AWS DNS resolution when you enable the Enable private DNS name option on the endpoint. If you're also configuring inbound PrivateLink, you need to set up DNS to route user requests to the inbound VPC endpoint. For comprehensive DNS configuration guidance, see Configure DNS for AWS inbound PrivateLink.

Several types of objects are relevant for PrivateLink configuration:

  • VPC endpoint registrations: After creating VPC endpoints in the AWS Management Console, register them with Databricks to establish VPC endpoint registrations. VPC endpoint registrations can't be updated later.
    • For classic VPC endpoints, verify that the region field matches your workspace region and the region of the AWS VPC endpoints you're registering. For inbound PrivateLink, the region field must match your transit VPC region and the region of the AWS VPC endpoint for the workspace's inbound connection.
    • To register your classic and inbound VPC endpoints, follow the instructions in Manage VPC endpoint registrations.
  • Network configurations (only required for classic VPC endpoints): Network configurations detail information about a customer-managed VPC and include two classic compute plane PrivateLink configuration fields.
    • To create a network configuration, see Register your VPC with Databricks. For comprehensive requirements for customer-managed VPCs, subnets, and security groups, see Configure a customer-managed VPC. In the Back-end private connectivity section, set the fields to your classic VPC endpoint registrations as follows:
    • In the first field, select the VPC endpoint registration for the secure cluster connectivity relay.
    • In the second field, choose the VPC endpoint registration for the workspace (REST APIs).
    • After you create a network configuration, you can't update it.
  • Private access settings: A workspace's private access settings (PAS) object includes settings for AWS PrivateLink connectivity. You can use a single private access settings object for multiple workspaces in the same AWS region. To create a private access settings object, see Manage private access settings.
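To make the mapping between the two classic VPC endpoint registrations and the network configuration concrete, here is a hedged sketch of a request body for creating a network configuration programmatically. The field names (`network_name`, `vpc_endpoints`, `dataplane_relay`, `rest_api`) reflect my understanding of the Databricks Account API and are assumptions here; verify them against the API reference before use. The function name and all IDs are hypothetical.

```python
from typing import Dict, Sequence


def network_configuration_payload(
    name: str,
    vpc_id: str,
    subnet_ids: Sequence[str],
    security_group_ids: Sequence[str],
    relay_registration_id: str,
    workspace_registration_id: str,
) -> Dict[str, object]:
    """Build a network configuration body (field names are assumptions, see above)."""
    return {
        "network_name": name,
        "vpc_id": vpc_id,
        "subnet_ids": list(subnet_ids),
        "security_group_ids": list(security_group_ids),
        "vpc_endpoints": {
            # First field: registration for the secure cluster connectivity relay.
            "dataplane_relay": [relay_registration_id],
            # Second field: registration for the workspace (REST APIs).
            "rest_api": [workspace_registration_id],
        },
    }


# Placeholder identifiers, for illustration only.
payload = network_configuration_payload(
    name="example-privatelink-network",
    vpc_id="vpc-0123456789abcdef0",
    subnet_ids=["subnet-aaaa", "subnet-bbbb"],
    security_group_ids=["sg-0123456789abcdef0"],
    relay_registration_id="relay-registration-id",
    workspace_registration_id="workspace-registration-id",
)
print(sorted(payload["vpc_endpoints"]))
```

Because network configurations and VPC endpoint registrations can't be updated after creation, it is worth reviewing a payload like this before submitting it.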

Your workspace must already be using a customer-managed VPC and secure cluster connectivity.

  1. Create a workspace using the instructions here. This page explains how to configure key workspace settings, including the workspace URL, region, Unity Catalog integration, credential configurations, and storage configurations. Don't click the Save button yet.
  2. Click Advanced configurations to view additional fields.
  3. Under Virtual Private Cloud, choose the Databricks network configuration you created from the menu.
  4. Below the Private Link heading, click the menu, and choose the name of the private access settings object you created.
  5. Click Save.

note

After you create a workspace, its status changes to RUNNING, and associated VPC network updates are immediately applied. However, wait an additional 20 minutes after the status shows RUNNING before creating or using clusters. Attempting to create or use clusters before this time could result in launch failures, errors, or other unexpected behavior.
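Automation that creates clusters right after workspace creation can encode this waiting period. The sketch below is one way to do that with the standard library. It assumes your tooling records the time at which the workspace status first showed RUNNING; the function names are illustrative.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# From the note: wait an additional 20 minutes after the status shows RUNNING.
SETTLE_TIME = timedelta(minutes=20)


def earliest_cluster_start(running_since: datetime) -> datetime:
    """Return the earliest time at which creating clusters is expected to be safe."""
    return running_since + SETTLE_TIME


def safe_to_create_clusters(running_since: datetime, now: Optional[datetime] = None) -> bool:
    """Return True once the settle time has elapsed since the workspace became RUNNING."""
    current = now if now is not None else datetime.now(timezone.utc)
    return current >= earliest_cluster_start(running_since)


became_running = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
print(earliest_cluster_start(became_running).isoformat())
```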

Step 5: Add VPC endpoints for other AWS services

When implementing classic compute plane PrivateLink, you must choose one of two approaches for cluster connectivity to AWS services (S3, STS, Kinesis):

  • Standard configuration (Option 1): Requires outbound internet access using a NAT gateway and internet gateway (or similar customer-managed infrastructure) along with optional S3, STS, and Kinesis VPC endpoints.
  • Fully-private configuration (Option 2): Eliminates NAT gateway and internet gateway by requiring S3, STS, and Kinesis VPC endpoints.
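The difference between the two options can be expressed as a small check: with a NAT gateway and internet gateway, the AWS service endpoints are optional, and without them all three are required. The following sketch is an illustration of that rule only; the function name and the short service labels are hypothetical.

```python
from typing import Iterable, List

# The three AWS services named in the text.
AWS_SERVICES = ("s3", "sts", "kinesis")


def services_without_a_path(has_nat_and_igw: bool, vpc_endpoints: Iterable[str]) -> List[str]:
    """Return the AWS services that clusters could not reach in this design.

    Option 1 (standard): outbound internet access exists, so endpoints are optional.
    Option 2 (fully private): no outbound internet access, so every service needs an endpoint.
    """
    if has_nat_and_igw:
        return []
    present = set(vpc_endpoints)
    return [service for service in AWS_SERVICES if service not in present]


print(services_without_a_path(True, []))
print(services_without_a_path(False, ["s3"]))
```

An empty result means the design satisfies the connectivity rule for the chosen option. It doesn't validate routing or DNS.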

For typical use cases, Databricks recommends creating the following VPC endpoints. This allows clusters and other compute resources in the classic compute plane to connect directly to AWS native services over AWS PrivateLink. Create these VPC endpoints in the same subnet as your classic VPC endpoints.

Use cases where clusters don't have network access to the public AWS endpoints require these VPC endpoints:

Option 2: Strict air-gap deployment

If your deployment requires strict air-gap environments where compute can't access the public internet, configure additional VPC endpoints to replace NAT Gateway and Internet Gateway connectivity.

This configuration eliminates the NAT Gateway and Internet Gateway. Instead, create the following resources:

  1. Create a private subnet for AWS service interface endpoints with a minimum /27 CIDR range.
  2. Create a security group for the AWS VPC endpoints with an inbound rule allowing TCP 443 from the workspace security group.
  3. Create an STS VPC interface endpoint with the following configuration:
    • Service name: com.amazonaws.<region>.sts
    • Subnet: AWS endpoints subnet
    • Security group: AWS endpoints security group
    • Enable private DNS name
    • Naming convention: sts-<region>-vpce (for example, sts-us-west-2-vpce)
  4. Create a Kinesis VPC interface endpoint with the following configuration:
    • Service name: com.amazonaws.<region>.kinesis-streams
    • Subnet: AWS endpoints subnet
    • Security group: AWS endpoints security group
    • Enable private DNS name
    • Naming convention: kinesis-<region>-vpce (for example, kinesis-us-west-2-vpce)
  5. Create an S3 VPC gateway endpoint. This endpoint is required and is configured the same way as in the standard configuration.
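The STS and Kinesis endpoint settings in steps 3 and 4 follow one pattern, which the sketch below generates from the region. It also checks the subnet size from step 1, assuming that "a minimum /27 CIDR range" means a /27 or larger block. Nothing here calls AWS. The function names and the IDs in the usage example are placeholders, and the dictionary keys mirror the EC2 `CreateVpcEndpoint` parameters except for `Name`, which is a convenience key for this example.

```python
import ipaddress
from typing import Dict, List

# Service name suffixes from steps 3 and 4.
INTERFACE_SERVICES = {"sts": "sts", "kinesis": "kinesis-streams"}


def subnet_meets_minimum(cidr: str, max_prefix_len: int = 27) -> bool:
    """Assumption: a 'minimum /27' subnet has a prefix length of 27 or lower."""
    return ipaddress.ip_network(cidr, strict=True).prefixlen <= max_prefix_len


def aws_interface_endpoint_specs(
    region: str, subnet_id: str, security_group_id: str
) -> List[Dict[str, object]]:
    """Describe the STS and Kinesis interface endpoints for one region."""
    return [
        {
            "Name": f"{short_name}-{region}-vpce",
            "ServiceName": f"com.amazonaws.{region}.{suffix}",
            "VpcEndpointType": "Interface",
            "SubnetIds": [subnet_id],
            "SecurityGroupIds": [security_group_id],
            "PrivateDnsEnabled": True,
        }
        for short_name, suffix in INTERFACE_SERVICES.items()
    ]


# Placeholder subnet and security group IDs.
specs = aws_interface_endpoint_specs("us-west-2", "subnet-0123456789abcdef0", "sg-0123456789abcdef0")
for spec in specs:
    print(spec["Name"], spec["ServiceName"])
```

The S3 gateway endpoint is left out on purpose because gateway endpoints attach to route tables rather than to subnets and security groups.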

Centralized endpoint configuration

To centralize your endpoints, verify the following:

  • Compute resources resolve the fully qualified domain name for each service to the private IP of the corresponding VPC endpoint.
  • Routes exist to allow compute resources to reach the VPC endpoints.
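The first check, that each service name resolves to a private IP address, can be scripted from a machine inside the compute subnets. The sketch below uses the standard library and takes the resolver as a parameter so that it can be exercised without network access. The hostnames and addresses in the usage example are made up for illustration. Note that `is_private` covers more than the RFC 1918 ranges, so treat the result as a first-pass signal rather than proof that the address belongs to your VPC endpoint.

```python
import ipaddress
import socket
from typing import Callable, Dict, Iterable


def resolve_with_system_dns(hostname: str) -> str:
    """Resolve a hostname to one IPv4 address using the system resolver."""
    return socket.gethostbyname(hostname)


def hosts_resolving_publicly(
    hostnames: Iterable[str],
    resolver: Callable[[str], str] = resolve_with_system_dns,
) -> Dict[str, str]:
    """Return hostnames whose resolved address is not a private address."""
    offenders: Dict[str, str] = {}
    for hostname in hostnames:
        address = resolver(hostname)
        if not ipaddress.ip_address(address).is_private:
            offenders[hostname] = address
    return offenders


# Illustrative stand-in for DNS so the example runs offline; values are made up.
fake_dns = {
    "sts.us-west-2.amazonaws.com": "10.0.1.15",
    "kinesis.us-west-2.amazonaws.com": "52.94.0.1",
}
offenders = hosts_resolving_publicly(fake_dns, resolver=fake_dns.__getitem__)
print(offenders)
```

In real use, omit the `resolver` argument and run the check from a host that uses the same DNS configuration as your compute resources.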

Next steps