Enable private connectivity using AWS PrivateLink
This page provides a general overview of PrivateLink at Databricks and includes configuration steps to enable back-end private connectivity.
- To enable front-end private connectivity to Databricks, see Configure private connectivity to Databricks.
- To use the REST API, see the Private Access Settings API reference.
Private connectivity overview
PrivateLink enables secure, private connectivity from your AWS VPCs and on-premises networks to AWS services, ensuring that your traffic remains isolated from the public internet. This capability is designed to help organizations address security and compliance requirements by enabling end-to-end private networking and minimizing the risk of data exfiltration.
You can enable either front-end or back-end PrivateLink connections independently, or both, depending on your security and compliance requirements. You can also enforce private connectivity for the workspace, causing Databricks to reject all public network connections automatically. This combined approach delivers comprehensive network isolation, reducing your attack surface and supporting compliance for sensitive workloads.
With PrivateLink you can:
- Block data access from unauthorized networks or the public internet when using the Databricks web application or APIs.
- Significantly lower the risk of data exfiltration by restricting network exposure to approved private endpoints only.
To deploy PrivateLink, you must:
- Create specific Databricks configuration objects and update existing configurations with new fields to define private access settings and permitted VPC endpoints.
- Choose whether to implement either or both connection types, with optional enforcement for complete network isolation.
Enable back-end PrivateLink for your workspace
Back-end PrivateLink (classic compute to control plane) connects Databricks classic compute resources in a customer VPC to workspace core services. Clusters connect to the control plane for Databricks REST APIs and secure cluster connectivity relay. The following guide includes some configuration steps you can perform using the Databricks account console or the API.
Requirements
- Your Databricks account is on the Enterprise pricing tier.
- Your Databricks workspace must use a customer-managed VPC. You cannot convert an existing workspace from a Databricks-managed VPC to a customer-managed VPC. See Configure a customer-managed VPC.
- Your Databricks workspace must use secure cluster connectivity. To add back-end PrivateLink to an older workspace that does not use secure cluster connectivity, contact your Databricks account team.
- You must have all necessary AWS permissions to set up a Databricks workspace and to create new VPC endpoints for your workspace.
- To establish a front-end PrivateLink connection for accessing the workspace from your on-premises network, connect your on-premises network to an AWS VPC using Direct Connect or VPN.
- Allow network traffic from all relevant address spaces within your local network to connect to your VPC endpoint using TCP port 443.
Step 1: Configure AWS network objects
You can use the AWS Management Console to create these objects or automate the process with tools such as the Terraform provider for networks.
- If you haven't already, set up a VPC for your workspace. You can reuse a VPC from another workspace. To create a VPC, see Configure a customer-managed VPC. If updating a workspace for PrivateLink, ensure it already uses a customer-managed VPC.
- Ensure your VPC has both DNS Hostnames and DNS resolution enabled.
- Ensure that the network ACLs for the subnets have bidirectional (outbound and inbound) rules that allow TCP access to 0.0.0.0/0 for these ports:
  - 443: for Databricks infrastructure, cloud data sources, and library repositories
  - 3306: for the metastore
  - 6666: for PrivateLink
  - 2443: only for use with the compliance security profile
  - 8443: for internal calls from the Databricks compute plane to the Databricks control plane API
  - 8444: for Unity Catalog logging and lineage data streaming into Databricks
  - 8445 through 8451: reserved for future extensibility
- Create and configure an extra VPC subnet (optional):
  - You can create your VPC endpoints, including the back-end PrivateLink VPC endpoints and any optional VPC endpoints to other AWS services, in any of your workspace subnets as long as the network can route to the VPC endpoints.
  - Attach a separate route table to your VPC endpoints subnet, distinct from the route table for your workspace subnets. This route table should have only a single default route for the local VPC.
- Create and configure an extra security group (recommended but optional):
  - In addition to the standard security group required for a workspace, create a separate security group that permits HTTPS/443 and TCP/6666 bidirectional (outbound and inbound) access to both the workspace subnets and the separate VPC endpoints subnet, if you have one. This setup facilitates access for both REST APIs (port 443) and secure cluster connectivity (6666), simplifying security group management.
    If your workspace uses the compliance security profile, you must also allow bidirectional (outbound and inbound) access to port 2443 to support FIPS endpoints for the secure cluster connectivity relay.
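The port requirements for the network ACLs can be checked mechanically. The following is a minimal sketch, not an official tool: it assumes rule entries shaped like the `Entries` items that boto3's `describe_network_acls` returns (`Egress`, `RuleAction`, `CidrBlock`, `Protocol`, `PortRange`), and the function names are hypothetical. It deliberately ignores deny rules and rule-number ordering, so treat a clean result as a hint rather than proof.

```python
# Ports taken from the network ACL list above.
# 2443 is only needed with the compliance security profile.
BASE_TCP_PORTS = {443, 3306, 6666, 8443, 8444} | set(range(8445, 8452))
COMPLIANCE_TCP_PORTS = {2443}


def required_ports(compliance_profile=False):
    """Return the TCP ports the subnet NACLs should allow in both directions."""
    ports = set(BASE_TCP_PORTS)
    if compliance_profile:
        ports |= COMPLIANCE_TCP_PORTS
    return ports


def allowed_ports(entries, egress):
    """Collect ports opened to 0.0.0.0/0 by allow rules in one direction.

    Simplification (assumption): deny rules and rule ordering are ignored.
    """
    ports = set()
    for entry in entries:
        if entry["Egress"] != egress or entry["RuleAction"] != "allow":
            continue
        if entry.get("CidrBlock") != "0.0.0.0/0":
            continue
        if entry["Protocol"] == "-1":  # all protocols, all ports
            return set(range(0, 65536))
        if entry["Protocol"] == "6":  # TCP
            port_range = entry["PortRange"]
            ports.update(range(port_range["From"], port_range["To"] + 1))
    return ports


def missing_ports(entries, compliance_profile=False):
    """Return the required ports that are not allowed, per direction."""
    needed = required_ports(compliance_profile)
    return {
        "inbound": sorted(needed - allowed_ports(entries, egress=False)),
        "outbound": sorted(needed - allowed_ports(entries, egress=True)),
    }
```

An empty list for both directions means every required port is covered by at least one allow rule to 0.0.0.0/0.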
Step 2: Create VPC endpoints
For back-end PrivateLink, create VPC endpoints for the secure cluster connectivity relay and for the workspace, enabling compute plane calls to Databricks REST APIs. For guidance on managing VPC endpoints with the AWS Management Console, see the AWS article Create VPC endpoints in the AWS Management Console. You can share back-end VPC endpoints across multiple workspaces that use the same customer-managed VPC.
To create back-end VPC endpoints in the AWS Management Console:
- Go to the VPC endpoints section of the AWS Management Console.
- In the upper right, set the region to the same region as your workspace.
- Click Create Endpoint.
- Name the endpoint, incorporating the region and the word workspace, such as databricks-us-west-2-workspace-vpce, for the workspace VPC endpoint.
- Under Service Category, select Other endpoint services.
- In the service name field, paste in the service name. Get your region's VPC endpoint service domains from the table in PrivateLink VPC endpoint services.
  For the first VPC endpoint that you create, copy the regional service name for the workspace.
- Click Verify service and ensure the page shows Service name verified in a green box. If you encounter an error stating “Service name could not be verified”, check that the regions of your VPC, subnets, and new VPC endpoint match.
- In the VPC field, select your workspace VPC.
- In the Subnets section, select exactly one of your Databricks workspace subnets.
- In the Security groups section, select the security group you created for back-end connections in Step 1: Configure AWS network objects.
- Under Additional settings, turn on the Enable DNS name option.
- Click Create endpoint.
- Repeat the previous steps to create the secure cluster connectivity relay endpoint. Use the table in PrivateLink VPC endpoint services to get the regional service name for the secure cluster connectivity relay. Databricks recommends including the region and the word scc in the endpoint name, such as databricks-us-west-2-scc-vpce.
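If you prefer to script the console steps, the same choices map onto the parameters of the EC2 `create_vpc_endpoint` call in boto3. The sketch below only builds the request dictionaries so that it can be reviewed without touching AWS. The helper names are hypothetical, and the service name is a placeholder that you must copy from the table in PrivateLink VPC endpoint services; this is one possible approach, not a Databricks-provided script.

```python
VALID_KINDS = ("workspace", "scc")


def endpoint_name(region, kind):
    """Follow the naming pattern used above, for example databricks-us-west-2-scc-vpce."""
    if kind not in VALID_KINDS:
        raise ValueError(f"kind must be one of {VALID_KINDS}, got {kind!r}")
    return f"databricks-{region}-{kind}-vpce"


def build_endpoint_request(region, kind, service_name, vpc_id, subnet_id,
                           security_group_id):
    """Build keyword arguments for ec2.create_vpc_endpoint (interface endpoint)."""
    return {
        "VpcEndpointType": "Interface",
        "VpcId": vpc_id,
        "ServiceName": service_name,      # regional service name from the Databricks table
        "SubnetIds": [subnet_id],         # exactly one workspace subnet
        "SecurityGroupIds": [security_group_id],
        "PrivateDnsEnabled": True,        # the "Enable DNS name" option
        "TagSpecifications": [{
            "ResourceType": "vpc-endpoint",
            "Tags": [{"Key": "Name", "Value": endpoint_name(region, kind)}],
        }],
    }


# Usage sketch (requires boto3 and AWS credentials; all IDs are placeholders):
#   import boto3
#   ec2 = boto3.client("ec2", region_name="us-west-2")
#   request = build_endpoint_request("us-west-2", "workspace",
#                                    "<workspace-service-name>", "vpc-...",
#                                    "subnet-...", "sg-...")
#   ec2.create_vpc_endpoint(**request)
```

Building the request separately from sending it makes it easy to compare the workspace and secure cluster connectivity relay requests, which should differ only in service name and endpoint name.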
Step 3: Register PrivateLink objects
Several types of objects are relevant for PrivateLink configuration:
- VPC endpoint registrations: After creating VPC endpoints in the AWS Management Console, you must register them with Databricks to establish VPC endpoint registrations. VPC endpoint registrations cannot be updated later.
  - For back-end VPC endpoints, ensure the region field matches your workspace region and the region of the AWS VPC endpoints you’re registering. For front-end PrivateLink, the region field must match your transit VPC region and the region of the AWS VPC endpoint for the workspace’s front-end connection.
  - To register your back-end and front-end VPC endpoints, follow the instructions in Manage VPC endpoint registrations.
- Network configurations (only required for back-end VPC endpoints): Network configurations detail information about a customer-managed VPC and include two back-end PrivateLink configuration fields.
  - To create a network configuration, see Create network configurations for custom VPC deployment. For comprehensive requirements for customer-managed VPCs, subnets, and security groups, see Configure a customer-managed VPC. In the Back-end private connectivity section, set the fields to your back-end VPC endpoint registrations as follows:
    - In the first field, select the VPC endpoint registration for the secure cluster connectivity relay.
    - In the second field, choose the VPC endpoint registration for the workspace (REST APIs).
  - After you create a network configuration, you cannot update it.
- Private access settings: A workspace’s private access settings (PAS) object includes settings for AWS PrivateLink connectivity. You can use a single private access settings object for multiple workspaces in the same AWS region. To create a private access settings object, see Manage private access settings.
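When registering these objects through the API rather than the account console, it can help to assemble and validate the request bodies first. The sketch below is illustrative only. The JSON field names (`vpc_endpoint_name`, `aws_vpc_endpoint_id`, `vpc_endpoints`, `dataplane_relay`, `rest_api`) reflect my understanding of the Databricks Account API and are assumptions; confirm them against the API reference before use. The function names are hypothetical.

```python
def vpc_endpoint_registration(name, aws_vpc_endpoint_id, region):
    """Body for registering one AWS VPC endpoint (field names are assumptions)."""
    return {
        "vpc_endpoint_name": name,
        "aws_vpc_endpoint_id": aws_vpc_endpoint_id,
        "region": region,
    }


def check_backend_regions(workspace_region, registrations):
    """Back-end registrations must use the workspace region (see above)."""
    mismatched = [r["vpc_endpoint_name"] for r in registrations
                  if r["region"] != workspace_region]
    if mismatched:
        raise ValueError(f"region mismatch for registrations: {mismatched}")


def network_configuration(name, vpc_id, subnet_ids, security_group_ids,
                          relay_registration_id, workspace_registration_id):
    """Body for a network configuration with the two back-end PrivateLink fields."""
    return {
        "network_name": name,
        "vpc_id": vpc_id,
        "subnet_ids": list(subnet_ids),
        "security_group_ids": list(security_group_ids),
        "vpc_endpoints": {
            # First field: secure cluster connectivity relay registration.
            "dataplane_relay": [relay_registration_id],
            # Second field: workspace (REST APIs) registration.
            "rest_api": [workspace_registration_id],
        },
    }
```

Because neither VPC endpoint registrations nor network configurations can be updated after creation, validating regions and field mapping before sending the request avoids having to delete and recreate objects.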
Step 4: Create or update your workspace with PrivateLink objects
Your workspace must already be using a customer-managed VPC and secure cluster connectivity.
- Create a workspace using the instructions here. This page explains how to configure key workspace settings, including the workspace URL, region, Unity Catalog (UC) integration, credential configurations, and storage configurations. Do not click the Save button yet.
- Click Advanced configurations to view additional fields.
- Under Virtual Private Cloud, in the menu choose the Databricks network configuration you created.
- Below the Private Link heading, click the menu, and choose the name of the private access settings object you created.
- Click Save.
After you create a workspace, its status changes to RUNNING, and associated VPC network updates are immediately applied. However, you must wait an additional 20 minutes after the status shows RUNNING before you can successfully create or use clusters. Attempting to create or use clusters before this time could result in launch failures, errors, or other unexpected behavior.
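Automation that creates clusters right after workspace creation needs to respect this waiting period. The following is a small sketch of how a script might gate cluster creation; the function names are hypothetical, and it assumes your script records the time at which it first observed the RUNNING status.

```python
from datetime import datetime, timedelta, timezone

# Waiting period after the workspace status first shows RUNNING.
SETTLE_PERIOD = timedelta(minutes=20)


def earliest_cluster_time(running_since):
    """Return the earliest time at which cluster creation should be attempted."""
    return running_since + SETTLE_PERIOD


def ready_for_clusters(status, running_since, now=None):
    """True once the workspace is RUNNING and the waiting period has elapsed."""
    if status != "RUNNING" or running_since is None:
        return False
    if now is None:
        now = datetime.now(timezone.utc)
    return now >= earliest_cluster_time(running_since)
```

Pass timezone-aware timestamps for both `running_since` and `now` so that the comparison is well defined.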
Step 5: Add VPC endpoints for other AWS services
For typical use cases, creating the following VPC endpoints is recommended. This allows clusters and other compute resources in the classic compute plane to connect directly to AWS native services over AWS PrivateLink. Create these VPC endpoints in the same subnet as your back-end VPC endpoint.
These VPC endpoints are required for use cases where clusters do not have network access to the public AWS endpoints:
- S3 VPC gateway endpoint: Attach this only to the route table that's attached to your workspace subnets. If you're using the recommended separate subnet with its own route table for back-end VPC endpoints, then the S3 VPC endpoint doesn't need to be attached to that particular route table. See this AWS article about S3 gateway endpoints.
- STS VPC interface endpoint: Create this in all the workspace subnets and attach it to the workspace security group. See this AWS section about STS interface endpoints and this general article about interface endpoints.
- Kinesis VPC interface endpoint: Create the Kinesis VPC interface endpoint in all workspace subnets and attach it to the workspace security group, similar to the STS VPC interface endpoint. For more information, see this AWS article about Kinesis interface endpoints and this general article about interface endpoints.
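The three endpoints above differ mainly in endpoint type and in what they attach to. As a rough sketch, the request parameters for boto3's `create_vpc_endpoint` could be assembled as shown below. The helper names are hypothetical, and enabling private DNS on the interface endpoints is an assumption made here so that the standard service hostnames resolve to the endpoints; adjust it to your DNS design.

```python
# AWS service name suffixes for the interface endpoints listed above.
INTERFACE_SERVICES = {"sts": "sts", "kinesis": "kinesis-streams"}


def s3_gateway_request(region, vpc_id, workspace_route_table_ids):
    """Gateway endpoint: attached to the workspace subnets' route tables only."""
    return {
        "VpcEndpointType": "Gateway",
        "VpcId": vpc_id,
        "ServiceName": f"com.amazonaws.{region}.s3",
        "RouteTableIds": list(workspace_route_table_ids),
    }


def interface_request(service, region, vpc_id, workspace_subnet_ids,
                      workspace_security_group_id):
    """Interface endpoint in all workspace subnets, using the workspace security group."""
    suffix = INTERFACE_SERVICES[service]  # KeyError for unsupported names
    return {
        "VpcEndpointType": "Interface",
        "VpcId": vpc_id,
        "ServiceName": f"com.amazonaws.{region}.{suffix}",
        "SubnetIds": list(workspace_subnet_ids),
        "SecurityGroupIds": [workspace_security_group_id],
        "PrivateDnsEnabled": True,  # assumption, see the note above
    }


# Usage sketch (requires boto3 and AWS credentials; IDs are placeholders):
#   import boto3
#   ec2 = boto3.client("ec2", region_name="us-west-2")
#   ec2.create_vpc_endpoint(**s3_gateway_request("us-west-2", "vpc-...", ["rtb-..."]))
```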
To centralize your endpoints, check that the following are true:
- Compute resources resolve the fully qualified domain name for each service to the private IP of the corresponding VPC endpoint.
- Routes exist to allow compute resources to reach the VPC endpoints.
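The DNS condition can be spot-checked from a compute resource inside the VPC. The sketch below resolves a service hostname and reports whether every returned address is private. It is a rough check under stated assumptions: the regional hostnames for STS and Kinesis follow the standard AWS pattern, the function names are hypothetical, and the check applies to interface endpoints only, since an S3 gateway endpoint works through route tables rather than private DNS. It does not verify routing.

```python
import ipaddress
import socket


def interface_endpoint_hosts(region):
    """Standard regional hostnames for the interface endpoints discussed above."""
    return {
        "sts": f"sts.{region}.amazonaws.com",
        "kinesis": f"kinesis.{region}.amazonaws.com",
    }


def resolve_ips(hostname):
    """Resolve a hostname to a sorted list of unique IP address strings."""
    infos = socket.getaddrinfo(hostname, 443, proto=socket.IPPROTO_TCP)
    return sorted({info[4][0] for info in infos})


def resolves_privately(hostname, resolver=resolve_ips):
    """True if the hostname resolves and every address is in a private range."""
    addresses = resolver(hostname)
    return bool(addresses) and all(
        ipaddress.ip_address(address).is_private for address in addresses
    )


# Usage sketch, run from a host inside the workspace VPC:
#   for service, host in interface_endpoint_hosts("us-west-2").items():
#       print(service, host, resolves_privately(host))
```

The `resolver` parameter exists so the check can be exercised with canned answers, without network access.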