Enable private connectivity using AWS PrivateLink
This article explains how to use AWS PrivateLink for private connectivity between users and their Databricks workspaces and between the classic compute plane and the control plane within the Databricks workspace infrastructure.
Private connectivity overview
AWS PrivateLink enables private connectivity from AWS VPCs and on-premises networks to AWS services, avoiding exposure to the public internet. Databricks workspaces support PrivateLink connections for two types of connections:
Front-end (user to workspace): This connection type allows users to access the Databricks web application, REST API, and Databricks Connect API via a VPC interface endpoint.
Back-end (classic compute plane to control plane): Compute resources in the classic compute plane access core services of the Databricks workspace in the control plane, which is located in the Databricks cloud account. This connection type uses two different VPC interface endpoints to connect to two destinations: REST APIs and the secure cluster connectivity relay.
You can implement both front-end and back-end PrivateLink connections or just one of them, and you can optionally enforce private connectivity for the workspace, causing Databricks to reject any public network connections.
To enable PrivateLink connections, create Databricks configuration objects and update existing configuration objects with new fields.
The following diagram shows the network flow in a typical implementation.
Requirements
Your Databricks account is on the Enterprise pricing tier.
Your Databricks workspace must use a customer-managed VPC. See Configure a customer-managed VPC. You cannot convert an existing workspace from a Databricks-managed VPC to a customer-managed VPC.
Your Databricks workspace must use secure cluster connectivity. See What is secure cluster connectivity?. To add back-end PrivateLink to an older workspace that does not use secure cluster connectivity, contact your Databricks account team.
You must have all necessary AWS permissions to set up a Databricks workspace and to create new VPC endpoints for your workspace.
To establish a front-end PrivateLink connection for accessing the workspace from your on-premises network, connect your on-premises network to an AWS VPC using Direct Connect or VPN.
Note
The us-west-1 region does not support PrivateLink.
Step 1: Configure AWS network objects
You can use the AWS Management Console to create these objects or automate the process with tools such as the Terraform provider for networks.
If you haven’t already, set up a VPC for your workspace. You can reuse a VPC from another workspace. To create a VPC, see Configure a customer-managed VPC. If updating a workspace for PrivateLink, ensure it already uses a customer-managed VPC.
Ensure your VPC has both DNS Hostnames and DNS resolution enabled.
Ensure that the network ACLs for the subnets have bidirectional (outbound and inbound) rules that allow TCP access to 0.0.0.0/0 for these ports:
443: for Databricks infrastructure, cloud data sources, and library repositories
3306: for the metastore
6666: for PrivateLink
2443: only for use with compliance security profile
8443: for internal calls from the Databricks compute plane to the Databricks control plane API
8444: for Unity Catalog logging and lineage data streaming into Databricks
8445 through 8451: reserved for future extensibility
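The port list above can also be written down as data. The following is a minimal sketch, not an official tool, that builds keyword arguments shaped like those accepted by the boto3 EC2 `create_network_acl_entry` call, one inbound and one outbound rule per port range. The helper name `nacl_entries`, the rule-numbering scheme, and the ACL ID used in the usage note are assumptions for illustration.

```python
# Ports from the list above, as inclusive (from, to) ranges.
REQUIRED_PORT_RANGES = [
    (443, 443),    # Databricks infrastructure, cloud data sources, library repositories
    (3306, 3306),  # metastore
    (6666, 6666),  # PrivateLink
    (8443, 8443),  # compute plane to control plane API
    (8444, 8444),  # Unity Catalog logging and lineage
    (8445, 8451),  # reserved range
]
COMPLIANCE_PORT_RANGE = (2443, 2443)  # only with the compliance security profile


def nacl_entries(network_acl_id, compliance_profile=False, first_rule_number=100):
    """Return kwargs dicts for inbound and outbound TCP allow rules (hypothetical helper)."""
    ranges = list(REQUIRED_PORT_RANGES)
    if compliance_profile:
        ranges.append(COMPLIANCE_PORT_RANGE)
    entries = []
    for egress in (False, True):  # inbound rules first, then outbound
        for offset, (low, high) in enumerate(ranges):
            entries.append({
                "NetworkAclId": network_acl_id,
                "RuleNumber": first_rule_number + 10 * offset,  # assumed numbering
                "Protocol": "6",  # protocol number 6 is TCP
                "RuleAction": "allow",
                "Egress": egress,
                "CidrBlock": "0.0.0.0/0",
                "PortRange": {"From": low, "To": high},
            })
    return entries
```

Each returned dictionary could be passed as `**kwargs` to an EC2 client, for example `nacl_entries("acl-EXAMPLE")` with a placeholder ACL ID. Review the generated rules against your own network policy before applying them.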
Configure the necessary settings for either back-end PrivateLink, front-end PrivateLink, or both:
Create and configure an extra VPC subnet (optional):
You can create your VPC endpoints, including back-end PrivateLink VPC endpoints and any optional VPC endpoints to other AWS services, in any of your workspace subnets, as long as the network can route to the VPC endpoints.
Attach a separate route table to your VPC endpoints subnet, distinct from the route table for your workspace subnets. This route table should have only a single default route for the local VPC.
Create and configure an extra security group (recommended but optional):
In addition to the standard security group required for a workspace, create a separate security group that permits HTTPS/443 and TCP/6666 bidirectional (outbound and inbound) access to both the workspace subnets and the separate VPC endpoints subnet, if you have one. This setup facilitates access for both REST APIs (port 443) and secure cluster connectivity (6666), simplifying security group management.
If your workspace uses the compliance security profile, you must also allow bidirectional (outbound and inbound) access to port 2443 to support FIPS endpoints for the secure cluster connectivity relay.
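The security group rules described above might be sketched as follows. This is an illustrative helper, not a Databricks-provided one: it produces a list in the `IpPermissions` shape that the boto3 EC2 calls `authorize_security_group_ingress` and `authorize_security_group_egress` accept. The function name and the CIDR values in the usage note are assumptions.

```python
def backend_endpoint_permissions(subnet_cidrs, compliance_profile=False):
    """Build IpPermissions entries for the back-end endpoint security group.

    subnet_cidrs: CIDR blocks of the workspace subnets and, if you have one,
    the separate VPC endpoints subnet.
    """
    ports = [443, 6666]  # REST APIs and secure cluster connectivity
    if compliance_profile:
        ports.append(2443)  # FIPS endpoint for the relay
    return [
        {
            "IpProtocol": "tcp",
            "FromPort": port,
            "ToPort": port,
            "IpRanges": [{"CidrIp": cidr} for cidr in subnet_cidrs],
        }
        for port in ports
    ]
```

Because access must be bidirectional, the same list would be applied once as ingress rules and once as egress rules, for example with `backend_endpoint_permissions(["10.0.1.0/24", "10.0.2.0/24"])` where both CIDRs are placeholders.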
Ensure your transit VPC and its subnets are accessible from the user environment. Use a transit VPC that either terminates your AWS Direct Connect or VPN gateway connection, or is routable from the VPC that does.
If you enable both front-end and back-end PrivateLink, you can optionally share the front-end workspace (web application) VPC endpoint with the back-end workspace (REST API) VPC endpoint if the VPC endpoint is network accessible from the workspace subnets.
Create a new security group for the front-end endpoint that allows HTTPS (port 443) bidirectional (outbound and inbound) access for both the source network and the endpoint subnet.
Step 2: Create VPC endpoints
For back-end PrivateLink, create VPC endpoints for the secure cluster connectivity relay and for the workspace, enabling compute plane calls to Databricks REST APIs. For guidance on managing VPC endpoints with the AWS Management Console, see the AWS article Create VPC endpoints in the AWS Management Console. You can share back-end VPC endpoints across multiple workspaces that use the same customer-managed VPC.
To create back-end VPC endpoints in the AWS Management Console:
Go to the VPC endpoints section of the AWS Management Console.
In the upper right, set the region to the same region as your workspace.
Click Create Endpoint.
Name the endpoint, incorporating the region and the word workspace, such as databricks-us-west-2-workspace-vpce, for the workspace VPC endpoint.
Under Service Category, select Other endpoint services.
In the service name field, paste in the service name. Get your region’s VPC endpoint service domains from the table in PrivateLink VPC endpoint services.
For your first VPC endpoint that you create, copy the regional service name for the workspace.
Click Verify service and ensure the page shows Service name verified in a green box. If you encounter an error stating “Service name could not be verified”, check that the regions of your VPC, subnets, and new VPC endpoint match.
In the VPC field, select your workspace VPC.
In the Subnets section, select exactly one of your Databricks workspace subnets.
In the Security groups section, select the security group you created for back-end connections in Step 1: Configure AWS network objects.
Under Additional settings, turn on the Enable DNS name option.
Click Create endpoint.
Repeat the previous steps to create the secure cluster connectivity relay endpoint. Use the table in PrivateLink VPC endpoint services to get the regional service name for the secure cluster connectivity relay. Databricks recommends that you include the region and the word scc in the endpoint name, such as databricks-us-west-2-scc-vpce.
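If you script these console steps instead, the inputs map onto the parameters of the boto3 EC2 `create_vpc_endpoint` call roughly as sketched below. This is a hedged illustration: the helper names are hypothetical, and the service name must be copied from the table in PrivateLink VPC endpoint services, so it is left as a parameter rather than guessed.

```python
def endpoint_name(region, kind):
    """Follow the naming pattern suggested above, e.g. kind='workspace' or 'scc'."""
    return f"databricks-{region}-{kind}-vpce"


def backend_endpoint_kwargs(region, kind, service_name, vpc_id, subnet_id, security_group_id):
    """Build kwargs for one back-end interface endpoint (hypothetical helper)."""
    return {
        "VpcEndpointType": "Interface",
        "ServiceName": service_name,        # regional service name from the table
        "VpcId": vpc_id,                    # the workspace VPC
        "SubnetIds": [subnet_id],           # one workspace subnet
        "SecurityGroupIds": [security_group_id],
        "PrivateDnsEnabled": True,          # corresponds to Enable DNS name
        "TagSpecifications": [{
            "ResourceType": "vpc-endpoint",
            "Tags": [{"Key": "Name", "Value": endpoint_name(region, kind)}],
        }],
    }
```

Calling the helper once with `kind="workspace"` and once with `kind="scc"` yields the two back-end endpoints. All IDs passed in are your own values; none are provided by this sketch.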
A front-end endpoint originates from your transit VPC, typically serving as the source for user web application access. This is usually a separate VPC from the workspace’s compute plane VPC and may be connected to an on-premises network. If you have multiple Databricks accounts, you can share a front-end VPC endpoint across these accounts. Register the endpoint in each relevant Databricks account.
To create front-end VPC endpoints in the AWS Management Console:
Go to the VPC endpoints section of the AWS Management Console.
In the upper right, set the region to the same region as your transit VPC. This can be different from your workspace region.
Click Create Endpoint.
Name the endpoint, including the region and either the word workspace or frontend, such as databricks-us-west-2-workspace-vpce.
Under Service Category, select Other endpoint services.
In the service name field, paste in the service name. Use the table in PrivateLink VPC endpoint services to find the regional service names. Copy the one labeled Workspace (including REST API).
Click Verify service and ensure the page shows Service name verified in a green box. If you encounter an error stating “Service name could not be verified”, check that the regions of your VPC, subnets, and new VPC endpoint are correctly matched.
In VPC, select your transit VPC.
In Subnets, select a subnet.
In the Security groups section, select the security group you created for front-end connections.
Click Create endpoint.
Step 3: Register PrivateLink objects
The following uses the account console. You can also use the Account API or Databricks Terraform provider.
Within the account console, several types of objects are relevant for PrivateLink configuration:
VPC endpoint registrations: After creating VPC endpoints in the AWS Management Console, register them with Databricks to establish VPC endpoint registrations. After creation, you cannot update VPC endpoint registrations.
For back-end VPC endpoints, ensure the region field matches both your workspace region and the region of the AWS VPC endpoints you’re registering. For front-end PrivateLink, the region field must match your transit VPC region and the region of the AWS VPC endpoint for the workspace’s front-end connection.
To register your back-end and front-end VPC endpoints, follow the instructions in Manage VPC endpoint registrations.
Network configurations (only required for back-end VPC endpoints): Network configurations detail information about a customer-managed VPC and include two back-end PrivateLink configuration fields.
To create a network configuration, see Create network configurations for custom VPC deployment. For comprehensive requirements for customer-managed VPCs, their subnets, and security groups, see Configure a customer-managed VPC. In the Back-end private connectivity section, set the fields to your back-end VPC endpoint registrations as follows:
In the first field, select the VPC endpoint registration for the secure cluster connectivity relay.
In the second field, choose the VPC endpoint registration for the workspace (REST APIs).
After creation, network configurations cannot be updated.
Private access settings: A workspace’s private access settings (PAS) object includes settings for AWS PrivateLink connectivity. You can use a single private access settings object for multiple workspaces in the same AWS region. To create a private access settings object, see Manage private access settings.
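For readers who use the Account API instead of the account console, the objects above correspond to JSON request bodies. The sketch below only builds those bodies and enforces the region rule stated above. The field names (`vpc_endpoint_name`, `aws_vpc_endpoint_id`, `vpc_endpoints`, `dataplane_relay`, `rest_api`) are assumptions based on the Account API as commonly documented, so confirm them against the API reference before relying on them.

```python
def vpc_endpoint_registration(name, aws_vpc_endpoint_id, region, expected_region):
    """Build a registration body after checking the region rule.

    expected_region is the workspace region for back-end endpoints and the
    transit VPC region for front-end endpoints.
    """
    if region != expected_region:
        raise ValueError(
            f"endpoint region {region!r} does not match expected region {expected_region!r}"
        )
    return {
        "vpc_endpoint_name": name,
        "aws_vpc_endpoint_id": aws_vpc_endpoint_id,
        "region": region,
    }


def backend_network_fields(relay_registration_id, workspace_registration_id):
    """Back-end fields of a network configuration (assumed field names)."""
    return {
        "vpc_endpoints": {
            "dataplane_relay": [relay_registration_id],  # secure cluster connectivity relay
            "rest_api": [workspace_registration_id],     # workspace (REST APIs)
        }
    }
```

Because registrations and network configurations cannot be updated after creation, validating the inputs before sending a request avoids having to delete and recreate objects.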
Step 4: Create or update your workspace with PrivateLink objects
Your workspace must already be using a customer-managed VPC and secure cluster connectivity.
See Manually create a workspace (existing Databricks accounts) to create a workspace. Refer to that article for guidance on workspace fields such as workspace URL, region, Unity Catalog, credential configurations, and storage configurations. Do not click the Save button yet.
Click Advanced configurations to view additional fields.
For back-end PrivateLink, choose the network configuration. Under Virtual Private Cloud, in the menu choose the Databricks network configuration you created.
For any PrivateLink usage, select the private access settings object. Look below the Private Link heading. Click the menu and choose the name of the private access settings object that you created.
Click Save.
After creating (or updating) a workspace, wait until it’s available for using or creating clusters. The workspace status stays at RUNNING and the VPC change happens immediately. However, you cannot use or create clusters for another 20 minutes. If you create or use clusters before this interval elapses, clusters might fail to launch or cause other unexpected behavior.
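Because the workspace status does not change during this interval, automation cannot poll for readiness and has to track the time itself. A minimal sketch, assuming your automation records the moment the workspace was saved (the function names are illustrative):

```python
from datetime import datetime, timedelta, timezone

# Waiting period stated above before clusters can be used or created.
CLUSTER_WAIT = timedelta(minutes=20)


def earliest_cluster_time(workspace_saved_at):
    """Return the earliest time at which clusters should be created or used."""
    return workspace_saved_at + CLUSTER_WAIT


def can_use_clusters(workspace_saved_at, now=None):
    """True once the waiting period has elapsed; now defaults to the current UTC time."""
    if now is None:
        now = datetime.now(timezone.utc)
    return now >= earliest_cluster_time(workspace_saved_at)
```

Pass timezone-aware timestamps so that the comparison is unambiguous.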
Step 5: Configure internal DNS to redirect user requests to the web application (front-end)
To direct user requests to your front-end PrivateLink connection, change the private DNS for the network your users connect to. After creating or updating the workspace to include PrivateLink, ensure your workspace URL maps to the private IP of your workspace VPC endpoint in your internal or custom DNS.
Configure your internal DNS such that it maps the web application workspace URL to your front-end VPC endpoint.
Use the nslookup Unix command line tool to test the DNS resolution using your workspace deploy domain name, for example:
nslookup my-workspace-name-here.cloud.databricks.com
Example response:
Non-authoritative answer:
my-workspace-name-here.cloud.databricks.com canonical name = oregon.cloud.databricks.com.
oregon.cloud.databricks.com canonical name = a89b3c627d423471389d6ada5c3311b4-f09b129745548506.elb.us-west-2.amazonaws.com.
Name: a89b3c627d423471389d6ada5c3311b4-f09b129745548506.elb.us-west-2.amazonaws.com
Address: 44.234.192.47
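The example response above resolves to a public address, which is what you see before the internal DNS change. The same check can be scripted. The sketch below uses only the Python standard library. `resolve_ipv4` performs a real DNS lookup, while `all_private` is a pure check on the result. Both function names are illustrative.

```python
import ipaddress
import socket


def resolve_ipv4(hostname):
    """Resolve a hostname to its IPv4 addresses (performs a real DNS lookup)."""
    infos = socket.getaddrinfo(hostname, 443, family=socket.AF_INET, type=socket.SOCK_STREAM)
    return sorted({info[4][0] for info in infos})


def all_private(addresses):
    """True if at least one address was returned and every address is private."""
    return bool(addresses) and all(ipaddress.ip_address(a).is_private for a in addresses)
```

From the transit VPC, `all_private(resolve_ipv4("my-workspace-name-here.cloud.databricks.com"))` would be expected to return `True` once DNS maps the workspace URL to the VPC endpoint. For the address in the example response, `all_private(["44.234.192.47"])` returns `False`.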
Example DNS mapping for a workspace with front-end VPC endpoint in AWS region us-east-1:
By default the DNS mapping is:
myworkspace.cloud.databricks.com maps to nvirginia.privatelink.cloud.databricks.com. In this case nvirginia is the control plane instance short name in that region.
nvirginia.privatelink.cloud.databricks.com maps to nvirginia.cloud.databricks.com.
nvirginia.cloud.databricks.com maps to the AWS public IPs.
After your DNS changes, from your transit VPC (where your front-end VPC endpoint is), the DNS mapping would be:
myworkspace.cloud.databricks.com maps to nvirginia.privatelink.cloud.databricks.com.
nvirginia.privatelink.cloud.databricks.com maps to the private IP of your VPC endpoint for front-end connectivity.
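To make the difference between the two mappings concrete, the following toy model follows each chain of records. It does not query DNS. The record tables are hand-written from the example above, and the private IP `10.0.0.10` and the public-IP placeholder are hypothetical values.

```python
def follow(records, name, max_hops=10):
    """Follow CNAME records until an A record is reached; return the whole chain."""
    chain = [name]
    for _ in range(max_hops):
        record_type, value = records[chain[-1]]
        chain.append(value)
        if record_type == "A":
            return chain
    raise RuntimeError("CNAME chain is longer than max_hops")


# Default mapping, as seen from the public internet.
DEFAULT = {
    "myworkspace.cloud.databricks.com": ("CNAME", "nvirginia.privatelink.cloud.databricks.com"),
    "nvirginia.privatelink.cloud.databricks.com": ("CNAME", "nvirginia.cloud.databricks.com"),
    "nvirginia.cloud.databricks.com": ("A", "<AWS public IP>"),
}

# Mapping as seen from the transit VPC after the DNS change.
TRANSIT_VPC = {
    "myworkspace.cloud.databricks.com": ("CNAME", "nvirginia.privatelink.cloud.databricks.com"),
    "nvirginia.privatelink.cloud.databricks.com": ("A", "10.0.0.10"),  # hypothetical endpoint IP
}
```

Following `myworkspace.cloud.databricks.com` through `DEFAULT` ends at the public placeholder after three hops, while `TRANSIT_VPC` ends at the endpoint's private IP after two.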
For the workspace URL to map to the VPC endpoint private IP from the on-premises network, you must do one of the following:
Configure conditional forwarding for the workspace URL to use AmazonDNS.
Create an A-record for the workspace URL in your on-premises or internal DNS that maps to the VPC endpoint private IP.
Complete steps similar to what you would do to enable access to other similar PrivateLink-enabled services.
You can choose to map the workspace URL directly to the front-end (workspace) VPC endpoint private IP by creating an A-record in your internal DNS, such that the DNS mapping looks like this:
myworkspace.cloud.databricks.com maps to the VPC endpoint private IP
After you make changes to your internal DNS configuration, test the configuration by accessing the Databricks workspace web application and REST API from your transit VPC. Create a VPC endpoint in the transit VPC if necessary to test the configuration.
If you haven’t configured your DNS record in your private DNS domain, you might see an error. Resolve this by creating the following records on your DNS server, enabling access to the workspace, Spark interface, and web terminal services.
Record type | Record name | Value
---|---|---
A | <deployment-name>.cloud.databricks.com | PrivateLink interface IP
CNAME | dbc-dp-<workspace-id>.cloud.databricks.com | <deployment-name>.cloud.databricks.com
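The two records in the table can be generated from the deployment name, workspace ID, and endpoint IP. The sketch below renders them as generic zone-file style lines. The output format, the default TTL, and the function names are assumptions, so adapt them to whatever your DNS server expects.

```python
def private_dns_records(deployment_name, workspace_id, endpoint_ip):
    """Return the A and CNAME records from the table as (type, name, value) tuples."""
    base = f"{deployment_name}.cloud.databricks.com"
    return [
        ("A", base, endpoint_ip),
        ("CNAME", f"dbc-dp-{workspace_id}.cloud.databricks.com", base),
    ]


def as_zone_lines(records, ttl=300):
    """Render records as zone-file style lines (assumed format, TTL in seconds)."""
    lines = []
    for record_type, name, value in records:
        # CNAME targets are written as fully qualified names with a trailing dot.
        target = value + "." if record_type == "CNAME" else value
        lines.append(f"{name}. {ttl} IN {record_type} {target}")
    return lines
```

For example, `as_zone_lines(private_dns_records("myworkspace", "1234567890", "10.0.0.10"))` uses a made-up workspace ID and IP address to produce one line per record.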
If you have questions about how this applies to your network architecture, contact your Databricks account team.
Step 6: (Optional) Configure front-end PrivateLink with unified login
Preview
Unified login with front-end PrivateLink is in Private Preview. You must contact your Databricks account team to request access to this preview.
To use unified login with front-end PrivateLink, users need access to the account console from the transit VPC. If your transit VPC disallows access to the public internet, you must follow the steps below to configure your identity provider and transit VPC to support unified login. If your transit VPC allows access to the public internet, this is not required.
Unified login allows you to manage one SSO configuration in your account that is used for the account and Databricks workspaces. See Enable unified login. To use unified login with front-end PrivateLink, you must configure the following:
Step 6a: Authorize the PrivateLink redirect URI in your identity provider
As an account admin, log in to the account console.
In the sidebar, click Settings.
Click the Authentication tab.
Next to Authentication, click Manage.
Choose Single sign-on with my identity provider.
Click Continue.
Copy the value in the Databricks Redirect URL field.
Replace accounts with accounts-pl-auth to get the Databricks PrivateLink Redirect URI.
Go to your identity provider.
Add the Databricks PrivateLink Redirect URI as an additional redirect URL. If you configure SSO using SAML, also add the Databricks PrivateLink Redirect URI as an additional entity ID.
If you have both PrivateLink and non-PrivateLink workspaces in your account, do not remove the Databricks Redirect URL with accounts from your identity provider redirect URLs.
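The substitution in the steps above is a change to the first label of the host name only. A minimal sketch using the Python standard library follows. The function name is illustrative, and the URL path in the usage note is a placeholder rather than the real redirect path, which you copy from the account console.

```python
from urllib.parse import urlsplit, urlunsplit


def privatelink_redirect_uri(redirect_url):
    """Replace the leading 'accounts' host label with 'accounts-pl-auth'."""
    parts = urlsplit(redirect_url)
    labels = parts.netloc.split(".")
    if labels[0] != "accounts":
        raise ValueError("expected the host name to start with 'accounts'")
    labels[0] = "accounts-pl-auth"
    # Only the host changes; scheme, path, query, and fragment are preserved.
    return urlunsplit(parts._replace(netloc=".".join(labels)))
```

For instance, `privatelink_redirect_uri("https://accounts.cloud.databricks.com/example/path")` returns `https://accounts-pl-auth.cloud.databricks.com/example/path`. Editing the host label rather than doing a plain string replace avoids touching any other occurrence of the word in the URL.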
Step 6b: Set up a private hosted zone for your transit VPC
Perform the following configuration in your transit VPC to ensure that the Databricks PrivateLink Redirect URI maps to the VPC endpoint private IP address for your workspace VPC endpoint.
From your transit VPC, use the nslookup Unix command line tool to get the DNS resolution using your workspace URL. See the example in Step 5: Configure internal DNS to redirect user requests to the web application (front-end).
Copy the control plane instance URL of your private link workspace. The control plane instance URL is in the format <region>.privatelink.cloud.databricks.com.
In your transit VPC, create a private hosted zone with domain name privatelink.cloud.databricks.com.
Add a CNAME record that resolves accounts-pl-auth.privatelink.cloud.databricks.com to your control plane instance URL.
Test the configuration by accessing the Databricks PrivateLink Redirect URI from your transit VPC.
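If the private hosted zone is managed in Amazon Route 53, the CNAME record from the steps above might be expressed as a change batch in the shape that the boto3 Route 53 call `change_resource_record_sets` accepts. This is a sketch under that assumption. The function name and the default TTL are illustrative, and the hosted zone ID is supplied separately when making the call.

```python
ACCOUNTS_PL_AUTH_NAME = "accounts-pl-auth.privatelink.cloud.databricks.com"


def accounts_pl_auth_change_batch(control_plane_instance_url, ttl=300):
    """Build a change batch that points accounts-pl-auth at the control plane instance URL."""
    return {
        "Changes": [{
            "Action": "UPSERT",  # create the record, or overwrite it if it exists
            "ResourceRecordSet": {
                "Name": ACCOUNTS_PL_AUTH_NAME,
                "Type": "CNAME",
                "TTL": ttl,
                "ResourceRecords": [{"Value": control_plane_instance_url}],
            },
        }]
    }
```

With the control plane instance URL from the Step 5 example, the argument would be `nvirginia.privatelink.cloud.databricks.com`.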
Step 7: Add VPC endpoints for other AWS services
For typical use cases, the following VPC endpoints are required so that clusters and other compute resources in the classic compute plane can connect to AWS native services:
S3 VPC gateway endpoint: Attach this only to the route table that’s attached to your workspace subnets. If you’re using the recommended separate subnet with its own route table for back-end VPC endpoints, then the S3 VPC endpoint doesn’t need to be attached to that particular route table. See this AWS article about S3 gateway endpoints.
STS VPC interface endpoint: Create this in all the workspace subnets and attach it to the workspace security group. Do not create this in the subnet for back-end VPC endpoints. See this AWS section about STS interface endpoints and this general article about interface endpoints.
Kinesis VPC interface endpoint: Create the Kinesis VPC interface endpoint in all workspace subnets and attach it to the workspace security group, similar to the STS VPC interface endpoint. For more information, see this AWS article about Kinesis interface endpoints and this general article about interface endpoints.
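The placement rules for these three endpoints can be summarized in code. The sketch below builds keyword arguments in the shape of the boto3 EC2 `create_vpc_endpoint` call: a gateway endpoint for S3 attached to the workspace route tables, and interface endpoints for STS and Kinesis in the workspace subnets with the workspace security group. The function name is hypothetical, and the service names follow the usual `com.amazonaws.<region>.<service>` pattern, which you should verify for your region.

```python
def aws_service_endpoint_kwargs(region, vpc_id, workspace_subnet_ids,
                                workspace_security_group_id, workspace_route_table_ids):
    """Return create_vpc_endpoint kwargs for S3, STS, and Kinesis, keyed by service."""

    def interface(service):
        # Interface endpoints go in all workspace subnets, not the endpoints subnet.
        return {
            "VpcEndpointType": "Interface",
            "ServiceName": f"com.amazonaws.{region}.{service}",
            "VpcId": vpc_id,
            "SubnetIds": list(workspace_subnet_ids),
            "SecurityGroupIds": [workspace_security_group_id],
        }

    return {
        "s3": {
            "VpcEndpointType": "Gateway",
            "ServiceName": f"com.amazonaws.{region}.s3",
            "VpcId": vpc_id,
            # Attach only to the route table(s) of the workspace subnets.
            "RouteTableIds": list(workspace_route_table_ids),
        },
        "sts": interface("sts"),
        "kinesis": interface("kinesis-streams"),
    }
```

Each value in the returned dictionary corresponds to one endpoint. The route table of a separate back-end VPC endpoints subnet is intentionally not included.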
To centralize your endpoints, check that the following are true:
Compute resources resolve the fully qualified domain name for each service to the private IP of the corresponding VPC endpoint.
Routes exist to allow compute resources to reach the VPC endpoints.