Enable AWS PrivateLink
This article explains how to use AWS PrivateLink to enable private connectivity between users and their Databricks workspaces and between clusters on the data plane and core services on the control plane within the Databricks workspace infrastructure.
Important
This article mentions the term data plane, which is the compute layer of the Databricks platform. In the context of this article, data plane refers to the Classic data plane in your AWS account. By contrast, the serverless data plane that supports serverless SQL warehouses runs in the Databricks AWS account. To learn more, see Serverless compute.
Overview
AWS PrivateLink provides private connectivity from AWS VPCs and on-premises networks to AWS services without exposing the traffic to the public network. Databricks workspaces on the E2 version of the platform support PrivateLink connections for two connection types:
Front-end (user to workspace): A front-end PrivateLink connection allows users to connect to the Databricks web application, REST API, and Databricks Connect API over a VPC interface endpoint.
Back-end (data plane to control plane): Databricks Runtime clusters in a customer-managed VPC (the data plane) connect to a Databricks workspace’s core services (the control plane) in the Databricks cloud account. Clusters connect to the control plane for two destinations: REST APIs (such as the Secrets API) and the secure cluster connectivity relay. This PrivateLink connection type involves two different VPC interface endpoints because of the two different destination services.
You can implement both front-end and back-end PrivateLink connections or just one of them. This article discusses how to configure either one or both PrivateLink connection types. If you implement PrivateLink for both the front-end and back-end connections, you can optionally mandate private connectivity for the workspace, which means Databricks rejects any connections over the public network. If you do not implement both connection types, you cannot enforce this requirement.
To enable PrivateLink connections, you must create Databricks configuration objects and add new fields to existing configuration objects.
This article describes how to create these configuration objects and create (or update) a workspace by using either the account console or the Account API.
The following table describes important terminology.
Terminology | Description |
---|---|
AWS PrivateLink | An AWS technology that provides private connectivity from AWS VPCs and on-premises networks to AWS services without exposing the traffic to the public network. |
Front-end PrivateLink | The PrivateLink connection for users to connect to the Databricks web application, REST API, and Databricks Connect API. |
Back-end PrivateLink | The PrivateLink connection for the data plane in your AWS account to connect to the Databricks control plane. |
AWS VPC endpoint service | An AWS VPC endpoint service is a PrivateLink-powered service. Each Databricks control plane (typically one per region) publishes two AWS VPC endpoint services for PrivateLink. The workspace VPC endpoint service applies to both the Databricks front-end PrivateLink connection and the Databricks back-end PrivateLink connection for REST APIs. Databricks publishes another VPC endpoint service for its secure cluster connectivity relay. |
AWS VPC endpoint | An AWS VPC interface endpoint enables private connections between your VPC and VPC endpoint services powered by AWS PrivateLink. You must create AWS VPC interface endpoints and then register them with Databricks. Registering a VPC endpoint creates a Databricks-specific object called a VPC endpoint registration that references the AWS VPC endpoint. |
Databricks network configuration | A Databricks object that describes the important information about a Customer-managed VPC. If you implement any PrivateLink connection (front-end or back-end), your workspace must use a customer-managed VPC. For PrivateLink back-end support only, your network configuration needs an extra property that identifies the VPC endpoints for the back-end connection. |
Databricks private access settings object | A Databricks object that describes a workspace’s PrivateLink connectivity. You must attach a private access settings object to the workspace during workspace creation, whether using front-end, back-end, or both. It expresses your intent to use AWS PrivateLink with your workspace. It controls your settings for the front-end use case of AWS PrivateLink for public network access. It controls which VPC endpoints are permitted to access your workspace. |
Databricks workspace configuration object | A Databricks object that describes a workspace. To enable PrivateLink, this object must reference a Databricks private access settings object. For back-end PrivateLink, the workspace must also have a Databricks network configuration object with two extra fields that specify which VPC endpoint registrations to use: one for the control plane’s secure cluster connectivity relay and one for the workspace, to access REST APIs. |
Updates of existing PrivateLink configuration objects
This article focuses on the two main use cases of creating a new workspace or enabling PrivateLink on a workspace. You can also make other configuration changes to related objects using the UI or API:
You can upgrade a workspace’s PrivateLink support to add support for front-end, back-end, or both types of connectivity. Add a private access settings object (UI or API). To do so, create a new network configuration with new settings, for example for a new VPC or different PrivateLink support settings, and then update the workspace to use the new network configuration. Note that you cannot remove (downgrade) any existing front-end or back-end PrivateLink support on a workspace.
Add or update a workspace’s registered VPC endpoints by creating a new network configuration object with registered VPC endpoints and then updating the workspace’s network configuration (UI or API).
For more information about what kinds of workspace fields can be changed on failed or running workspaces, see information about this task by using the UI or API.
Note that not all related objects can be updated. Where update is not possible, create new objects and set their parent objects to reference the new objects. The following rules apply both to the account console UI and the Account API:
Object | Can be created | Can be updated |
---|---|---|
Workspace configurations | Yes | Yes |
Private access settings | Yes | Yes |
Network configurations | Yes | No |
VPC endpoint registrations | Yes | No |
To update CIDR ranges on an existing VPC, see Updating CIDRs.
Requirements
Databricks account
Your Databricks account is on the E2 version of the platform.
Your Databricks account is on the Enterprise pricing tier.
You have your Databricks account ID. Get your account ID from the account console.
Databricks workspace
Your workspace must be in an AWS region that supports the E2 version of the platform. However, the us-west-1 region does not support PrivateLink even for workspaces on the E2 version of the platform.
Your Databricks workspace must use Customer-managed VPC to add any PrivateLink connection (even a front-end-only connection). Note that you cannot update an existing workspace with a Databricks-managed VPC and change it to use a customer-managed VPC.
If you implement the back-end PrivateLink connection, your Databricks workspace must use Secure cluster connectivity, which is the default for new workspaces on the E2 version of the platform. To add back-end PrivateLink to an older existing workspace that does not use secure cluster connectivity, contact your Databricks representative.
AWS account permissions
If you are the user who sets up PrivateLink, you must have all necessary AWS permissions to provision a Databricks workspace and to provision new VPC endpoints for your workspace.
Network architecture
To implement the front-end PrivateLink connection to access the workspace from your on-premises network, add private connectivity from the on-premises network to an AWS VPC using either Direct Connect or VPN.
For guidance for other network objects, see Step 1: Configure AWS network objects.
Step 1: Configure AWS network objects
You can use the AWS Management Console to create these objects or automate the process with tools such as the Terraform provider for networks.
To configure a VPC, subnets, and security groups:
Set up a VPC for your workspace if you haven’t already done so. You may re-use a VPC from another workspace, but you must create separate subnets for each workspace. Every workspace requires at least two private subnets.
To create a VPC, see Customer-managed VPC. If you are updating a workspace for PrivateLink rather than creating a new workspace, note that the workspace must already be using a customer-managed VPC.
On your VPC, ensure that you enable both of the settings DNS Hostnames and DNS resolution.
Ensure that the network ACLs for the subnets have bidirectional (outbound and inbound) rules that allow TCP access to 0.0.0.0/0 for these ports:
443: for Databricks infrastructure, cloud data sources, and library repositories
3306: for the metastore
6666: for PrivateLink
2443: only for use with compliance security profile
Important
If your workspace uses the compliance security profile, you must also allow bidirectional (outbound and inbound) access to port 2443 to support FIPS endpoints for the secure cluster connectivity relay.
For back-end PrivateLink:
Create and configure an extra VPC subnet (optional):
For your VPC endpoints, including back-end PrivateLink VPC endpoints and also any optional VPC endpoints to other AWS services, you can create them in any of your workspace subnets as long as the network can route to the VPC endpoints.
Attach a separate route table to your VPC endpoints subnet, which would be different from the route table attached to your workspace subnets. The route table for your VPC endpoints subnet needs only a single default route for the local VPC.
Create and configure an extra security group (recommended but optional):
In addition to the security group that is normally required for a workspace, create a separate security group that allows HTTPS/443 and TCP/6666 bidirectional (outbound and inbound) access to both the workspace subnets and the separate VPC endpoints subnet, if you created one. This configuration allows access both to the workspace for REST APIs (port 443) and to the secure cluster connectivity relay (port 6666), which makes it easy to share the security group for both purposes.
Important
If your workspace uses the compliance security profile, you must also allow bidirectional (outbound and inbound) access to port 2443 to support FIPS endpoints for the secure cluster connectivity relay.
For front-end PrivateLink:
For your transit VPC and its subnets, ensure they are reachable from the user environment. Create a transit VPC that terminates your AWS Direct Connect or VPN gateway connection or one that is routable from your transit VPC.
If you enable both front-end and back-end PrivateLink, you can optionally share the front-end workspace (web application) VPC endpoint with the back-end workspace (REST API) VPC endpoint if the VPC endpoint is network accessible from the workspace subnets.
Create a new security group for the front-end endpoint. The security group must allow HTTPS (port 443) bidirectional (outbound and inbound) access for both the source network and the endpoint subnet itself.
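The VPC DNS settings and the back-end security group rules from this step can also be scripted with the AWS CLI. The following is a minimal dry-run sketch, not an official procedure: it only prints the commands so you can review them first. The function names, vpc-0example, sg-0example, and the 10.0.0.0/16 CIDR are placeholders invented for illustration, and the sketch assumes one CIDR per call, so run it once for each subnet range you need to allow.

```shell
#!/bin/sh
# Dry run: print AWS CLI commands instead of executing them.
# All IDs and CIDR ranges used here are placeholders.

# Ports for the back-end security group. Port 2443 is added only when the
# workspace uses the compliance security profile.
backend_ports() {
  if [ "$1" = "compliance" ]; then
    echo "443 6666 2443"
  else
    echo "443 6666"
  fi
}

# Print the two calls that enable DNS resolution and DNS hostnames on a VPC.
# modify-vpc-attribute changes one attribute per call.
print_vpc_dns_commands() {
  vpc_id=$1
  for attr in enable-dns-support enable-dns-hostnames; do
    printf 'aws ec2 modify-vpc-attribute --vpc-id %s --%s Value=true\n' "$vpc_id" "$attr"
  done
}

# Print inbound and outbound TCP rules for each required port.
# Arguments: security group ID, CIDR to allow, profile (standard|compliance).
print_sg_rule_commands() {
  sg_id=$1
  cidr=$2
  profile=$3
  for port in $(backend_ports "$profile"); do
    for direction in ingress egress; do
      printf 'aws ec2 authorize-security-group-%s --group-id %s --ip-permissions "IpProtocol=tcp,FromPort=%s,ToPort=%s,IpRanges=[{CidrIp=%s}]"\n' \
        "$direction" "$sg_id" "$port" "$port" "$cidr"
    done
  done
}

print_vpc_dns_commands vpc-0example
print_sg_rule_commands sg-0example 10.0.0.0/16 standard
```

After reviewing the printed commands, you could pipe the output to sh to run them with your own credentials and real IDs.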
Step 2: Create VPC endpoints
Back-end VPC endpoints
For back-end PrivateLink, you create two VPC endpoints. One is for the secure cluster connectivity relay. The other is for the workspace, which allows data plane calls to Databricks REST APIs. For general documentation on VPC endpoint management with the AWS Management Console, see the AWS article Create VPC endpoints in the AWS Management Console. When you create the VPC endpoints, it’s important to enable the Enable DNS name field, which appears under Additional settings on the AWS Management Console page for creating VPC endpoints. As a terminology note, this is the same field that AWS in some places refers to as Enable Private DNS or Enable Private DNS on this endpoint when viewing or editing a VPC endpoint.
For tools that help you automate creating and managing VPC endpoints, see the AWS articles CloudFormation: Creating VPC Endpoint and AWS CLI: create-vpc-endpoint.
You can share the back-end VPC endpoints across multiple workspaces that use the same customer-managed VPC. Whether you share the back-end VPC endpoints across multiple workspaces depends on your organization’s AWS architecture best practices and your overall throughput requirements across all workloads.
If you decide to share those across workspaces, you must create the back-end VPC endpoints in a separate subnet that is routable from the subnets of all the workspaces. For guidance, contact your Databricks representative.
You can also share VPC endpoints across workspaces from multiple Databricks accounts as long as the workspaces share the same customer-managed VPC, in which case you need to register the VPC endpoints in each Databricks account.
The following procedure uses the AWS Management Console. You can also automate this step using the Terraform provider for VPC endpoints.
To create the back-end VPC endpoints in the AWS Management Console:
Go to the VPC endpoints section of the AWS Management Console.
Use the region picker in the upper right next to your account name picker and confirm you are using the region that matches the region you will use for your workspace. If needed, change the region using the region picker.
Create the VPC endpoint:
Click Create Endpoint.
Give the endpoint a name that indicates the region and the purpose of the VPC endpoint. For the workspace VPC endpoint, Databricks recommends that you include the region and the word workspace, such as databricks-us-west-2-workspace-vpce.
Under Service Category, choose Other endpoint services.
In the service name field, paste in the service name. Use the table in Regional endpoint reference to get the two regional service names for your region.
For your first VPC endpoint that you create, copy the regional service name for the workspace (REST API).
Click Verify service. Confirm the page reports in a green box Service name verified. If you see an error “Service name could not be verified”, check whether you have correctly matched the regions of your VPC, subnets, and your new VPC endpoint.
In the VPC field, select your workspace VPC.
In the Subnets section, choose exactly one of your Databricks workspace subnets. For related discussion, see Step 1: Configure AWS network objects.
In the Security groups section, choose the security group you created for back-end connections in Step 1: Configure AWS network objects.
Click to expand the Additional settings section.
Ensure that the endpoint has the Enable DNS name field enabled. As a terminology note, this is the same field that AWS in some places refers to as Enable Private DNS or Enable Private DNS on this endpoint when viewing or editing a VPC endpoint.
Click Create endpoint.
Repeat the above procedure and use the table in Regional endpoint reference to get the regional service name for the secure cluster connectivity relay. Give the endpoint a name that indicates the region and the purpose of the VPC endpoint. Databricks recommends that you include the region and the word scc, such as databricks-us-west-2-scc-vpce.
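For readers who prefer the AWS CLI route mentioned earlier, the console procedure above maps roughly onto a single aws ec2 create-vpc-endpoint call per endpoint. The sketch below only prints that command (a dry run) so it can be reviewed before use. The function name, every ID, and the service name com.amazonaws.vpce.us-west-2.vpce-svc-EXAMPLE are placeholders; take the real service names from the regional endpoint reference.

```shell
#!/bin/sh
# Dry run: print the AWS CLI command for one back-end interface endpoint.
# Arguments: endpoint name, service name, VPC ID, subnet ID, security group ID.
# All values passed below are placeholders, not real resources.
print_create_endpoint_command() {
  name=$1
  service_name=$2
  vpc_id=$3
  subnet_id=$4
  sg_id=$5
  printf 'aws ec2 create-vpc-endpoint \\\n'
  printf '  --vpc-endpoint-type Interface \\\n'
  printf '  --service-name %s \\\n' "$service_name"
  printf '  --vpc-id %s \\\n' "$vpc_id"
  # Exactly one workspace subnet, matching the console procedure.
  printf '  --subnet-ids %s \\\n' "$subnet_id"
  printf '  --security-group-ids %s \\\n' "$sg_id"
  # CLI counterpart of the console field "Enable DNS name".
  printf '  --private-dns-enabled \\\n'
  printf '  --tag-specifications "ResourceType=vpc-endpoint,Tags=[{Key=Name,Value=%s}]"\n' "$name"
}

print_create_endpoint_command \
  databricks-us-west-2-workspace-vpce \
  com.amazonaws.vpce.us-west-2.vpce-svc-EXAMPLE \
  vpc-0example subnet-0example sg-0example
```

Calling the function a second time with the relay service name and a name such as databricks-us-west-2-scc-vpce would produce the command for the secure cluster connectivity relay endpoint.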
Front-end VPC endpoints
A front-end endpoint originates in your transit VPC, which is usually the source of user web application access. Typically, it is a transit VPC that is connected to an on-premises network. This is generally a separate VPC from the workspace’s data plane VPC. Although the Databricks VPC endpoint service is the same shared service for the front-end connection and the back-end REST API connection, in typical implementations, connections originate from two separate VPCs and thus need separate AWS VPC endpoints that originate in each VPC.
If you have multiple Databricks accounts, you can share a front-end VPC endpoint across Databricks accounts. Register the endpoint in each relevant Databricks account.
The following procedure uses the AWS Management Console. You can also automate this step using the Terraform provider for VPC endpoints.
To create the front-end VPC endpoints in the AWS Management Console:
Go to the VPC endpoints section of the AWS Management Console.
Use the region picker in the upper right next to your account name picker and confirm you are using the region that matches the transit VPC region, which in some cases might be different than your workspace region. If needed, change the region using the region picker.
Create the VPC endpoint:
Click Create Endpoint.
Give the endpoint a name that indicates the region and the purpose of the VPC endpoint. For the workspace VPC endpoint, Databricks recommends that you include the region and the word workspace or frontend, such as databricks-us-west-2-workspace-vpce.
Under Service Category, choose Other endpoint services.
In the service name field, paste in the service name. Use the table in Regional endpoint reference to find the regional service names. Copy the one labelled Workspace (including REST API).
Click Verify service. Confirm the page reports in a green box Service name verified. If you see an error “Service name could not be verified”, check whether you have correctly matched the regions of your VPC, subnets, and your new VPC endpoint.
In the VPC menu, click your transit VPC.
In the Subnets section, choose a subnet. For related discussion, see Step 1: Configure AWS network objects.
In the Security groups section, choose the security group you created for front-end connections in Step 1: Configure AWS network objects.
Click Create endpoint.
Regional endpoint reference
Get your region’s VPC endpoint service domains from the table at PrivateLink VPC endpoint services.
Note
If you use the account console to create your network configuration, the UI refers to the workspace VPC endpoint as the VPC endpoint for REST APIs.
Step 3: Register PrivateLink objects and attach them to a workspace
There are two ways you can perform this step:
Use the account console
You can use the account console to register your VPC endpoints, create and register other required workspace resources, and finally create a new workspace with PrivateLink.
Within the account console, several types of objects are relevant for PrivateLink configuration:
VPC endpoint registrations (required for front-end, back-end, or both): After creating VPC endpoints in the AWS Management Console (see the previous step), register them in Databricks to create VPC endpoint registrations. See the account console’s page for VPC endpoints.
Network configurations (required for back-end VPC endpoints): Network configurations represent information about a customer-managed VPC. They also contain two back-end PrivateLink configuration fields. Add these two fields in the network configuration object. They must reference the two back-end VPC endpoints that you created in AWS. See the account console’s page for network configurations. If you have an existing network configuration and you want to add fields for PrivateLink, you must create a new network configuration.
Private access configurations (required for front-end, back-end, or both): A workspace’s private access configuration object encapsulates a few settings about AWS PrivateLink connectivity. Create a new private access settings object just for this workspace, or share one among multiple workspaces in the same AWS region. This object serves several purposes. It expresses your intent to use AWS PrivateLink with your workspace. It controls your settings for the front-end use case of AWS PrivateLink for public network access. It controls which VPC endpoints are permitted to access your workspace.
There are two ways to use the account console to define cloud resources for a workspace.
Create resources in advance: You can create relevant cloud resources first before you create your workspace in the cloud resources area of the account console. This is useful if you might not be able to do all the steps at the same time or if different teams perform network setup and create workspaces.
Within the workspace creation page, add configurations as needed: On the page that creates (or updates) a workspace, there are pickers for different cloud resources. In most cases, there are picker items that let you create that resource immediately in a pop-up view. For example, a network configuration picker has an option Add a new network configuration.
This article describes how to create resources in advance and then reference them. You can use the other approach if that works better for you. See the editors for VPC endpoints, network configurations, and private access settings.
Step 3a: Register your VPC endpoints (for front-end, back-end, or both)
Follow the instructions in Manage VPC endpoint registrations.
For back-end PrivateLink, register the back-end VPC endpoints you created and name the configurations for their purposes, for example add -scc for secure cluster connectivity and -workspace for the workspace (REST API) VPC endpoint registration. For back-end VPC endpoints, the region field must match your workspace region and the region of the AWS VPC endpoints that you are registering. However, Databricks validates this only during workspace creation (or during updating a workspace with PrivateLink), so it is critical that you carefully set the region in this step.
For front-end PrivateLink, register the front-end VPC endpoint you created in the transit VPC. For front-end PrivateLink, the region field must match your transit VPC region and the region of the AWS VPC endpoint for the workspace for the front-end connection.
Step 3b: Create a network configuration (for back-end)
Follow the instructions in Create network configurations for custom VPC deployment. For detailed requirements for customer-managed VPC along with its associated subnets and security groups, see Customer-managed VPC. The most important fields for PrivateLink are under the heading Back-end private connectivity. There are two fields where you choose the back-end VPC endpoint registrations that you created in the previous step. For the first one, select the VPC endpoint registration for the secure cluster connectivity relay. For the other, choose the VPC endpoint registration for the workspace (REST APIs).
Step 3c: Create a PAS object (for front-end, back-end, or both)
Creating a private access settings (PAS) object is an important step for PrivateLink configuration. Follow the instructions in Manage private access settings.
For the region, be sure it matches the region of your workspace. Databricks does not validate this immediately, but workspace deployment fails if the regions do not match.
Set the Public access enabled field, which configures public access to the front-end connection (the web application and REST APIs) for your workspace.
If set to False (the default), the front-end connection can be accessed only using PrivateLink connectivity and not from the public internet. Because access from the public network is disallowed in this case, the IP access lists for workspaces feature is not supported for the workspace.
If set to True, the front-end connection can be accessed either from PrivateLink connectivity or from the public internet. You can optionally configure an IP access list for the workspace to restrict the source networks that could access the web application and REST APIs from the public internet (but not the PrivateLink connection).
Set the Private Access Level field to the value that best represents which VPC endpoints to allow for your workspace.
Set to Account to limit connections to those VPC endpoints that are registered in your Databricks account.
Set to Endpoint to limit connections to an explicit set of VPC endpoints, which you can enter in a field that appears. It lets you select VPC endpoint registrations that you’ve already created. Be sure to include your front-end VPC endpoint registration if you created one.
Step 3d: Create or update the workspace (front-end, back-end, or both)
The workspace must already use a customer-managed VPC and secure cluster connectivity must be enabled, which is the case for most E2 workspaces.
The following instructions describe creating a workspace using the account console’s workspaces page.
Follow the instructions in Create a workspace using the account console to create a workspace. See that article for guidance on workspace fields such as workspace URL, region, Unity Catalog, credential configurations, and storage configurations. Do not yet click the Save button.
Click Advanced configurations to view additional fields.
For back-end PrivateLink, choose the network configuration. Under Virtual Private Cloud, in the menu choose the Databricks network configuration you created.
For any PrivateLink usage, select the private access settings object. Look below the Private Link heading. Click the menu and choose the name of the private access settings object that you created.
Click Save.
After creating (or updating) a workspace, wait until it’s available for using or creating clusters. The workspace status stays at status RUNNING and the VPC change happens immediately. However, you cannot use or create clusters for another 20 minutes. If you create or use clusters before this time interval elapses, clusters might fail to launch or cause other unexpected behavior.
Continue on to Step 4: Configure internal DNS to redirect user requests to the web application (for front-end).
Use the Account API
Step 3a: Register VPC endpoints (front-end, back-end, or both)
Using the Account API, register the VPC endpoint IDs for your back-end VPC endpoints. For each one, this creates a Databricks VPC endpoint registration.
For back-end VPC endpoints, if you have multiple workspaces in the same region that share the same customer-managed VPC, you can choose to share the AWS VPC endpoints. You can also share these VPC endpoints among multiple Databricks accounts, in which case register the AWS VPC endpoint in each Databricks account.
For front-end VPC endpoints, if you have multiple Databricks accounts, you can share a front-end VPC endpoint across Databricks accounts. Register the endpoint in each relevant Databricks account.
To register a VPC endpoint in Databricks, make a POST request to the /accounts/<account-id>/vpc-endpoints REST API endpoint and pass the following fields in the request body:
vpc_endpoint_name: User-visible name for the VPC endpoint registration within Databricks.
region: AWS region name.
aws_vpc_endpoint_id: Your VPC endpoint’s ID within AWS. It starts with prefix vpce-.
For example:
curl -X POST -n \
  'https://accounts.cloud.databricks.com/api/2.0/accounts/<account-id>/vpc-endpoints' \
  -d '{
  "vpc_endpoint_name": "Databricks front-end endpoint",
  "region": "us-west-2",
  "aws_vpc_endpoint_id": "<vpce-id>"
}'
The response JSON includes a vpc_endpoint_id field. If you are adding a back-end PrivateLink connection, save this value. This ID is specific to this configuration within Databricks. You need this ID when you create the network configuration in a later step (Step 3b: Create a network configuration (back-end)).
Related Account API operations that may be useful:
Check the state of a VPC endpoint registration: The state field in the response JSON indicates the state within AWS.
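Both the vpc_endpoint_id to save and the state field can be pulled out of the response JSON in a script. The sketch below is one hedged way to do that with sed. The helper name json_string_field and the sample response are invented for illustration, real responses contain more fields, and a proper JSON parser is a safer choice outside of quick scripts.

```shell
#!/bin/sh
# Extract a string field from a JSON response read on stdin.
# Illustrative sed-based helper; it assumes the key and its string value
# are on the same line and that the value has no escaped quotes.
json_string_field() {
  sed -n 's/.*"'"$1"'"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# Hypothetical, shortened response body used only to demonstrate the helper.
sample_response='{"vpc_endpoint_id":"0000-example","aws_vpc_endpoint_id":"vpce-0example","state":"available"}'

databricks_vpce_id=$(printf '%s\n' "$sample_response" | json_string_field vpc_endpoint_id)
state=$(printf '%s\n' "$sample_response" | json_string_field state)
echo "Databricks VPC endpoint ID: $databricks_vpce_id"
echo "State within AWS: $state"

# Against the live API, the response would come from a GET request such as:
# curl -s -n -X GET \
#   'https://accounts.cloud.databricks.com/api/2.0/accounts/<account-id>/vpc-endpoints/<databricks-vpce-id>'
```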
Step 3b: Create a network configuration (back-end)
Note
If you implement only the front-end connection, skip this step. Although you must create a network configuration because a customer-managed VPC is required, there are no PrivateLink changes to this object if you only implement a front-end PrivateLink connection.
For any PrivateLink support, you must use a Customer-managed VPC. This feature requires you to create a network configuration object that encapsulates the ID for the VPC, the subnets, and the security groups.
For back-end PrivateLink support, your network configuration must have an extra field that is specific to PrivateLink. The network configuration vpc_endpoints field references your Databricks-specific VPC endpoint IDs that were returned when you registered your VPC endpoints. See Step 3a: Register VPC endpoints (front-end, back-end, or both).
Add both of these fields in that object:
rest_api: Set this to a JSON array that includes exactly one element: the Databricks-specific ID for the back-end REST API VPC endpoint that you registered in Step 3a: Register VPC endpoints (front-end, back-end, or both).
Important
Be careful to use the Databricks-specific ID that was created when you registered the regional endpoint based on the table in Regional endpoint reference. It is a common configuration error to set the wrong ID on this field.
dataplane_relay: Set this to a JSON array that includes exactly one element: the Databricks-specific ID for the back-end SCC VPC endpoint that you registered in Step 3a: Register VPC endpoints (front-end, back-end, or both).
Important
Be careful to use the Databricks-specific ID that was created when you registered the regional endpoint based on the table in Regional endpoint reference. It is a common configuration error to set the wrong ID on this field.
You get these Databricks-specific VPC endpoint IDs from the JSON responses of the requests you made in Step 3a: Register VPC endpoints (front-end, back-end, or both), within the vpc_endpoint_id response field.
The following example creates a new network configuration that references the VPC endpoint IDs. Replace <databricks-vpce-id-for-scc> with your Databricks-specific VPC endpoint ID for the secure cluster connectivity relay. Replace <databricks-vpce-id-for-rest-apis> with your Databricks-specific VPC endpoint ID for the REST APIs.
curl -X POST -n \
  'https://accounts.cloud.databricks.com/api/2.0/accounts/<account-id>/networks' \
  -d '{
  "network_name": "Provide name for the Network configuration",
  "vpc_id": "<aws-vpc-id>",
  "subnet_ids": [
    "<aws-subnet-1-id>",
    "<aws-subnet-2-id>"
  ],
  "security_group_ids": [
    "<aws-sg-id>"
  ],
  "vpc_endpoints": {
    "dataplane_relay": [
      "<databricks-vpce-id-for-scc>"
    ],
    "rest_api": [
      "<databricks-vpce-id-for-rest-apis>"
    ]
  }
}'
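Because setting the wrong ID in these fields is a common configuration error, a small pre-flight check before sending the request may help. The sketch below is an illustration only. The function name is made up, and it rests on two assumptions: that a value beginning with vpce- is an AWS ID rather than a Databricks-specific ID, and that the relay and REST API fields should never share the same registration.

```shell
#!/bin/sh
# Pre-flight check for the two back-end VPC endpoint registration IDs.
# Arguments: ID for dataplane_relay, ID for rest_api.
# Returns 0 when the IDs look plausible, 1 otherwise.
check_backend_ids() {
  relay_id=$1
  rest_id=$2
  for id in "$relay_id" "$rest_id"; do
    case $id in
      "")
        echo "error: missing VPC endpoint registration ID" >&2
        return 1 ;;
      vpce-*)
        # Assumption: only AWS VPC endpoint IDs start with vpce-.
        echo "error: $id looks like an AWS ID, not a Databricks ID" >&2
        return 1 ;;
    esac
  done
  if [ "$relay_id" = "$rest_id" ]; then
    echo "error: dataplane_relay and rest_api use the same ID" >&2
    return 1
  fi
  return 0
}

# Example with placeholder IDs.
if check_backend_ids "id-for-scc-example" "id-for-rest-example"; then
  echo "IDs look plausible"
fi
```

The check cannot tell whether the relay and REST API IDs have been swapped, so it is still worth confirming each ID against the name you gave its registration.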
Step 3c: Create a PAS configuration (front-end, back-end, or both)
Use the Databricks Account API to create or attach a private access settings (PAS) object.
The private access settings object supports the following scenarios:
Implement only a front-end VPC endpoint
Implement only a back-end VPC endpoint
Implement both front-end and back-end VPC endpoints
For a workspace to support any of these PrivateLink connectivity scenarios, the workspace must be created with an attached private access settings object. This can be a new private access settings object intended only for this workspace, or you can re-use and share an existing private access settings object across multiple workspaces in the same AWS region.
This object serves two purposes:
Expresses your intent to use AWS PrivateLink with your workspace. If you intend to connect to your workspace using front-end or back-end PrivateLink, you must attach one of these objects to your workspace during workspace creation.
Controls your settings for the front-end use case of AWS PrivateLink. If you wish to use back-end PrivateLink only, you can choose to set the object’s public_access_enabled field to true.
In the private access settings object definition, the public_access_enabled field configures public access to the front-end connection (the web application and REST APIs) for your workspace:
If set to false (the default), the front-end connection can be accessed only using PrivateLink connectivity and not from the public internet. Because access from the public network is disallowed in this case, the IP access lists for workspaces feature is not supported for the workspace.
If set to true, the front-end connection can be accessed either from PrivateLink connectivity or from the public internet. You can optionally configure an IP access list for the workspace to restrict the source networks that could access the web application and REST APIs from the public internet (but not the PrivateLink connection).
To create a private access settings object, make a POST request to the /accounts/<account-id>/private-access-settings REST API endpoint. The request body must include the following properties:
private_access_settings_name: Human-readable name for the private access settings object.
region: AWS region name.
public_access_enabled: Specifies whether to enable public access for the front-end connection. If true, public access is possible for the front-end connection in addition to PrivateLink connections. See the previous table for the required value for your implementation.
private_access_level: Specify which VPC endpoints can connect to this workspace:
ACCOUNT (the default): Limit connections to those VPC endpoints that are registered in your Databricks account.
ENDPOINT: Limit connections to an explicit set of VPC endpoints. See the related allowed_vpc_endpoint_ids property.
Note
The private access level ANY is deprecated. The level is unavailable for new or existing private access settings objects.
allowed_vpc_endpoint_ids: Use only if private_access_level is set to ENDPOINT. This property specifies the set of VPC endpoints that can connect to this workspace. Specify as a JSON array of VPC endpoint IDs. Use the Databricks IDs that were returned during endpoint registration, not the AWS IDs.
curl -X POST -n \
'https://accounts.cloud.databricks.com/api/2.0/accounts/<account-id>/private-access-settings' \
-d '{
"private_access_settings_name": "Default PAS for us-west-2",
"region": "us-west-2",
"public_access_enabled": true
}'
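As a variation on the request above, the following sketch builds a request body that restricts the workspace to an explicit set of registered VPC endpoints by setting private_access_level to ENDPOINT. The helper name build_pas_body and the placeholder endpoint IDs are hypothetical, and setting public_access_enabled to false is an assumption for illustration. The field names come from the property list above. The curl call is commented out so that the snippet runs without contacting the API.

```shell
# Hypothetical helper: prints a private access settings request body that
# limits connections to an explicit set of registered VPC endpoints.
# $1 = name, $2 = AWS region, $3 = comma-separated Databricks VPC endpoint IDs
build_pas_body() {
  # Turn id1,id2 into id1","id2 so it can be wrapped as a JSON array below.
  ids=$(printf '%s' "$3" | sed 's/,/","/g')
  printf '{"private_access_settings_name":"%s","region":"%s","public_access_enabled":false,"private_access_level":"ENDPOINT","allowed_vpc_endpoint_ids":["%s"]}\n' "$1" "$2" "$ids"
}

# Placeholder IDs: use the Databricks IDs returned by endpoint registration.
body=$(build_pas_body "Endpoint-restricted PAS for us-west-2" "us-west-2" "<databricks-vpce-id-1>,<databricks-vpce-id-2>")
echo "$body"

# To send the request, uncomment:
# curl -X POST -n \
#   'https://accounts.cloud.databricks.com/api/2.0/accounts/<account-id>/private-access-settings' \
#   -d "$body"
```

The helper does no JSON escaping, so it assumes that the name and IDs contain no double quotes or backslashes.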
The response JSON includes a private_access_settings_id field. This ID is specific to this configuration within Databricks. Save this value because you need it when you create the workspace.
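One way to capture the ID in a script is to extract it from the response text. The sketch below uses sed on a made-up sample response; the helper name json_string_field and the sample values are hypothetical, and the real response contains more fields than shown. A JSON-aware tool such as jq is more robust if it is available in your environment.

```shell
# Hypothetical helper: reads a JSON response on stdin and prints the value of
# the string field named in $1. Assumes the value contains no escaped quotes.
json_string_field() {
  sed -n 's/.*"'"$1"'"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# Illustrative response only; the ID value and the field subset are made up.
response='{"private_access_settings_id":"<private-access-settings-id>","region":"us-west-2","public_access_enabled":true}'

PAS_ID=$(printf '%s\n' "$response" | json_string_field private_access_settings_id)
echo "private_access_settings_id=$PAS_ID"
```

In practice you would pipe the output of the curl command into the helper instead of using a fixed sample string.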
Step 3d: Create or update a workspace
The workspace must already use a customer-managed VPC and have secure cluster connectivity enabled, which is the case for most E2 workspaces.
The important fields for creating a workspace with PrivateLink connectivity are private_access_settings_id (the ID of your new private access settings object) and network_id (the ID of your new network configuration).
To create a workspace with PrivateLink connectivity:
Read the instructions in Databricks Account API for general guidance on creating a new workspace. For complete instructions on all fields, such as storage configurations, credential configurations, and customer-managed keys, see Create a workspace using the Account API.
Call the Create a new workspace API (POST /accounts/{account_id}/workspaces) and be sure to include private_access_settings_id and network_id, for example:
curl -X POST -n \
'https://accounts.cloud.databricks.com/api/2.0/accounts/<databricks-account-id>/workspaces' \
-d '{
"workspace_name": "my-company-example",
"deployment_name": "my-company-example",
"aws_region": "us-west-2",
"credentials_id": "<aws-credentials-id>",
"storage_configuration_id": "<databricks-storage-config-id>",
"network_id": "<databricks-network-config-id>",
"managed_services_customer_managed_key_id": "<aws-kms-managed-services-key-id>",
"storage_customer_managed_key_id": "<aws-kms-notebook-workspace-storage-config-id>",
"private_access_settings_id": "<private-access-settings-id>"
}'
After you create a workspace or update an existing workspace with PrivateLink, the workspace status stays at RUNNING and the VPC change happens immediately. However, you must wait another 20 minutes before you use or create clusters. If you create or use clusters before this interval elapses, the clusters might fail to launch or cause other unexpected behavior.
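In automation, it can help to make the 20 minute wait explicit rather than relying on the workspace status, which already reads RUNNING. The following sketch computes how long a script should still wait. The helper name seconds_until_clusters_ok is hypothetical, and the script assumes that you record the time at which the create or update call returned.

```shell
# Wait interval from the text above: 20 minutes, expressed in seconds.
WAIT_SECONDS=1200

# Hypothetical helper: prints how many seconds remain before clusters should
# be used. $1 = epoch time of the workspace create/update, $2 = current epoch.
seconds_until_clusters_ok() {
  elapsed=$(( $2 - $1 ))
  remaining=$(( WAIT_SECONDS - elapsed ))
  if [ "$remaining" -lt 0 ]; then remaining=0; fi
  echo "$remaining"
}

updated_at=$(date +%s)   # record this right after the API call returns
remaining=$(seconds_until_clusters_ok "$updated_at" "$(date +%s)")
echo "Wait $remaining more seconds before creating or starting clusters."
# sleep "$remaining"     # uncomment to block until the interval has elapsed
```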
Use Terraform
To use Terraform to create underlying AWS network objects and the related Databricks PrivateLink objects, see these Terraform providers:
Terraform provider that registers VPC endpoints. Before using this resource, you must have already created the necessary AWS VPC endpoints.
Terraform provider that creates an AWS VPC and a Databricks network configuration.
Terraform provider that creates a Databricks private access settings object.
To use Terraform to deploy a workspace, see the databricks_mws_workspaces resource in the Databricks Terraform provider.
Step 4: Configure internal DNS to redirect user requests to the web application (for front-end)
You need to redirect user requests to the web application to use your front-end PrivateLink connection. This requires changing private DNS for the network that your users use or connect to. If users will access the Databricks workspace from an on-premises network that is under the scope of your internal or custom DNS, you must perform the following configuration after the workspace is created or updated to ensure that your workspace URL maps to the VPC endpoint private IP for your workspace VPC endpoint.
Configure your internal DNS such that it maps the web application workspace URL to your front-end VPC endpoint.
Use the nslookup Unix command line tool to test the DNS resolution using your workspace deployment domain name, for example:
nslookup my-workspace-name-here.cloud.databricks.com
Example response:
Non-authoritative answer:
my-workspace-name-here.cloud.databricks.com canonical name = oregon.cloud.databricks.com.
oregon.cloud.databricks.com canonical name = a89b3c627d423471389d6ada5c3311b4-f09b129745548506.elb.us-west-2.amazonaws.com.
Name: a89b3c627d423471389d6ada5c3311b4-f09b129745548506.elb.us-west-2.amazonaws.com
Address: 44.234.192.47
Example DNS mapping for a workspace with a front-end VPC endpoint in AWS region us-east-1:
By default the DNS mapping is:
myworkspace.cloud.databricks.com maps to nvirginia.privatelink.cloud.databricks.com. In this case nvirginia is the control plane instance short name in that region.
nvirginia.privatelink.cloud.databricks.com maps to nvirginia.cloud.databricks.com.
nvirginia.cloud.databricks.com maps to the AWS public IPs.
After your DNS changes, from your transit VPC (where your front-end VPC endpoint is), the DNS mapping would be:
myworkspace.cloud.databricks.com maps to nvirginia.privatelink.cloud.databricks.com.
nvirginia.privatelink.cloud.databricks.com maps to the VPC endpoint private IP address.
For the workspace URL to map to the VPC endpoint private IP from the on-premises network, you must do one of the following:
Configure conditional forwarding for the workspace URL to use AmazonDNS.
Create an A-record for the workspace URL in your on-premises or internal DNS that maps to the VPC endpoint private IP.
Complete steps similar to what you would do to enable access to other similar PrivateLink-enabled services.
You can choose to map the workspace URL directly to the front-end (workspace) VPC endpoint private IP by creating an A-record in your internal DNS, such that the DNS mapping looks like this:
myworkspace.cloud.databricks.com
maps to the VPC endpoint private IP
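If your internal DNS happens to be an Amazon Route 53 private hosted zone (an assumption; the steps above apply to any internal DNS), the direct A-record mapping could be sketched as follows. The helper name a_record_change_batch, the IP address, the TTL, and the hosted zone ID are all hypothetical. The aws command is commented out so that the snippet only prints the change batch.

```shell
# Hypothetical helper: prints a Route 53 change batch that upserts an A record.
# $1 = workspace hostname, $2 = VPC endpoint private IP, $3 = TTL in seconds
a_record_change_batch() {
  printf '{"Changes":[{"Action":"UPSERT","ResourceRecordSet":{"Name":"%s","Type":"A","TTL":%s,"ResourceRecords":[{"Value":"%s"}]}}]}\n' "$1" "$3" "$2"
}

# 10.0.1.25 is a made-up address; use the private IP of your front-end endpoint.
batch=$(a_record_change_batch "myworkspace.cloud.databricks.com" "10.0.1.25" 300)
echo "$batch"

# To apply it to a private hosted zone (zone ID is a placeholder), uncomment:
# aws route53 change-resource-record-sets \
#   --hosted-zone-id '<private-hosted-zone-id>' \
#   --change-batch "$batch"
```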
After you make changes to your internal DNS configuration, test the configuration by accessing the Databricks workspace web application and REST API from your transit VPC. Create a VM in the transit VPC if necessary to test the configuration.
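A quick sanity check during testing is whether the workspace URL now resolves to a private address rather than a public one. The sketch below classifies an IPv4 address, assuming that your VPC uses RFC 1918 address space. The helper name is_private_ipv4 and the address 10.0.1.25 are hypothetical; in practice you would pass it the address that nslookup returns from inside the transit VPC.

```shell
# Hypothetical helper: succeeds (exit 0) if $1 is an RFC 1918 private IPv4
# address. Assumes the VPC endpoint lives in RFC 1918 address space.
is_private_ipv4() {
  case "$1" in
    10.*|192.168.*) return 0 ;;
    172.*)
      # 172.16.0.0/12 covers second octets 16 through 31.
      second=${1#172.}
      second=${second%%.*}
      [ "$second" -ge 16 ] 2>/dev/null && [ "$second" -le 31 ]
      return $?
      ;;
    *) return 1 ;;
  esac
}

# 44.234.192.47 is the public address from the earlier nslookup example;
# 10.0.1.25 is a made-up endpoint address.
for ip in 44.234.192.47 10.0.1.25; do
  if is_private_ipv4 "$ip"; then
    echo "$ip: private (consistent with resolving to a VPC endpoint)"
  else
    echo "$ip: public (DNS is not pointing at the VPC endpoint)"
  fi
done
```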
If you have questions about how this applies to your network architecture, contact your Databricks representative.
Step 5: Add VPC endpoints for other AWS services (recommended but optional)
If you are using secure cluster connectivity, which is required to implement a PrivateLink back-end connection, you can optionally add other VPC endpoints to your data plane VPC so your clusters can connect to AWS native services that Databricks uses, such as S3, STS, Kinesis, and other resources that your workspace accesses.
S3 VPC gateway endpoint: Attach this only to the route table that’s attached to your workspace subnets. If you’re using the recommended separate subnet with its own route table for back-end VPC endpoints, then the S3 VPC endpoint doesn’t need to be attached to that particular route table.
STS VPC interface endpoint: Create this in all the workspace subnets and attach it to the workspace security group. Do not create this in the subnet for back-end VPC endpoints.
Kinesis VPC interface endpoint: Just like the STS VPC interface endpoint, create the Kinesis VPC interface endpoint in all the workspace subnets and attach it to the workspace security group.
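The three endpoints above could be created with the AWS CLI roughly as follows. This is a dry-run sketch that only prints the commands. The helper names, the region, and all resource IDs are placeholders, and the service names assume the standard com.amazonaws.<region>.<service> naming, with kinesis-streams for Kinesis Data Streams.

```shell
# Placeholder values; replace with your own region and IDs.
REGION="us-west-2"
VPC_ID="<aws-vpc-id>"
WORKSPACE_SUBNETS="<aws-subnet-1-id> <aws-subnet-2-id>"
WORKSPACE_SG="<aws-sg-id>"
WORKSPACE_ROUTE_TABLE="<aws-route-table-id>"

# Prints (does not run) the command for the S3 gateway endpoint, attached only
# to the route table of the workspace subnets.
s3_gateway_cmd() {
  echo "aws ec2 create-vpc-endpoint --region $REGION --vpc-id $VPC_ID" \
       "--vpc-endpoint-type Gateway --service-name com.amazonaws.$REGION.s3" \
       "--route-table-ids $WORKSPACE_ROUTE_TABLE"
}

# Prints the command for an interface endpoint ($1 = service suffix, for
# example sts or kinesis-streams) in the workspace subnets and security group.
interface_cmd() {
  echo "aws ec2 create-vpc-endpoint --region $REGION --vpc-id $VPC_ID" \
       "--vpc-endpoint-type Interface --service-name com.amazonaws.$REGION.$1" \
       "--subnet-ids $WORKSPACE_SUBNETS --security-group-ids $WORKSPACE_SG"
}

s3_gateway_cmd
interface_cmd sts
interface_cmd kinesis-streams
```

Printing the commands first makes it easy to confirm that the subnet for the back-end VPC endpoints is not in the subnet list before anything is created.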
If you want to lock down a workspace VPC so that no other outbound connections are supported, the workspace won’t have access to the Databricks-provided metastore (RDS-based Hive Metastore) because AWS does not yet support PrivateLink for JDBC traffic to RDS. One option is that you could configure the regional Databricks-provided metastore FQDN or IP in an egress firewall, or a public route table to Internet Gateway, or a Network ACL for the public subnet hosting a NAT Gateway. In such a case, the traffic to the Databricks-provided metastore would go over the public network. However, if you do not want to access the Databricks-managed metastore over the public network:
You could deploy an external metastore in your own VPC. See External Apache Hive metastore (legacy).
You could use AWS Glue for your metastore. Glue supports PrivateLink. See Use AWS Glue Data Catalog as a metastore (legacy).
You may also want to consider any need for access to public library repositories like PyPI (for Python) or CRAN (for R). To access those, either reconsider deploying in a fully-locked-down outbound mode, or instead use an egress firewall in your architecture to allow the required repositories. The overall architecture of your deployment depends on your overall requirements. If you have questions, contact your Databricks representative.
To create the AWS VPC endpoints using the AWS Management Console, see the AWS article for creating VPC endpoints in the AWS Management Console.
For tools that can help automate VPC endpoint creation and management, see:
The article Databricks Terraform provider.
The Terraform resources databricks_mws_vpc_endpoint and databricks_mws_private_access_settings.
The Terraform guide Deploying prerequisite resources and enabling PrivateLink connections.
The AWS article CloudFormation: Creating VPC Endpoint.
The AWS article AWS CLI: create-vpc-endpoint.