Enable Private Service Connect for your workspace
This page provides an overview of Private Service Connect in Databricks and includes configuration steps to enable back-end private connectivity.
- To enable front-end private connectivity to Databricks, see Configure private connectivity to Databricks.
- To use the REST API, see the Private Access Settings API reference.
Databricks support for private connectivity using Private Service Connect is generally available. To enable Private Service Connect on your workspace, you must contact your Databricks account team to request access.
This feature requires the Premium plan.
Private connectivity overview
Private Service Connect enables secure, private connectivity between your Databricks workspace and users or compute resources, ensuring that traffic never traverses the public internet. This feature is designed to help organizations meet security and compliance requirements by providing end-to-end private networking and reducing the risk of data exfiltration.
You can implement either front-end or back-end Private Service Connect independently, depending on your security needs. However, for complete network isolation, where Databricks automatically rejects all public network connections, you must enable both front-end and back-end Private Service Connect. This comprehensive approach creates an end-to-end private networking solution, reducing your attack surface and supporting compliance for sensitive workloads.
With Private Service Connect you can:
- Prevent data access from unauthorized networks or the public internet using the Databricks web application or API.
- Substantially lower the risk of data exfiltration by limiting network exposure to approved private endpoints.
In order to deploy Private Service Connect, you must:
- Create a new Databricks workspace that uses a customer-managed VPC. You cannot add Private Service Connect connectivity to an existing workspace or to a workspace that uses a Databricks-managed VPC.
- Create specific Databricks configuration objects and update existing configurations to define private access settings and permitted VPC endpoints. See Step 3: Create VPC endpoints.
Terminology
The following Google Cloud terms are used in this guide to describe Databricks configuration:
Google and Databricks terminology | Description |
---|---|
Google Cloud: Private Service Connect (PSC) | A Google Cloud feature that provides private connectivity between VPC networks and Google Cloud services. Enables consumers to access managed services privately from inside their VPC network. Also used by Databricks for private connections between VPCs and Databricks services. |
Google Cloud: Host project | In Shared VPC setups, this is the project where the VPCs are created, used for both classic compute plane VPC (back-end Private Service Connect) and transit VPC (front-end Private Service Connect). |
Google Cloud: Service project | In Shared VPC setups, this is the project that contains the workspace compute resources. |
Google Cloud: Private Service Connect endpoint or VPC endpoint | A private connection from a VPC network to services, such as those published by Databricks. In GCP, endpoints are internal IP addresses in a consumer VPC network that can be directly accessed by clients in that network. |
Databricks: Client | A user on a browser accessing the Databricks UI or an application client accessing the Databricks APIs. |
Databricks: Transit VPC | The VPC network hosting clients that access the Databricks workspace WebApp or APIs. In Google terms, this is often the VPC used for front-end PSC endpoints. |
Databricks: Front-end (User to Workspace) Private Service Connect endpoints | The Private Service Connect endpoint configured on the transit VPC network that allows clients to privately connect to the Databricks web application and APIs. |
Databricks: Back-end (classic compute plane to control plane) Private Service Connect endpoints | Private Service Connect endpoints on the customer-managed VPC network allowing private communication between the classic compute plane and the Databricks control plane. |
Databricks: Classic compute plane VPC | The VPC network that hosts the compute resources of your Databricks workspace, configured in your GCP organization. In Google terms, this is the VPC for back-end Private Service Connect. |
Databricks: Private workspace | A workspace where the classic compute plane VMs have no public IP addresses. Endpoints on the Databricks control plane can only be accessed privately from authorized VPC networks or IPs. On GCP, Databricks creates clusters without public IPs by default. |
Enable back-end Private Service Connect for your workspace
Back-end Private Service Connect connects Databricks classic compute resources in a customer VPC to workspace core services. Clusters connect to the control plane for Databricks REST APIs and secure cluster connectivity relay. The following guide includes configuration steps you can perform using the Databricks account console or the API.
Requirements and limitations
The following requirements and limitations apply:
- New workspaces only: You can create a new workspace with Private Service Connect connectivity. You can't add Private Service Connect connectivity to an existing workspace.
- Customer-managed VPC is required: You must use a customer-managed VPC. You can create your VPC in the Google Cloud console. Next, in the Databricks account console or the API, you create a network configuration that references your VPC and sets additional fields that are specific to Private Service Connect.
- Enable your account: Databricks must enable your account for the feature. To enable Private Service Connect on one or more workspaces, contact your Databricks account team and request to enable it on your account. Provide the Google Cloud region and your host project ID to reserve quota for Private Service Connect connections. After your account is enabled for Private Service Connect, use the Databricks account console or the API to configure your Private Service Connect objects and create new workspaces.
- Quotas: You can configure up to two Private Service Connect endpoints per region per VPC host project for Databricks. Multiple Databricks workspaces in the same VPC and region must share these endpoints because Private Service Connect endpoints are region-specific resources. If this quota presents a limitation for your setup, contact your account team.
- No cross-region connectivity: Private Service Connect workspace components must be in the same region, including:
- Transit VPC network and subnets
- Compute plane VPC network and subnets
- Databricks workspace
- Private Service Connect endpoints
- Private Service Connect endpoint subnets
- Sample datasets: Sample Unity Catalog datasets and Databricks datasets become available when you use serverless compute or when a Cloud NAT or proxy is configured for the VPC to provide internet access. See Sample datasets.
Multiple options for network topology
You can deploy a private Databricks workspace with the following network configuration options:
- Host Databricks users (clients) and the Databricks classic compute plane on the same network: In this option, the transit VPC and compute plane VPC refer to the same underlying VPC network. If you choose this topology, all access to any Databricks workspace from that VPC must go over the front-end Private Service Connect connection for that VPC. See Requirements and limitations.
- Host Databricks users (clients) and the Databricks classic compute plane on separate networks: In this option, the user or application client can access different Databricks workspaces using different network paths. You can optionally allow a user on the transit VPC to access a private workspace over a Private Service Connect connection while also allowing users on the public internet to access the workspace.
- Host compute plane for multiple Databricks workspaces on the same network: In this option, the compute planes for multiple Databricks workspaces share the same underlying VPC network. All such workspaces must share the same back-end Private Service Connect endpoints. This deployment pattern lets you configure a small number of Private Service Connect endpoints while deploying a large number of workspaces.
You can share one transit VPC for multiple workspaces. However, each transit VPC must contain only workspaces that use front-end Private Service Connect, or only workspaces that do not use front-end Private Service Connect. Due to the way DNS resolution works on Google Cloud, you can't use both types of workspaces with a single transit VPC.
Reference architecture
A Databricks workspace deployment includes the following network paths that you can secure:
- Databricks client on your transit VPC to the Databricks control plane. This includes both the web application and REST API access.
- Databricks classic compute plane VPC network to the Databricks control plane service. This includes the secure cluster connectivity relay and the workspace connection for the REST API endpoints.
- Databricks classic compute plane to storage in a Databricks-managed project.
- Databricks control plane to storage in your projects including the DBFS bucket.
You can use a no-firewall architecture that restricts outbound traffic, ideally with an external metastore. In this architecture, outbound traffic to a public library repository is not possible by default, but you can bring your own locally mirrored package repo.
You can also use a firewall architecture and allow egress to public package repos and the optional Databricks-managed metastore.
Regional service attachments reference
To enable Private Service Connect, you need the service attachment URIs for the following endpoints for your region:
- The workspace endpoint. This ends with the suffix `plproxy-psc-endpoint-all-ports`. This endpoint has a dual role: back-end Private Service Connect uses it to connect to the control plane for REST APIs, and front-end Private Service Connect uses it to connect your transit VPC to the workspace web application and REST APIs.
- The secure cluster connectivity (SCC) relay endpoint. This ends with the suffix `ngrok-psc-endpoint`. This endpoint is used only for back-end Private Service Connect. It connects to the control plane for the secure cluster connectivity (SCC) relay.
To get the workspace endpoint and SCC relay endpoint service attachment URIs for your region, see Private Service Connect (PSC) attachment URIs and project numbers.
Step 1: Enable your account for Private Service Connect
Before Databricks can accept Private Service Connect connections from your Google Cloud projects, you must contact your Databricks account team and provide the following information for each workspace where you want to enable Private Service Connect:
- Databricks account ID:
  - As an account admin, log in to the Databricks account console.
  - Click the down arrow next to your username in the upper right corner.
  - In the menu, click the copy icon to copy the account ID.
- VPC host project ID of the compute plane VPC, if you are enabling back-end Private Service Connect.
- VPC host project ID of the transit VPC, if you are enabling front-end Private Service Connect.
- Region of the workspace.
A Databricks representative responds with a confirmation once Databricks is configured to accept Private Service Connect connections from your Google Cloud projects. This can take up to three business days.
Step 2: Create a subnet
In the classic compute plane VPC network, create a subnet specifically for Private Service Connect endpoints. The following instructions assume use of the Google Cloud console, but you can also use the `gcloud` CLI to perform similar tasks, as in the sketch after these steps.
To create a subnet:
- In the Google Cloud console, go to the VPC list page.
- Click Add subnet.
- Set the name, description, and region.
- If the Purpose field is visible (it might not be visible), choose None.
- Set a private IP range for the subnet, such as `10.0.0.0/24`.
  Important: Your IP ranges cannot overlap for any of the following:
  - The subnet of your customer-managed (BYO) VPC.
  - The subnet that holds the Private Service Connect endpoints.
- Confirm that your subnet was added to the VPC view in the Google Cloud console for your VPC.
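The following is a minimal `gcloud` sketch of the same task. The project, network, and subnet names are illustrative placeholders; substitute your own values and region.

```bash
# Create a dedicated subnet for Private Service Connect endpoints in
# the classic compute plane VPC (all names are placeholders).
gcloud compute networks subnets create psc-endpoints-subnet \
  --project=<vpc-host-project-id> \
  --network=<compute-plane-vpc-name> \
  --region=us-east4 \
  --range=10.0.0.0/24
```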
Step 3: Create VPC endpoints
You must create VPC endpoints that connect to Databricks service attachments. Service attachment URIs vary by workspace region. The following instructions assume use of the Google Cloud console, but you can also use the `gcloud` CLI to perform similar tasks; a sketch follows these steps. For instructions on creating VPC endpoints to the service attachments by using the `gcloud` CLI or API, see this Google Cloud Platform article.
On the subnet that you created, create VPC endpoints to the following service attachments from your classic compute plane VPC:
- The workspace endpoint. This ends with the suffix `plproxy-psc-endpoint-all-ports`.
- The secure cluster connectivity relay endpoint. This ends with the suffix `ngrok-psc-endpoint`.
To create a VPC endpoint in the Google Cloud console:
- Go to Private Service Connect.
- Click the CONNECTED ENDPOINTS tab.
- Click + Connect endpoint.
- For Target, select Published service.
- For Target service, enter the service attachment URI.
  Important: See the table in Regional service attachments reference to get the two Databricks service attachment URIs for your workspace region.
- For the endpoint name, enter a name to use for the endpoint.
- Select a VPC network for the endpoint.
- Select a subnet for the endpoint. Specify the subnet that you created for Private Service Connect endpoints.
- Select an IP address for the endpoint. If you need a new IP address:
  - Click the IP address drop-down menu and select Create IP address.
  - Enter a name and optional description.
  - For a static IP address, select Assign automatically or Let me choose.
  - If you selected Let me choose, enter the custom IP address.
  - Click Reserve.
- Select a namespace from the drop-down list or create a new namespace. The region is populated based on the selected subnetwork.
- Click Add endpoint.
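The equivalent `gcloud` flow reserves an internal IP address in the PSC subnet and then creates a forwarding rule that targets the Databricks service attachment. This is a hedged sketch; the endpoint names, project, and attachment URI are illustrative placeholders (get the real URIs from Regional service attachments reference).

```bash
# Reserve an internal IP for the endpoint in the PSC subnet.
gcloud compute addresses create psc-demo-dp-rest-api-ip \
  --project=<vpc-host-project-id> \
  --region=us-east4 \
  --subnet=psc-endpoints-subnet

# Create the PSC endpoint: a forwarding rule that targets the
# regional Databricks service attachment URI.
gcloud compute forwarding-rules create psc-demo-dp-rest-api \
  --project=<vpc-host-project-id> \
  --region=us-east4 \
  --network=<compute-plane-vpc-name> \
  --address=psc-demo-dp-rest-api-ip \
  --target-service-attachment=<workspace-service-attachment-uri>
```

Repeat both commands for the SCC relay endpoint (the attachment ending in `ngrok-psc-endpoint`).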
Step 4: Register your VPC endpoints
Register your Google Cloud endpoints using the Databricks account console. You can also use the VPC Endpoint Configurations API; a sketch appears after the table below.
- Go to the Databricks account console.
- Click the Cloud resources tab, then VPC endpoints.
- Click Register VPC endpoint.
- For each of your Private Service Connect endpoints, fill in the required fields to register a new VPC endpoint:
  - VPC endpoint name: A human-readable name to identify the VPC endpoint. Databricks recommends using the same name as your Private Service Connect endpoint ID, but it is not required that these match.
  - Region: The Google Cloud region where this Private Service Connect endpoint is defined.
  - Google Cloud VPC network project ID: The Google Cloud project ID where this endpoint is defined. For back-end connectivity, this is the project ID for your workspace's VPC network. For front-end connectivity, this is the project ID of the VPC where user connections originate, which is sometimes referred to as a transit VPC.
The following table shows the information you must use for each endpoint if you are using both back-end and front-end Private Service Connect. The example values are illustrative; `<vpc-host-project-id>` is a placeholder for your own project ID.
Endpoint type | Field | Example |
---|---|---|
Back-end classic compute plane VPC REST/workspace endpoint (`plproxy-psc-endpoint-all-ports`) | VPC endpoint name (Databricks recommends matching the Google Cloud endpoint ID) | `psc-demo-dp-rest-api` |
 | Google Cloud VPC network project ID | `<vpc-host-project-id>` |
 | Google Cloud Region | `us-east4` |
Back-end classic compute plane VPC SCC relay endpoint (`ngrok-psc-endpoint`) | VPC endpoint name (Databricks recommends matching the Google Cloud endpoint ID) | `psc-demo-dp-ngrok` |
 | Google Cloud VPC network project ID | `<vpc-host-project-id>` |
 | Google Cloud Region | `us-east4` |
When you are done, you can use the VPC endpoints list in the account console to review the endpoints and confirm the information.
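The following is a minimal sketch of registering an endpoint through the account API instead, assuming a valid account-level bearer token (authentication setup omitted). The request body shape follows the VPC Endpoint Configurations API; verify the field names against the API reference, and treat all values as placeholders.

```bash
# Register a back-end PSC endpoint with the Databricks account API.
curl -X POST \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  "https://accounts.gcp.databricks.com/api/2.0/accounts/<account-id>/vpc-endpoints" \
  -d '{
    "vpc_endpoint_name": "psc-demo-dp-rest-api",
    "gcp_vpc_endpoint_info": {
      "project_id": "<vpc-host-project-id>",
      "psc_endpoint_name": "psc-demo-dp-rest-api",
      "endpoint_region": "us-east4"
    }
  }'
```

Repeat for the SCC relay endpoint.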
Step 5: Create a Databricks private access settings object
Create a private access settings object that defines several Private Service Connect settings for your workspace. This object will be attached to your workspace. One private access settings object can be attached to multiple workspaces within the same region. To create a private access settings object, see Create a private access settings object.
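As a hedged sketch, an equivalent call to the Private Access Settings API might look like the following; confirm the field names against the API reference, and note that the name and region shown are illustrative.

```bash
# Create a private access settings object that denies public access.
curl -X POST \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  "https://accounts.gcp.databricks.com/api/2.0/accounts/<account-id>/private-access-settings" \
  -d '{
    "private_access_settings_name": "psc-demo-pas",
    "region": "us-east4",
    "public_access_enabled": false
  }'
```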
Step 6: Create a network configuration
Create a Databricks network configuration, which encapsulates information about your customer-managed VPC for your workspace. This object will be attached to your workspace. You can also use the Network configurations API; a sketch appears after the table below.
- Go to the Databricks account console.
- Click the Cloud resources tab, then Network configurations.
- Click Add Network configuration.
The following table shows the information you need for each field. The example values are illustrative placeholders; the endpoint names are the VPC endpoints you registered in Step 4.
Field | Example value |
---|---|
Network configuration name | `<network-config-name>` |
Network GCP project ID | `<vpc-host-project-id>` |
VPC Name | `<compute-plane-vpc-name>` |
Subnet Name | `<compute-plane-subnet-name>` |
Region of the subnet | `us-east4` |
VPC endpoint for secure cluster connectivity relay | `psc-demo-dp-ngrok` |
VPC endpoint for REST APIs (back-end connection to workspace) | `psc-demo-dp-rest-api` |
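A minimal sketch using the Networks API follows; verify the field names against the API reference. All values are placeholders, and the endpoint IDs are the Databricks IDs returned when you registered the VPC endpoints in Step 4.

```bash
# Create a network configuration that references the registered
# PSC endpoints by their Databricks VPC endpoint IDs.
curl -X POST \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  "https://accounts.gcp.databricks.com/api/2.0/accounts/<account-id>/networks" \
  -d '{
    "network_name": "<network-config-name>",
    "gcp_network_info": {
      "network_project_id": "<vpc-host-project-id>",
      "vpc_id": "<compute-plane-vpc-name>",
      "subnet_id": "<compute-plane-subnet-name>",
      "subnet_region": "us-east4"
    },
    "vpc_endpoints": {
      "rest_api": ["<rest-api-vpc-endpoint-id>"],
      "dataplane_relay": ["<scc-relay-vpc-endpoint-id>"]
    }
  }'
```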
Step 7: Create a workspace
Use the account console to create a workspace with the network configuration that you created. You can also use the Workspaces API; a sketch appears after these steps.
- Go to the Databricks account console.
- Click the Workspaces tab.
- Click Create workspace.
- Set these standard workspace fields:
- Workspace name
- Region
- Google cloud project ID (the project for the workspace's compute resources, which might be different from the project ID for your VPC).
- Set Private Service Connect specific fields:
- Click Advanced configurations.
- In the Network configuration field, choose your network configuration that you created in previous steps.
- In the Private connectivity field, choose your private access settings object that you created in previous steps. Note that one private access settings object can be attached to multiple workspaces.
- Click Save.
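The following sketch shows an equivalent Workspaces API call, with placeholder IDs for the network configuration and private access settings objects created in the previous steps; verify the field names against the API reference.

```bash
# Create a PSC-enabled workspace by attaching the network
# configuration and the private access settings object.
curl -X POST \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  "https://accounts.gcp.databricks.com/api/2.0/accounts/<account-id>/workspaces" \
  -d '{
    "workspace_name": "<workspace-name>",
    "location": "us-east4",
    "cloud_resource_container": {
      "gcp": { "project_id": "<compute-project-id>" }
    },
    "network_id": "<network-config-id>",
    "private_access_settings_id": "<private-access-settings-id>"
  }'
```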
Step 8: Validate the workspace configuration
After you create the workspace, go back to the workspace page and find your newly created workspace. It typically takes between 30 seconds and 3 minutes for the workspace to transition from `PROVISIONING` status to `RUNNING` status. After the status changes to `RUNNING`, your workspace is configured successfully.
You can validate the configuration using the Databricks account console:
- Click Cloud resources and then Network configurations. Find the network configuration for your VPC using the account console. Review it to confirm all fields are correct.
- Click Workspaces and find the workspace. Confirm that the workspace is running.
If you want to review the set of workspaces using the API, make a `GET` request to the `https://accounts.gcp.databricks.com/api/2.0/accounts/<account-id>/workspaces` endpoint. For example, assuming a valid account-level bearer token:
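```bash
# List the account's workspaces and check the status of each one.
curl -X GET \
  -H "Authorization: Bearer $TOKEN" \
  "https://accounts.gcp.databricks.com/api/2.0/accounts/<account-id>/workspaces"
```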
Step 9: Configure DNS
This section shows how to create a private DNS zone that includes the classic compute plane VPC network. You must create DNS records that map the workspace URL to the `plproxy-psc-endpoint-all-ports` Private Service Connect endpoint IP. You can create these records in the Google Cloud console or with the `gcloud` CLI, as in the sketch after this list:
- Ensure that you have the workspace URL for your deployed Databricks workspace. The URL has a format similar to `https://33333333333333.3.gcp.databricks.com`. You can get this URL from the web browser when you are viewing a workspace or from the account console in its list of workspaces.
- Locate the Private Service Connect endpoint IP for the `plproxy-psc-endpoint-all-ports` Private Service Connect endpoint. In this example, the IP for the Private Service Connect endpoint `psc-demo-dp-rest-api` is `10.10.0.2`.
- Create the following `A` record mappings:
  - Your workspace domain (such as `33333333333333.3.gcp.databricks.com`) to `10.10.0.2`.
  - Your workspace domain with the prefix `dp-` (such as `dp-33333333333333.3.gcp.databricks.com`) to `10.10.0.2`.
- In the same zone of `gcp.databricks.com`, create a private DNS record to map the SCC relay URL to the SCC relay endpoint `ngrok-psc-endpoint` using its endpoint IP:
  - The SCC relay URL is of the format `tunnel.<workspace-gcp-region>.gcp.databricks.com`. In this example, the SCC relay URL is `tunnel.us-east4.gcp.databricks.com`.
  - Locate the Private Service Connect endpoint IP for the `ngrok-psc-endpoint` Private Service Connect endpoint. In this example, the IP for the Private Service Connect endpoint `psc-demo-dp-ngrok` is `10.10.0.3`.
  - Create an `A` record to map `tunnel.us-east4.gcp.databricks.com` to `10.10.0.3`.
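The following `gcloud` sketch creates the zone and the three `A` records using this example's values; the zone name and project are illustrative placeholders.

```bash
# Private DNS zone for gcp.databricks.com, visible only to the
# compute plane VPC.
gcloud dns managed-zones create databricks-psc-zone \
  --project=<vpc-host-project-id> \
  --dns-name="gcp.databricks.com." \
  --visibility=private \
  --networks=<compute-plane-vpc-name>

# Workspace URL and its dp- prefix map to the workspace endpoint IP.
gcloud dns record-sets create "33333333333333.3.gcp.databricks.com." \
  --zone=databricks-psc-zone --project=<vpc-host-project-id> \
  --type=A --ttl=300 --rrdatas=10.10.0.2
gcloud dns record-sets create "dp-33333333333333.3.gcp.databricks.com." \
  --zone=databricks-psc-zone --project=<vpc-host-project-id> \
  --type=A --ttl=300 --rrdatas=10.10.0.2

# SCC relay URL maps to the SCC relay endpoint IP.
gcloud dns record-sets create "tunnel.us-east4.gcp.databricks.com." \
  --zone=databricks-psc-zone --project=<vpc-host-project-id> \
  --type=A --ttl=300 --rrdatas=10.10.0.3
```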
Validate your DNS configuration
In your VPC networks, check your DNS configurations.
In your classic compute plane VPC network, use the `nslookup` tool to confirm that the following URLs resolve to the correct Private Service Connect endpoint IPs:
- `<workspace-url>` maps to the Private Service Connect endpoint IP for the endpoint with `plproxy-psc-endpoint-all-ports` in its name.
- `dp-<workspace-url>` maps to the Private Service Connect endpoint IP for the endpoint with `plproxy-psc-endpoint-all-ports` in its name.
- `tunnel.<workspace-gcp-region>.gcp.databricks.com` maps to the Private Service Connect endpoint IP for the endpoint with `ngrok-psc-endpoint` in its name.
For example, from a VM in the compute plane VPC:
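```bash
# Each lookup should return the corresponding PSC endpoint IP
# (10.10.0.2 and 10.10.0.3 in this example).
nslookup 33333333333333.3.gcp.databricks.com
nslookup dp-33333333333333.3.gcp.databricks.com
nslookup tunnel.us-east4.gcp.databricks.com
```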
Intermediate DNS name for Private Service Connect
The intermediate DNS name for workspaces that enable either back-end or front-end Private Service Connect is `<workspace-gcp-region>.psc.gcp.databricks.com`. This allows you to separate traffic to the workspaces that clients must access privately from traffic to other Databricks services that don't support Private Service Connect, such as the account console.
Step 10 (optional): Configure metastore access
Features such as SQL access control lists (ACLs) require access to the legacy workspace-level Hive metastore. Because the compute plane VPC can't access the public internet by default, you must create a Cloud NAT with access to the metastore, as in the sketch below. See Control plane service endpoint IP addresses by region.
You can additionally configure a firewall to prevent ingress and egress traffic from all other sources. Alternatively, if you do not want to configure a Cloud NAT for your VPC, you can configure a private connection to an external metastore.
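The following is a minimal Cloud NAT sketch; the router and NAT names are illustrative placeholders, and you would typically pair this with firewall rules that restrict egress to the metastore IPs for your region.

```bash
# Cloud Router and Cloud NAT so the compute plane VPC can reach the
# legacy Hive metastore endpoint.
gcloud compute routers create psc-demo-router \
  --project=<vpc-host-project-id> \
  --network=<compute-plane-vpc-name> \
  --region=us-east4

gcloud compute routers nats create psc-demo-nat \
  --project=<vpc-host-project-id> \
  --router=psc-demo-router \
  --region=us-east4 \
  --auto-allocate-nat-external-ips \
  --nat-all-subnet-ip-ranges
```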
Step 11 (optional): Configure IP access lists
Front-end connections to Private Service Connect workspaces allow public access by default. You can control public access by creating a private access settings object.
Follow these steps to manage public access:
- Decide on public access:
  - Deny public access: No public connections to the workspace are allowed.
  - Allow public access: You can further restrict access using IP access lists.
- If you allow public access, configure IP access lists:
  - Set up IP access lists to control which public IP addresses can access your Databricks workspace.
  - IP access lists only affect requests from public IP addresses over the internet. They do not block private traffic from Private Service Connect.
- To block all internet access:
  - Enable IP access lists for your workspace. See Configure IP access lists for workspaces.
  - Create an IP access list rule: `BLOCK 0.0.0.0/0`. A sketch follows this section.
IP access lists do not affect requests from VPC networks connected through Private Service Connect. These connections are managed using the Private Service Connect access level configuration. See Step 5: Create a Databricks private access settings object.
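The following sketch creates the blocking rule through the workspace-level IP Access Lists API, assuming IP access lists are already enabled for the workspace and a workspace token is available:

```bash
# Block all public internet access to the workspace with one rule.
curl -X POST \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  "https://<workspace-url>/api/2.0/ip-access-lists" \
  -d '{
    "label": "block-all-public",
    "list_type": "BLOCK",
    "ip_addresses": ["0.0.0.0/0"]
  }'
```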
Step 12 (optional): Configure VPC Service Controls
In addition to using Private Service Connect to privately connect to the Databricks service, you can configure VPC Service Controls to keep your traffic private and mitigate data exfiltration risks.
Configure back-end private access from the compute plane VPC to Cloud Storage
You can configure Private Google Access or Private Service Connect to privately access cloud storage resources from your compute plane VPC.
Add your compute plane projects to a VPC Service Controls Service Perimeter
For each Databricks workspace, you can add the following Google Cloud projects to a VPC Service Controls service perimeter:
- Compute plane VPC host project
- Project containing the workspace storage bucket
- Service projects containing the compute resources of the workspace
With this configuration, you must grant access to both of the following:
- The compute resources and workspace storage bucket from the Databricks control plane
- Databricks-managed storage buckets from the compute plane VPC
Ingress rule
You must add an ingress rule to grant access to your VPC Service Controls Service Perimeter from the Databricks control plane VPC.
The following is an example ingress rule:
```
From:
  Identities: ANY_IDENTITY
  Source > Projects =
    <regional-control-plane-vpc-host-project-number-1>
    <regional-control-plane-vpc-host-project-number-2>
    <regional-control-plane-uc-project-number>
    <regional-control-plane-audit-log-delivery-project-number>
To:
  Projects =
    <list of compute plane project IDs>
  Services =
    Service name: storage.googleapis.com
    Service methods: All actions
    Service name: compute.googleapis.com
    Service methods: All actions
    Service name: container.googleapis.com
    Service methods: All actions
    Service name: logging.googleapis.com
    Service methods: All actions
    Service name: cloudresourcemanager.googleapis.com
    Service methods: All actions
    Service name: iam.googleapis.com
    Service methods: All actions
```
To get the project numbers for your ingress rules, see Private Service Connect (PSC) attachment URIs and project numbers.
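As a hedged sketch, you can express such a rule as a YAML ingress policy and apply it with `gcloud access-context-manager perimeters update`; the perimeter name, project numbers, and file name are placeholders, and the example is abbreviated to one source project and one service. An analogous `--set-egress-policies` flag applies egress rules.

```bash
# ingress.yaml mirrors the example rule above (abbreviated).
cat > ingress.yaml <<'EOF'
- ingressFrom:
    identityType: ANY_IDENTITY
    sources:
    - resource: projects/<regional-control-plane-vpc-host-project-number-1>
  ingressTo:
    operations:
    - serviceName: storage.googleapis.com
      methodSelectors:
      - method: "*"
    resources:
    - projects/<compute-plane-project-number>
EOF

gcloud access-context-manager perimeters update <perimeter-name> \
  --set-ingress-policies=ingress.yaml
```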
Egress rule
You must add an egress rule to grant access to Databricks-managed storage buckets from the compute plane VPC. The following is an example egress rule:
```
From:
  Identities: ANY_IDENTITY
To:
  Projects =
    <regional-control-plane-asset-project-number>
    <regional-control-plane-vpc-host-project-number-1>
    <regional-control-plane-vpc-host-project-number-2>
  Services =
    Service name: storage.googleapis.com
    Service methods: All actions
    Service name: artifactregistry.googleapis.com
    Service methods: artifactregistry.googleapis.com/DockerRead
```
To get the project numbers for your egress rules, see Private Service Connect (PSC) attachment URIs and project numbers.
Access data lake storage buckets secured by VPC Service Controls
You can add the Google Cloud projects containing the data lake storage buckets to a VPC Service Controls Service Perimeter.
If the data lake storage buckets and the Databricks workspace projects are in the same VPC Service Controls Service Perimeter, you do not need any additional ingress or egress rules.
If the data lake storage buckets are in a separate VPC Service Controls Service Perimeter, you must configure the following:
- Ingress rules on data lake Service Perimeter:
- Allow access to Cloud Storage from the Databricks compute plane VPC
- Allow access to Cloud Storage from the Databricks control plane VPC using the project IDs documented on the regions page. This access will be required as Databricks introduces new data governance features such as Unity Catalog.
- Egress rules on Databricks compute plane Service Perimeter:
- Allow egress to Cloud Storage on data lake Projects