Configure private connectivity to AWS S3 storage buckets
This feature is in Public Preview. To join this preview, contact your Databricks account team.
Effective October 7, 2024, Databricks began charging customers for networking costs incurred from serverless compute resources connecting to external resources. Serverless network billing is rolling out in phases, which might result in gradual billing changes. For more information on billing, see Understand Databricks serverless networking costs.
This page explains how to configure private connectivity from serverless compute to your in-region AWS S3 buckets using the Databricks account console UI.
Configuring private connectivity for serverless compute provides:
- A dedicated and private connection: Ensures secure and isolated access between your serverless workspaces and AWS S3, limiting access to authorized connections only.
- Enhanced data exfiltration mitigation: While serverless compute with Unity Catalog provides built-in data exfiltration protection, PrivateLink adds an extra layer of network defense. Using AWS PrivateLink, your data traffic remains entirely in the AWS network, never traversing the public internet. This architecture, combined with controlled access through VPC endpoints, reduces the attack surface for data exfiltration.
Requirements
- The workspace is on the Enterprise plan.
- You are the account admin of your Databricks account.
- You have at least one functional workspace using serverless compute.
- You have appropriate AWS IAM permissions to create and modify S3 bucket policies and create VPC endpoints.
- Each Databricks account can have up to 10 NCCs per region.
- Each region can have up to 30 private endpoints, distributed as needed across 1-10 NCCs.
- Each NCC can be attached to up to 50 workspaces.
- Each NCC can have one AWS S3 private endpoint rule.
- Each private endpoint rule can include up to 100 S3 buckets.
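You can script a quick check of a planned rollout against the limits listed above before you create anything. The following Python sketch is illustrative only. The PlannedNcc record and the check_limits function are hypothetical. The sketch also assumes that every private endpoint rule, S3 or otherwise, counts as one private endpoint toward the regional limit.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Limits copied from the Requirements list above; re-check them before relying on this.
MAX_NCCS_PER_REGION = 10
MAX_PRIVATE_ENDPOINTS_PER_REGION = 30
MAX_WORKSPACES_PER_NCC = 50
MAX_S3_RULES_PER_NCC = 1
MAX_BUCKETS_PER_RULE = 100


@dataclass
class PlannedNcc:
    """Hypothetical planning record for one NCC (not a Databricks API type)."""
    name: str
    region: str
    workspace_ids: List[str] = field(default_factory=list)
    s3_rules: List[List[str]] = field(default_factory=list)  # bucket names per S3 rule
    other_endpoints: int = 0  # non-S3 private endpoint rules on this NCC


def check_limits(nccs: List[PlannedNcc]) -> List[str]:
    """Return a list of limit violations; an empty list means the plan fits."""
    problems: List[str] = []

    # Group NCCs by region for the per-region limits.
    by_region: Dict[str, List[PlannedNcc]] = {}
    for ncc in nccs:
        by_region.setdefault(ncc.region, []).append(ncc)

    for region, group in by_region.items():
        if len(group) > MAX_NCCS_PER_REGION:
            problems.append(f"{region}: {len(group)} NCCs (max {MAX_NCCS_PER_REGION})")
        # Assumption: each rule (S3 or other) counts as one private endpoint.
        endpoints = sum(len(n.s3_rules) + n.other_endpoints for n in group)
        if endpoints > MAX_PRIVATE_ENDPOINTS_PER_REGION:
            problems.append(
                f"{region}: {endpoints} private endpoints (max {MAX_PRIVATE_ENDPOINTS_PER_REGION})"
            )

    for ncc in nccs:
        if len(ncc.workspace_ids) > MAX_WORKSPACES_PER_NCC:
            problems.append(
                f"{ncc.name}: {len(ncc.workspace_ids)} workspaces (max {MAX_WORKSPACES_PER_NCC})"
            )
        if len(ncc.s3_rules) > MAX_S3_RULES_PER_NCC:
            problems.append(f"{ncc.name}: {len(ncc.s3_rules)} S3 rules (max {MAX_S3_RULES_PER_NCC})")
        for buckets in ncc.s3_rules:
            if len(buckets) > MAX_BUCKETS_PER_RULE:
                problems.append(
                    f"{ncc.name}: S3 rule with {len(buckets)} buckets (max {MAX_BUCKETS_PER_RULE})"
                )
    return problems
```

In this sketch, each NCC is a plain record, so you can build the plan from a spreadsheet or a config file and run the check before you touch the account console.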
Step 1: Create a Network Connectivity Configuration (NCC) object
You can skip this step if you have an existing NCC in the same region and AWS account that you want to use.
- In the account console, click Cloud resources.
- Select the Network tab.
- Select Add Network Connectivity Configuration.
- Type a name for the NCC.
- Choose the region. This must match your workspace region.
- Click Add.
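If you prefer automation to the account console, you can also create the NCC through the Databricks Account API. The sketch below builds the HTTP request but does not send it. Treat the following as assumptions to verify against the current Account API reference:

- the host
- the /api/2.0/accounts/{account_id}/network-connectivity-configs path
- the name and region body fields

The account_id and token arguments are placeholders for your account ID and an account-level OAuth access token.

```python
import json
import urllib.request

ACCOUNTS_HOST = "https://accounts.cloud.databricks.com"  # assumed AWS account console host


def build_create_ncc_request(
    account_id: str, token: str, name: str, region: str
) -> urllib.request.Request:
    """Build (but do not send) a request that creates an NCC.

    The path and body fields are assumptions; check them against the Account API reference.
    """
    url = f"{ACCOUNTS_HOST}/api/2.0/accounts/{account_id}/network-connectivity-configs"
    body = json.dumps({"name": name, "region": region}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
    )
```

To send the request, you would pass it to urllib.request.urlopen and read the new NCC's ID from the JSON response. The region must still match your workspace region.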
Step 2: Create an AWS S3 interface endpoint
Do not enable your private endpoint rule until you have completed Step 3.
- Navigate to the Private endpoint rules section in your NCC.
- Select Add private endpoint rule.
- Select S3 bucket under Resource type.
- Configure the rule settings:
- Endpoint Service: This field is automatically populated to establish the connection to your private endpoint's destination resources.
- S3 bucket names: Enter the names of your destination buckets. Each bucket must exist in the same AWS region as the NCC and endpoint service.
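Normalizing bucket names before you enter them in the rule can help. The following Python sketch does four things:

- trims and de-duplicates names
- rejects s3:// URIs
- enforces the 100-bucket limit from the Requirements list
- checks a subset of the AWS bucket naming rules

It cannot check that each bucket is in the NCC's region. validate_bucket_names is a hypothetical helper.

```python
import re
from typing import List

MAX_BUCKETS_PER_RULE = 100  # limit from the Requirements list

# A subset of the S3 general purpose bucket naming rules: 3-63 characters,
# lowercase letters, digits, dots, and hyphens, starting and ending with a letter or digit.
_BUCKET_RE = re.compile(r"[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]")
_IPV4_RE = re.compile(r"\d{1,3}(\.\d{1,3}){3}")


def validate_bucket_names(names: List[str]) -> List[str]:
    """Return trimmed, de-duplicated bucket names; raise ValueError on the first bad entry."""
    cleaned: List[str] = []
    for raw in names:
        name = raw.strip()
        if name.startswith("s3://"):
            raise ValueError(f"{raw!r}: enter the bucket name only, not an s3:// URI")
        # Reject names outside the pattern, with adjacent dots, or shaped like an IPv4 address.
        if not _BUCKET_RE.fullmatch(name) or ".." in name or _IPV4_RE.fullmatch(name):
            raise ValueError(f"{raw!r} is not a valid S3 bucket name")
        if name not in cleaned:
            cleaned.append(name)
    if len(cleaned) > MAX_BUCKETS_PER_RULE:
        raise ValueError(f"{len(cleaned)} buckets exceeds the limit of {MAX_BUCKETS_PER_RULE} per rule")
    return cleaned
```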
Step 3: Update your S3 bucket policy to accept traffic from the VPC endpoint
To allow serverless compute to access your S3 bucket through the private endpoint, you might need to update your S3 bucket policy in your AWS account.
The following is an example Allow clause that you might need to add:
{
  "Sid": "AllowVpcEndpointAccess",
  "Effect": "Allow",
  "Principal": "*",
  "Action": "s3:*",
  "Resource": ["arn:aws:s3:::{bucket-name}", "arn:aws:s3:::{bucket-name}/*"],
  "Condition": {
    "StringEquals": {
      "aws:SourceVpce": "vpce-12345"
    }
  }
}
Replace vpce-12345 with the VPC endpoint ID returned in Step 2.
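If you manage several buckets, generating the statement saves you from editing ARNs by hand. This Python sketch mirrors the example Allow clause above for a list of buckets. It then merges the statement into an existing policy document and replaces any statement with the same Sid. The helper names are hypothetical, and vpce-12345 stands in for the endpoint ID from Step 2.

```python
import json
from typing import List


def allow_vpce_statement(bucket_names: List[str], vpce_id: str) -> dict:
    """Build an Allow statement like the example above, covering every listed bucket."""
    resources: List[str] = []
    for bucket in bucket_names:
        resources.append(f"arn:aws:s3:::{bucket}")    # bucket-level actions
        resources.append(f"arn:aws:s3:::{bucket}/*")  # object-level actions
    return {
        "Sid": "AllowVpcEndpointAccess",
        "Effect": "Allow",
        "Principal": "*",
        "Action": "s3:*",
        "Resource": resources,
        "Condition": {"StringEquals": {"aws:SourceVpce": vpce_id}},
    }


def add_statement(policy: dict, statement: dict) -> dict:
    """Return a new policy dict with statement appended, replacing any statement with the same Sid."""
    existing = policy.get("Statement", [])
    if isinstance(existing, dict):  # a policy may hold a single statement object
        existing = [existing]
    kept = [s for s in existing if s.get("Sid") != statement["Sid"]]
    return {"Version": policy.get("Version", "2012-10-17"), **policy, "Statement": kept + [statement]}
```

To apply the result, you could write json.dumps(policy, indent=2) to a file and run aws s3api put-bucket-policy --bucket {bucket-name} --policy file://policy.json. Because put-bucket-policy replaces the whole policy, start from the bucket's current policy rather than an empty one.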
If your bucket policy uses a Deny clause instead, you might need to add an exception condition for the VPC endpoint ID returned in Step 2. For example:
{
  ...
  "Effect": "Deny",
  ...
  "Condition": {
    "StringNotEquals": {
      "aws:SourceVpce": "vpce-12345"
    }
  }
}
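The following Python sketch shows one way to add the exception to an existing Deny statement without discarding endpoint IDs that are already listed. It assumes that the statement uses the StringNotEquals operator on aws:SourceVpce, as in the example above. add_vpce_exception is a hypothetical helper.

```python
import copy


def add_vpce_exception(deny_statement: dict, vpce_id: str) -> dict:
    """Return a copy of a Deny statement whose StringNotEquals aws:SourceVpce values include vpce_id."""
    if deny_statement.get("Effect") != "Deny":
        raise ValueError("expected a Deny statement")
    stmt = copy.deepcopy(deny_statement)  # leave the caller's statement untouched
    not_equals = stmt.setdefault("Condition", {}).setdefault("StringNotEquals", {})
    current = not_equals.get("aws:SourceVpce", [])
    if isinstance(current, str):
        current = [current]
    if vpce_id not in current:
        current = current + [vpce_id]
    # Keep the single-string form when only one endpoint is listed.
    not_equals["aws:SourceVpce"] = current[0] if len(current) == 1 else current
    return stmt
```

With a negated operator such as StringNotEquals, the Deny takes effect only when the request's endpoint matches none of the listed IDs. Appending to the list therefore widens the exception rather than replacing it.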
These example policies do not include other public or private endpoints that you might want to allowlist, such as corporate VPN IP addresses.
Refresh the UI or make an API call to confirm that the rule's status changes to ESTABLISHED.
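If you check the status from a script instead of refreshing the UI, a bounded polling loop keeps the check from hanging. In this Python sketch, fetch_state is a hypothetical function that you supply. It returns the rule's current status string, for example by calling the account API. Only the ESTABLISHED value comes from this page.

```python
import time
from typing import Callable


def wait_for_established(
    fetch_state: Callable[[], str],
    timeout_s: float = 600.0,
    interval_s: float = 15.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll fetch_state() until it returns 'ESTABLISHED'; raise TimeoutError after timeout_s."""
    deadline = clock() + timeout_s
    while True:
        state = fetch_state()
        if state == "ESTABLISHED":
            return state
        if clock() >= deadline:
            raise TimeoutError(f"rule still {state!r} after {timeout_s:.0f}s")
        sleep(interval_s)  # wait before asking again
```

The sleep and clock parameters are injectable so you can test the loop without waiting in real time.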
Step 4: Enable private endpoint rule
- Click the kebab menu icon next to the private endpoint rule.
- Click Update rule.
- Select Enable rule.
This step routes traffic for all S3 buckets configured in the private endpoint rule through PrivateLink for any workspace attached to the NCC. Before proceeding, ensure you have completed Step 3 to allow S3 bucket access from the VPC endpoint.
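The following Python sketch is a rough preflight for that check. It inspects a bucket policy for the VPC endpoint ID using the two patterns from Step 3. It is a heuristic, not a full IAM policy evaluation, because other statements can still deny access. policy_admits_vpce is a hypothetical helper.

```python
import json
from typing import Any, List


def _as_list(value: Any) -> List[Any]:
    """Normalize a policy field that may be missing, a single value, or a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def policy_admits_vpce(policy_json: str, vpce_id: str) -> bool:
    """Return True if the policy references vpce_id in either Step 3 pattern."""
    policy = json.loads(policy_json)
    for stmt in _as_list(policy.get("Statement")):
        cond = stmt.get("Condition", {})
        if stmt.get("Effect") == "Allow":
            # Pattern 1: Allow conditioned on StringEquals aws:SourceVpce.
            if vpce_id in _as_list(cond.get("StringEquals", {}).get("aws:SourceVpce")):
                return True
        elif stmt.get("Effect") == "Deny":
            # Pattern 2: Deny that excludes the endpoint via StringNotEquals.
            if vpce_id in _as_list(cond.get("StringNotEquals", {}).get("aws:SourceVpce")):
                return True
    return False
```

One way to get the policy text is to run aws s3api get-bucket-policy --bucket {bucket-name} --query Policy --output text.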
Step 5: Attach the NCC to one or more workspaces
This step associates your configured private connectivity with your serverless workspaces. Skip this step if your workspace is already attached to the desired NCC. To attach the NCC to a workspace:
- In the account console sidebar, click Workspaces.
- Select an existing workspace.
- Select Update Workspace.
- Under Network Connectivity Configuration, select the dropdown and choose the NCC you’ve created.
- Repeat for all workspaces you’d like this NCC to apply to.
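To attach many workspaces, you might script the update through the Account API instead of the UI. This Python sketch builds one PATCH request per workspace but does not send them. Verify the workspaces path and the network_connectivity_config_id field against the current Account API reference, because both are assumptions. The 50-workspace check counts only the workspaces in this batch.

```python
import json
import urllib.request
from typing import List

ACCOUNTS_HOST = "https://accounts.cloud.databricks.com"  # assumed AWS account console host
MAX_WORKSPACES_PER_NCC = 50  # limit from the Requirements list


def build_attach_requests(
    account_id: str, token: str, workspace_ids: List[int], ncc_id: str
) -> List[urllib.request.Request]:
    """Build (but do not send) one PATCH request per workspace to attach ncc_id."""
    unique_ids = list(dict.fromkeys(workspace_ids))  # drop duplicates, keep order
    # Only counts this batch; workspaces already attached to the NCC also count toward 50.
    if len(unique_ids) > MAX_WORKSPACES_PER_NCC:
        raise ValueError(f"{len(unique_ids)} workspaces exceeds {MAX_WORKSPACES_PER_NCC} per NCC")
    body = json.dumps({"network_connectivity_config_id": ncc_id}).encode("utf-8")
    return [
        urllib.request.Request(
            f"{ACCOUNTS_HOST}/api/2.0/accounts/{account_id}/workspaces/{ws_id}",
            data=body,
            method="PATCH",
            headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
        )
        for ws_id in unique_ids
    ]
```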
Step 6: Verify connectivity
To test connectivity from a workspace attached to the NCC:
- Register your bucket as an external location. See external locations.
- Open the SQL editor.
- Run:
CREATE TABLE {catalog}.{schema}.test_connectivity LOCATION 's3://{your-s3-bucket}/test_connectivity'
It can take ten minutes for the connection to fully establish.
If your network policy restricts external access, direct connections to your AWS S3 bucket’s FQDNs, such as {your-s3-bucket}.s3.{region}.amazonaws.com, are blocked. You must explicitly add the required FQDNs to your network policy’s Allowed domains to allow this access. See Manage network policies for serverless egress control.
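The following Python sketch collects the domains to add to Allowed domains by expanding bucket names into the regional FQDN form shown above. The region check is only a rough test of shape. s3_allowlist_fqdns is a hypothetical helper.

```python
import re
from typing import Iterable, List

# Rough shape check for AWS region codes such as us-west-2 or ap-southeast-1.
_REGION_RE = re.compile(r"[a-z]{2}(-[a-z]+)+-\d+")


def s3_allowlist_fqdns(bucket_names: Iterable[str], region: str) -> List[str]:
    """Return sorted, unique regional FQDNs of the form {bucket}.s3.{region}.amazonaws.com."""
    if not _REGION_RE.fullmatch(region):
        raise ValueError(f"{region!r} does not look like an AWS region code")
    return sorted({f"{bucket.strip()}.s3.{region}.amazonaws.com" for bucket in bucket_names})
```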
Access to your S3 buckets must use regional endpoints, such as {your-s3-bucket}.s3.{region}.amazonaws.com. Legacy global endpoints, such as {your-s3-bucket}.s3.amazonaws.com, are not supported.
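The following Python sketch finds legacy endpoints in existing configuration or code by classifying an S3 URL or host name. It recognizes only the two virtual-hosted forms shown above and returns the region of a regional endpoint. It rejects other valid S3 host forms, such as path-style or dual-stack endpoints, rather than classifying them. check_s3_endpoint is a hypothetical helper.

```python
import re
from urllib.parse import urlsplit

# {bucket}.s3.{region}.amazonaws.com (supported regional form)
_REGIONAL_RE = re.compile(
    r"(?P<bucket>[a-z0-9][a-z0-9.-]*[a-z0-9])\.s3\.(?P<region>[a-z0-9-]+)\.amazonaws\.com"
)
# {bucket}.s3.amazonaws.com (legacy global form, not supported)
_LEGACY_RE = re.compile(r"[a-z0-9][a-z0-9.-]*[a-z0-9]\.s3\.amazonaws\.com")


def check_s3_endpoint(url_or_host: str) -> str:
    """Return the region of a regional S3 endpoint; raise ValueError otherwise."""
    # urlsplit only finds a host name when a scheme is present.
    if "://" not in url_or_host:
        url_or_host = "https://" + url_or_host
    host = (urlsplit(url_or_host).hostname or "").lower()
    if _LEGACY_RE.fullmatch(host):
        raise ValueError(f"{host}: legacy endpoint, use {{bucket}}.s3.{{region}}.amazonaws.com")
    match = _REGIONAL_RE.fullmatch(host)
    if match is None:
        raise ValueError(f"{host}: not a regional virtual-hosted-style S3 endpoint")
    return match.group("region")
```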
What's next
- Configure private connectivity to AWS resources: Use PrivateLink to establish secure and isolated access to AWS services from your virtual network, bypassing the public internet. See Configure private connectivity to resources in your VPC.
- Configure a firewall for serverless compute access: Implement a firewall to restrict and secure inbound and outbound network connections for your serverless compute environments. See Configure a firewall for serverless compute access.
- Understand data transfer and connectivity costs: Data transfer and connectivity refer to moving data into and out of serverless environments. Networking charges for serverless products only apply to customers using serverless compute. See Understand Databricks serverless networking costs.