Configure private connectivity to AWS S3 storage buckets

This feature is in Public Preview. To join this preview, contact your Databricks account team.

note

Effective October 7, 2024, Databricks began charging customers for networking costs incurred from serverless compute resources connecting to external resources. Serverless network billing is rolling out in phases, which might result in gradual billing changes. For more information on billing, see Understand Databricks serverless networking costs.

This page explains how to configure private connectivity from serverless compute to your in-region AWS S3 buckets using the Databricks account console UI.

Configuring private connectivity for serverless compute provides:

  • A dedicated and private connection: Ensures secure and isolated access between your serverless workspaces and AWS S3, limiting access to authorized connections only.
  • Enhanced data exfiltration mitigation: While serverless compute with Unity Catalog provides built-in data exfiltration protection, PrivateLink adds an extra layer of network defense. Using AWS PrivateLink, your data traffic remains entirely in the AWS network, never traversing the public internet. This architecture, combined with controlled access through VPC endpoints, reduces the attack surface for data exfiltration.

Requirements

  • The workspace is on the Enterprise plan.
  • You are the account admin of your Databricks account.
  • You have at least one functional workspace using serverless compute.
  • You have appropriate AWS IAM permissions to create and modify S3 bucket policies and create VPC endpoints.
  • Each Databricks account can have up to 10 network connectivity configurations (NCCs) per region.
  • Each region can have 30 private endpoints, distributed as needed across 1-10 NCCs.
  • Each NCC can be attached to up to 50 workspaces.
  • Each NCC can have one AWS S3 private endpoint rule.
  • Each private endpoint rule can include up to 100 S3 buckets.
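
The limits in this list can be checked before you start clicking through the account console. The following is a minimal planning sketch in Python, not a Databricks API: the `PlannedNcc` record and `check_region_plan` function are hypothetical names introduced here, and the sketch assumes you track the number of private endpoints per NCC yourself.

```python
from dataclasses import dataclass, field

# Limits as stated in the requirements list above.
MAX_NCCS_PER_REGION = 10
MAX_PRIVATE_ENDPOINTS_PER_REGION = 30
MAX_WORKSPACES_PER_NCC = 50
MAX_S3_RULES_PER_NCC = 1
MAX_BUCKETS_PER_RULE = 100


@dataclass
class PlannedNcc:
    """Hypothetical planning record for one NCC; not a Databricks object."""
    name: str
    workspace_count: int = 0
    private_endpoint_count: int = 0
    # One inner list of bucket names per planned S3 private endpoint rule.
    s3_rule_buckets: list = field(default_factory=list)


def check_region_plan(nccs):
    """Return a list of limit violations for the NCCs planned in one region."""
    problems = []
    if len(nccs) > MAX_NCCS_PER_REGION:
        problems.append(f"{len(nccs)} NCCs planned; limit is {MAX_NCCS_PER_REGION} per region")
    total_endpoints = sum(n.private_endpoint_count for n in nccs)
    if total_endpoints > MAX_PRIVATE_ENDPOINTS_PER_REGION:
        problems.append(
            f"{total_endpoints} private endpoints planned; "
            f"limit is {MAX_PRIVATE_ENDPOINTS_PER_REGION} per region"
        )
    for n in nccs:
        if n.workspace_count > MAX_WORKSPACES_PER_NCC:
            problems.append(f"{n.name}: {n.workspace_count} workspaces; limit is {MAX_WORKSPACES_PER_NCC}")
        if len(n.s3_rule_buckets) > MAX_S3_RULES_PER_NCC:
            problems.append(f"{n.name}: {len(n.s3_rule_buckets)} S3 rules; limit is {MAX_S3_RULES_PER_NCC}")
        for buckets in n.s3_rule_buckets:
            if len(buckets) > MAX_BUCKETS_PER_RULE:
                problems.append(f"{n.name}: rule lists {len(buckets)} buckets; limit is {MAX_BUCKETS_PER_RULE}")
    return problems
```

An empty list means the plan fits within the documented limits. Anything else names the NCC and the limit it exceeds.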

Step 1: Create a Network Connectivity Configuration (NCC) object

You can skip this step if you have an existing NCC in the same region and AWS account that you want to use.

  1. In the account console, click Cloud resources.
  2. Select the Network tab.
  3. Select Add Network Connectivity Configuration.
  4. Type a name for the NCC.
  5. Choose the region. This must match your workspace region.
  6. Click Add.

Step 2: Create an AWS S3 interface endpoint

important

Do not enable your private endpoint until you have completed Step 3.

  1. Navigate to the Private endpoint rules section in your NCC.
  2. Select Add private endpoint rule.
  3. Select S3 bucket under Resource type.
  4. Configure the rule settings:
    • Endpoint Service: This field is automatically populated to establish the connection to your private endpoint's destination resources.
    • S3 bucket names: Enter bucket names for your destination resources. The bucket must exist in the same AWS region as the NCC and endpoint service.

Step 3: Update your S3 bucket policy to accept traffic from the VPC endpoint

To allow serverless compute to access your S3 bucket through the private endpoint, you might need to update your S3 bucket policy in your AWS account.

The following is an example Allow statement that you might need to add. Replace vpce-12345 with the VPC endpoint ID returned in Step 2:

JSON
{
  "Sid": "AllowVpcEndpointAccess",
  "Effect": "Allow",
  "Principal": "*",
  "Action": "s3:*",
  "Resource": ["arn:aws:s3:::{bucket-name}", "arn:aws:s3:::{bucket-name}/*"],
  "Condition": {
    "StringEquals": {
      "aws:SourceVpce": "vpce-12345"
    }
  }
}
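
If you manage bucket policies from scripts, the Allow statement can be generated rather than edited by hand. The sketch below uses only the Python standard library. The helper names `vpce_allow_statement` and `add_statement` are hypothetical, and it assumes you have already loaded the current bucket policy as a dictionary and will apply the result with your own tooling.

```python
import json


def vpce_allow_statement(bucket_name, vpc_endpoint_id):
    """Build the Allow statement shown above for one bucket and VPC endpoint."""
    return {
        "Sid": "AllowVpcEndpointAccess",
        "Effect": "Allow",
        "Principal": "*",
        "Action": "s3:*",
        "Resource": [
            f"arn:aws:s3:::{bucket_name}",
            f"arn:aws:s3:::{bucket_name}/*",
        ],
        "Condition": {"StringEquals": {"aws:SourceVpce": vpc_endpoint_id}},
    }


def add_statement(policy, statement):
    """Return a copy of the policy with the statement added.

    Any existing statement with the same Sid is replaced, so the helper
    can be run repeatedly without creating duplicates.
    """
    existing = policy.get("Statement", [])
    if isinstance(existing, dict):  # a policy may hold a single statement object
        existing = [existing]
    kept = [s for s in existing if s.get("Sid") != statement["Sid"]]
    return {
        **policy,
        "Version": policy.get("Version", "2012-10-17"),
        "Statement": kept + [statement],
    }


if __name__ == "__main__":
    # Placeholder values; substitute your bucket name and the endpoint ID from Step 2.
    current_policy = {"Version": "2012-10-17", "Statement": []}
    updated = add_statement(current_policy, vpce_allow_statement("my-bucket", "vpce-12345"))
    print(json.dumps(updated, indent=2))
```

Review the generated policy before applying it, because `s3:*` with a `*` principal is broad and is restricted only by the VPC endpoint condition.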

If your bucket policy is configured with a Deny clause instead, you might need to add an exception condition for the VPC endpoint ID returned in Step 2. The following is an example condition:

JSON
{
  ...
  "Effect": "Deny",
  ...
  "Condition": {
    "StringNotEquals": {
      "aws:SourceVpce": "vpce-12345"
    }
  }
}

note

This example policy does not include other public or private endpoints that you might want to allowlist, like corporate VPN IPs.
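
For the Deny case, the exception can also be added programmatically. The following is a sketch under the assumption that the Deny statement is available as a Python dictionary. The function name `exempt_vpce_from_deny` is hypothetical. It relies on the IAM behavior that a `StringNotEquals` condition with several values matches only when the request value equals none of them, so appending an endpoint ID exempts that endpoint from the Deny.

```python
import copy


def exempt_vpce_from_deny(statement, vpc_endpoint_id):
    """Return a copy of a Deny statement that no longer applies to the given VPC endpoint."""
    if statement.get("Effect") != "Deny":
        raise ValueError("expected a statement with Effect set to Deny")
    updated = copy.deepcopy(statement)  # leave the caller's statement untouched
    not_equals = updated.setdefault("Condition", {}).setdefault("StringNotEquals", {})
    current = not_equals.get("aws:SourceVpce", [])
    if isinstance(current, str):  # a single value may be stored as a plain string
        current = [current]
    if vpc_endpoint_id not in current:
        current = current + [vpc_endpoint_id]
    not_equals["aws:SourceVpce"] = current
    return updated
```

If the original Deny statement had no condition at all, this helper turns an unconditional Deny into a conditional one, so inspect the result before applying it.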

Refresh the UI or make an API call to confirm the rule's status changes to ESTABLISHED.
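
If you check the status from a script, a small polling loop avoids refreshing by hand. The sketch below deliberately does not call any Databricks API: `fetch_state` is a hypothetical callable that you supply, and it is assumed to return the rule's current status as a string. The timeout and interval defaults are illustrative choices, not documented values.

```python
import time


def wait_for_established(fetch_state, timeout_s=600.0, interval_s=15.0, sleep=time.sleep):
    """Poll fetch_state() until it returns "ESTABLISHED" or the timeout elapses.

    Returns True if the rule reached ESTABLISHED, False on timeout.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if fetch_state() == "ESTABLISHED":
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval_s)
```

Passing `sleep` as a parameter lets you test the loop without waiting, and `fetch_state` can wrap whichever client or HTTP call you already use to read the private endpoint rule.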

Step 4: Enable private endpoint rule

  1. Click the kebab menu icon.
  2. Click Update rule.
  3. Select Enable rule.

important

This step routes traffic for all S3 buckets configured in the private endpoint rule through PrivateLink for any workspace attached to the NCC. Before proceeding, ensure you have completed Step 3 to allow S3 bucket access from the VPC endpoint.

Step 5: Attach the NCC to one or more workspaces

This step associates your configured private connectivity with your serverless workspaces. Skip this step if your workspace is already attached to the desired NCC. To attach the NCC to a workspace:

  1. Navigate to Workspaces in the left-hand navigation.
  2. Select an existing workspace.
  3. Select Update Workspace.
  4. Under Network Connectivity Configuration, select the dropdown and choose the NCC you’ve created.
  5. Repeat for all workspaces you’d like this NCC to apply to.

Step 6: Verify connectivity

To test connectivity, register the bucket as an external location and create a table in it:

  1. Register your bucket as an external location. See external locations.
  2. Open the SQL editor.
  3. Run:
CREATE TABLE {catalog}.{schema}.test_connectivity LOCATION 's3://{your-s3-bucket}/test_connectivity'

It can take ten minutes for the connection to fully establish.

note

If your network policy restricts external access, direct connections to your AWS S3 bucket’s FQDNs like {your-s3-bucket}.s3.{region}.amazonaws.com will be blocked. You must explicitly add the required FQDNs to your network policy’s Allowed domains to allow this access. See Manage network policies for serverless egress control.

Access to your S3 buckets must use regional endpoints like {your-s3-bucket}.s3.{region}.amazonaws.com. Legacy endpoints like {your-s3-bucket}.s3.amazonaws.com are not supported.
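
The difference between regional and legacy endpoints can be checked mechanically, for example when you prepare the Allowed domains list for a network policy. The following sketch uses only the Python standard library. The helper names are hypothetical, and the region pattern is a loose assumption that covers names such as us-west-2.

```python
import re

# Matches {bucket}.s3.{region}.amazonaws.com. The region part is a loose
# pattern (two letters, one or more hyphenated words, a number) and is an
# assumption, not an exhaustive list of AWS regions.
_REGIONAL_S3 = re.compile(
    r"^(?P<bucket>.+)\.s3\.(?P<region>[a-z]{2}(?:-[a-z]+)+-\d+)\.amazonaws\.com$"
)


def regional_s3_fqdn(bucket, region):
    """Build the regional endpoint name for a bucket."""
    return f"{bucket}.s3.{region}.amazonaws.com"


def is_regional_s3_fqdn(host):
    """Return True for regional endpoints and False for legacy ones such as {bucket}.s3.amazonaws.com."""
    return _REGIONAL_S3.match(host.lower()) is not None


def allowed_domains(buckets, region):
    """List the FQDNs to add to a network policy's Allowed domains for these buckets."""
    return [regional_s3_fqdn(bucket, region) for bucket in buckets]
```

For example, `allowed_domains(["my-bucket"], "us-west-2")` produces the regional name for that bucket, and `is_regional_s3_fqdn` can flag legacy hostnames in existing job configurations.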

What's next