Customer-managed VPC

Important

This feature requires that your account is on the E2 version of the Databricks platform. All new Databricks accounts and most existing accounts are now E2. If you are unsure which account type you have, contact your Databricks representative.

Important

This article mentions the term data plane, which is the compute layer of the Databricks platform. In the context of this article, data plane refers to the Classic data plane in your AWS account. By contrast, the Serverless data plane that supports Serverless SQL warehouses (Public Preview) runs in the Databricks AWS account. To learn more, see Serverless compute.

Overview

By default, clusters are created in a single AWS VPC (Virtual Private Cloud) that Databricks creates and configures in your AWS account. You can optionally create your Databricks workspaces in your own VPC, a feature known as customer-managed VPC. You can use a customer-managed VPC to exercise more control over your network configurations to comply with specific cloud security and governance standards your organization may require.

Important

To configure your workspace to use AWS PrivateLink (Public Preview) for any type of connection, your workspace must use a customer-managed VPC.

A customer-managed VPC is a good solution if you have:

  • Security policies that prevent PaaS providers from creating VPCs in your own AWS account.

  • An approval process to create a new VPC, in which the VPC is configured and secured in a well-documented way by internal information security or cloud engineering teams.

Benefits include:

  • Lower privilege level: You maintain more control of your own AWS account. And you don’t need to grant Databricks as many permissions via cross-account IAM role as you do for a Databricks-managed VPC. For example, there is no need for permission to create VPCs. This limited set of permissions can make it easier to get approval to use Databricks in your platform stack.

  • Simplified network operations: You get better network space utilization because you can optionally configure smaller subnets for a workspace than the default /16 CIDR. There is also no need for the complex VPC peering configurations that might be necessary with other solutions.

  • Consolidation of VPCs: Multiple Databricks workspaces can share a single data plane VPC, which is often preferred for billing and instance management.

  • Limit outgoing connections: By default, the data plane does not limit outgoing connections from Databricks Runtime workers. For workspaces that are configured to use a customer-managed VPC, you can use an egress firewall or proxy appliance to limit outbound traffic to a list of allowed internal or external data sources.

Customer-managed VPC

To take advantage of a customer-managed VPC, you must specify a VPC when you first create the Databricks workspace. You cannot move an existing workspace with a Databricks-managed VPC to use a customer-managed VPC. You can, however, move an existing workspace with a customer-managed VPC from one VPC to another VPC by updating the workspace configuration’s network configuration object. See Update a running workspace.

To deploy a workspace in your own VPC, you must:

  1. Create the VPC following the requirements enumerated in VPC requirements.

  2. Reference your VPC network configuration with Databricks when you create the workspace.

    You must provide the VPC ID, subnet IDs, and security group ID when you register the VPC with Databricks.
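The registration step can be sketched as a small helper that assembles the values listed above before they are sent to Databricks. The helper name and the field names (`network_name`, `vpc_id`, `subnet_ids`, `security_group_ids`) are assumptions for illustration; confirm them against the Account API reference before use. The checks mirror limits stated in VPC requirements (at least two subnets, one to five security groups).

```python
import json


def build_network_config(name, vpc_id, subnet_ids, security_group_ids):
    """Assemble a network configuration body (hypothetical helper)."""
    # A workspace needs at least two subnets, each in a different AZ.
    if len(set(subnet_ids)) < 2:
        raise ValueError("at least two distinct subnets are required")
    # Databricks accepts between one and five security groups.
    if not 1 <= len(security_group_ids) <= 5:
        raise ValueError("provide between one and five security groups")
    return {
        "network_name": name,  # assumed field name
        "vpc_id": vpc_id,
        "subnet_ids": list(subnet_ids),
        "security_group_ids": list(security_group_ids),
    }


if __name__ == "__main__":
    # The IDs below are placeholders, not real AWS resources.
    body = build_network_config(
        "my-workspace-network-us-east-1",
        "vpc-0123456789abcdef0",
        ["subnet-0aaa", "subnet-0bbb"],
        ["sg-0ccc"],
    )
    print(json.dumps(body, indent=2))
```

The sketch only builds and validates the body; sending it to the account console or Account API is left out on purpose.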

VPC requirements

Your VPC must meet the requirements described in this section in order to host a Databricks workspace.

VPC region

Workspace data plane VPCs can be in AWS regions ap-northeast-1, ap-northeast-2, ap-south-1, ap-southeast-1, ap-southeast-2, ca-central-1, eu-west-1, eu-west-2, eu-central-1, us-east-1, us-east-2, us-west-1, and us-west-2. However, you cannot use a VPC in us-west-1 if you want to use customer-managed keys for encryption.

VPC sizing

You can share one VPC with multiple workspaces in a single AWS account. However, you cannot reuse subnets or security groups between workspaces. Be sure to size your VPC and subnets accordingly. Databricks assigns two IP addresses per node, one for management traffic and one for Apache Spark applications. The total number of instances for each subnet is equal to half the number of IP addresses that are available. Learn more in Subnets.

VPC IP address ranges

Databricks doesn’t limit netmasks for the workspace VPC, but each workspace subnet must have a netmask between /17 and /26. This means that if your workspace has two subnets and both have a netmask of /26, then the netmask for your workspace VPC must be /25 or smaller (that is, the VPC address range must be at least as large as a /25 block).

Important

If you have configured secondary CIDR blocks for your VPC, make sure that the subnets for the Databricks workspace are configured with the same VPC CIDR block.
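The address range rules above can be checked before you register a VPC. The following is a minimal sketch using Python's standard `ipaddress` module; the function name is hypothetical, and it interprets "the same VPC CIDR block" as "all workspace subnets fall inside a single one of the VPC's CIDR blocks", which is an assumption.

```python
import ipaddress


def check_workspace_subnets(vpc_cidrs, subnet_cidrs):
    """Return a list of problems; an empty list means the checks passed."""
    vpc_blocks = [ipaddress.ip_network(c) for c in vpc_cidrs]
    subnets = [ipaddress.ip_network(c) for c in subnet_cidrs]
    problems = []
    owners = set()
    for subnet in subnets:
        # Each workspace subnet must have a netmask between /17 and /26.
        if not 17 <= subnet.prefixlen <= 26:
            problems.append(f"{subnet}: netmask must be between /17 and /26")
        # Find the VPC CIDR block (primary or secondary) containing the subnet.
        owner = next((b for b in vpc_blocks if subnet.subnet_of(b)), None)
        if owner is None:
            problems.append(f"{subnet}: not inside any VPC CIDR block")
        else:
            owners.add(owner)
    if len(owners) > 1:
        problems.append("subnets are spread across different VPC CIDR blocks")
    # Subnets of one VPC must not overlap each other.
    for i, a in enumerate(subnets):
        for b in subnets[i + 1:]:
            if a.overlaps(b):
                problems.append(f"{a} overlaps {b}")
    return problems


if __name__ == "__main__":
    # Two /26 subnets fit exactly into a /25 VPC.
    print(check_workspace_subnets(["10.0.0.0/25"], ["10.0.0.0/26", "10.0.0.64/26"]))
```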

DNS

The VPC must have DNS hostnames and DNS resolution enabled.

Subnets

Databricks must have access to at least two subnets for each workspace, with each subnet in a different Availability Zone. You cannot specify more than one Databricks workspace subnet per Availability Zone in the Create network configuration API call. You can have more than one subnet per Availability Zone as part of your network setup, but you can choose only one subnet per Availability Zone for the Databricks workspace.

Databricks assigns two IP addresses per node, one for management traffic and one for Spark applications. The total number of instances for each subnet is equal to half of the number of IP addresses that are available.

Each subnet must have a netmask between /17 and /26.
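As a worked example of the sizing rule, the sketch below estimates how many nodes fit in one subnet. It assumes that "available" means the subnet size minus the five addresses that AWS reserves in every subnet; the function and constant names are hypothetical.

```python
import ipaddress

# AWS reserves the first four addresses and the last address of each subnet.
AWS_RESERVED_PER_SUBNET = 5
# One IP for management traffic and one for Spark applications.
IPS_PER_NODE = 2


def max_nodes(subnet_cidr):
    """Estimate the node capacity of a single workspace subnet."""
    subnet = ipaddress.ip_network(subnet_cidr)
    usable = subnet.num_addresses - AWS_RESERVED_PER_SUBNET
    return usable // IPS_PER_NODE


if __name__ == "__main__":
    for cidr in ("10.0.0.0/26", "10.0.0.0/17"):
        print(cidr, max_nodes(cidr))
```

Under these assumptions, the smallest allowed subnet (/26) holds 29 nodes and the largest (/17) holds 16381.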

Important

The subnets that you specify for a customer-managed VPC must be reserved for one Databricks workspace only. You cannot share these subnets with any other resources, including other Databricks workspaces.

Additional subnet requirements

  • Subnets must be private.

  • Subnets must have outbound access to the public network using a NAT gateway and internet gateway, or other similar customer-managed appliance infrastructure.

  • The NAT gateway must be set up in its own subnet that routes quad-zero (0.0.0.0/0) traffic to an internet gateway or other customer-managed appliance infrastructure.

Important

A workspace using secure cluster connectivity (the default after September 1, 2020) must have outbound access from the VPC to the public network.

Subnet route table

The route table for workspace subnets must have quad-zero (0.0.0.0/0) traffic that targets the appropriate network device. If the workspace uses secure cluster connectivity (which is the default for new workspaces after September 1, 2020), quad-zero traffic must target a NAT Gateway or your own managed NAT device or proxy appliance.

Important

Databricks requires subnets to add 0.0.0.0/0 to your allow list. To control egress traffic, use an egress firewall or proxy appliance to block most traffic but allow the URLs that Databricks needs to connect to. See Configure a firewall and outbound access (Optional).

This is a base guideline only. Your configuration requirements may differ. For questions, contact your Databricks representative.

Security groups

Databricks must have access to at least one AWS security group and no more than five security groups. You can reuse existing security groups rather than create new ones.

Security groups must have the following rules:

Egress (outbound):

  • Allow all TCP and UDP access to the workspace security group (for internal traffic)

  • Allow TCP access to 0.0.0.0/0 for these ports:

    • 443: for Databricks infrastructure, cloud data sources, and library repositories

    • 3306: for the metastore

    • 6666: only required if you use PrivateLink

Ingress (inbound): Required for all workspaces (these can be separate rules or combined into one):

  • Allow TCP on all ports when traffic source uses the same security group

  • Allow UDP on all ports when traffic source uses the same security group
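The rules above can be written down as data before applying them with your tool of choice. This sketch builds them in the `IpPermissions` layout used by the EC2 security group APIs (for example in boto3); treat the exact layout as an assumption to verify against the AWS reference. The function name is hypothetical, and the sketch does not call AWS.

```python
def workspace_security_group_rules(sg_id, use_privatelink=False):
    """Return (ingress, egress) rule lists for a workspace security group."""
    same_group = [{"GroupId": sg_id}]
    # Internal traffic: all TCP and UDP ports within the same security group.
    internal = [
        {"IpProtocol": proto, "FromPort": 0, "ToPort": 65535,
         "UserIdGroupPairs": same_group}
        for proto in ("tcp", "udp")
    ]
    # External TCP ports: 443 (infrastructure, data sources, libraries),
    # 3306 (metastore), and 6666 only when PrivateLink is used.
    ports = [443, 3306] + ([6666] if use_privatelink else [])
    external = [
        {"IpProtocol": "tcp", "FromPort": port, "ToPort": port,
         "IpRanges": [{"CidrIp": "0.0.0.0/0"}]}
        for port in ports
    ]
    ingress = list(internal)
    egress = internal + external
    return ingress, egress


if __name__ == "__main__":
    # "sg-0123456789abcdef0" is a placeholder ID.
    ingress, egress = workspace_security_group_rules("sg-0123456789abcdef0")
    print(len(ingress), "ingress rules,", len(egress), "egress rules")
```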

Subnet-level network ACLs

Subnet-level network ACLs must not deny ingress or egress to any traffic. Databricks validates for the following rules while creating the workspace:

  • Ingress: ALLOW ALL from Source 0.0.0.0/0

  • Egress:

    • Allow all traffic to the workspace VPC CIDR, for internal traffic

    • Allow TCP access to 0.0.0.0/0 for these ports:

      • 443: for Databricks infrastructure, cloud data sources, and library repositories

      • 3306: for the metastore

      • 6666: only required if you use PrivateLink

Important

If you configure additional ALLOW or DENY rules for outbound traffic, set the rules required by Databricks to the highest priority (the lowest rule numbers), so that they take precedence.

Note

Databricks requires subnet-level network ACLs to add 0.0.0.0/0 to your allow list. To control egress traffic, use an egress firewall or proxy appliance to block most traffic but allow the URLs that Databricks needs to connect to. See Configure a firewall and outbound access (Optional).

Create a VPC

You can use various tools to create VPCs.

To use the AWS Console, follow the basic instructions below for creating and configuring a VPC and related objects. For complete instructions, see the AWS documentation.

Note

These basic instructions might not apply to all organizations. Your configuration requirements may differ. This section does not cover all possible ways to configure NATs, firewalls, or other network infrastructure. If you have questions, contact your Databricks representative before proceeding.

  1. Go to the VPCs page in AWS.

  2. See the region picker in the upper-right. If needed, switch to the region for your workspace.

  3. In the upper-right corner, click the orange button Create VPC.

  4. Click VPC and more.

  5. In the Name tag auto-generation field, type a name for your workspace. Databricks recommends including the region in the name.

  6. For VPC address range, change the default range if desired.

  7. For public subnets, click 2. Those subnets aren’t used directly by your Databricks workspace, but they are required to enable NATs in this editor.

  8. For private subnets, click 2 for the minimum for workspace subnets. You can add more if desired.

    Your Databricks workspace needs at least two private subnets. To resize them, for example to share one VPC with multiple workspaces that all need separate subnets, click Customize subnet CIDR blocks.

  9. For NAT gateways, click In 1 AZ.

  10. Ensure the following fields at the bottom are enabled: Enable DNS hostnames and Enable DNS resolution.

  11. Click Create VPC.

  12. When viewing your new VPC, click on the left navigation items to update related settings on the VPC. To make it easier to find related objects, in the Filter by VPC field, select your new VPC.

  13. Click Subnets and select what AWS calls the private subnets labeled 1 and 2, which are the ones you will use to configure your main workspace subnets. Modify the subnets as specified in VPC requirements.

    If you created an extra private subnet for use with PrivateLink, configure private subnet 3 as specified in Enable AWS PrivateLink.

  14. Click Security groups and modify the security group as specified in Security groups.

    If you will use back-end PrivateLink connectivity, create an additional security group with inbound and outbound rules as specified in the PrivateLink article in the section Step 1: Configure AWS network objects.

  15. Click Network ACLs and modify the network ACLs as specified in Subnet-level network ACLs.

  16. Choose whether to perform the optional configurations that are specified later in this article.

  17. Register your VPC with Databricks to create a network configuration using the account console or by using the Account API.

Configure a firewall and outbound access (Optional)

If you are using secure cluster connectivity (the default as of September 1, 2020), use an egress firewall or proxy appliance to block most traffic but allow the URLs that Databricks needs to connect to:

  • If the firewall or proxy appliance is in the same VPC as the Databricks workspace VPC, route the traffic and configure it to allow the following connections.

  • If the firewall or proxy appliance is in a different VPC or an on-premises network, route 0.0.0.0/0 to that VPC or network first and configure the proxy appliance to allow the following connections.

Important

Databricks strongly recommends that you specify destinations as domain names in your egress infrastructure, rather than as IP addresses.

Allow the following outgoing connections:

  • Databricks web application: Required. Also used for REST API calls to your workspace.

  • Databricks secure cluster connectivity (SCC) relay: Required if your workspace uses secure cluster connectivity, which is the default for workspaces in accounts on the E2 version of the platform as of September 1, 2020.

  • AWS S3 global URL: Required by Databricks to access the root S3 bucket.

  • AWS S3 regional URL: Optional. However, you likely use other S3 buckets, in which case you must also allow the S3 regional endpoint. Databricks recommends creating an S3 VPC endpoint instead so that this traffic goes through the private tunnel over the AWS network backbone.

  • AWS STS global URL: Required.

  • AWS STS regional URL: Required due to expected switch to regional endpoint.

  • AWS Kinesis regional URL: The Kinesis endpoint is used to capture logs needed to manage and monitor the software. For most regions, use the regional URL. However, for VPCs in us-west-1, a Kinesis VPC endpoint currently has no effect, and you must ensure that the Kinesis URL is allowed for us-west-2 (not us-west-1). Databricks recommends that you create a Kinesis VPC endpoint instead so that this traffic goes through the private tunnel over the AWS network backbone.

  • Table metastore RDS regional URL (by data plane region): Required if your Databricks workspace uses the default Hive metastore, which is always in the same region as your data plane region. This means that it might be in the same geography but different region as the control plane. Instead of using the default Hive metastore, you can choose to implement your own table metastore instance, in which case you are responsible for its network routing.

Required data plane addresses

Allow connections to the addresses below for your region:

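| Endpoint | VPC region | Address | Port |
|---|---|---|---|
| Webapp | ap-northeast-1 | tokyo.cloud.databricks.com | 443 |
| Webapp | ap-northeast-2 | seoul.cloud.databricks.com | 443 |
| Webapp | ap-south-1 | mumbai.cloud.databricks.com | 443 |
| Webapp | ap-southeast-1 | singapore.cloud.databricks.com | 443 |
| Webapp | ap-southeast-2 | sydney.cloud.databricks.com | 443 |
| Webapp | ca-central-1 | canada.cloud.databricks.com | 443 |
| Webapp | eu-central-1 | frankfurt.cloud.databricks.com | 443 |
| Webapp | eu-west-1 | ireland.cloud.databricks.com | 443 |
| Webapp | eu-west-2 | london.cloud.databricks.com | 443 |
| Webapp | us-east-1 | nvirginia.cloud.databricks.com | 443 |
| Webapp | us-east-2 | ohio.cloud.databricks.com | 443 |
| Webapp | us-west-1 | oregon.cloud.databricks.com | 443 |
| Webapp | us-west-2 | oregon.cloud.databricks.com | 443 |
| SCC relay | ap-northeast-1 | tunnel.ap-northeast-1.cloud.databricks.com | 443 |
| SCC relay | ap-northeast-2 | tunnel.ap-northeast-2.cloud.databricks.com | 443 |
| SCC relay | ap-south-1 | tunnel.ap-south-1.cloud.databricks.com | 443 |
| SCC relay | ap-southeast-1 | tunnel.ap-southeast-1.cloud.databricks.com | 443 |
| SCC relay | ap-southeast-2 | tunnel.ap-southeast-2.cloud.databricks.com | 443 |
| SCC relay | ca-central-1 | tunnel.ca-central-1.cloud.databricks.com | 443 |
| SCC relay | eu-central-1 | tunnel.eu-central-1.cloud.databricks.com | 443 |
| SCC relay | eu-west-1 | tunnel.eu-west-1.cloud.databricks.com | 443 |
| SCC relay | eu-west-2 | tunnel.eu-west-2.cloud.databricks.com | 443 |
| SCC relay | us-east-1 | tunnel.us-east-1.cloud.databricks.com | 443 |
| SCC relay | us-east-2 | tunnel.us-east-2.cloud.databricks.com | 443 |
| SCC relay | us-west-1 | tunnel.cloud.databricks.com | 443 |
| SCC relay | us-west-2 | tunnel.cloud.databricks.com | 443 |
| S3 global for root bucket | all | s3.amazonaws.com | 443 |
| S3 regional for other buckets: Databricks recommends a VPC endpoint instead | all | s3.<region-name>.amazonaws.com | 443 |
| STS global | all | sts.amazonaws.com | 443 |
| Kinesis: Databricks recommends a VPC endpoint instead | Most regions | kinesis.<region-name>.amazonaws.com | 443 |
| Kinesis: Databricks recommends a VPC endpoint instead | us-west-1 | kinesis.us-west-2.amazonaws.com | 443 |
| RDS (if using built-in metastore) | ap-northeast-1 | mddx5a4bpbpm05.cfrfsun7mryq.ap-northeast-1.rds.amazonaws.com | 3306 |
| RDS (if using built-in metastore) | ap-northeast-2 | md1915a81ruxky5.cfomhrbro6gt.ap-northeast-2.rds.amazonaws.com | 3306 |
| RDS (if using built-in metastore) | ap-south-1 | mdjanpojt83v6j.c5jml0fhgver.ap-south-1.rds.amazonaws.com | 3306 |
| RDS (if using built-in metastore) | ap-southeast-1 | md1n4trqmokgnhr.csnrqwqko4ho.ap-southeast-1.rds.amazonaws.com | 3306 |
| RDS (if using built-in metastore) | ap-southeast-2 | mdnrak3rme5y1c.c5f38tyb1fdu.ap-southeast-2.rds.amazonaws.com | 3306 |
| RDS (if using built-in metastore) | ca-central-1 | md1w81rjeh9i4n5.co1tih5pqdrl.ca-central-1.rds.amazonaws.com | 3306 |
| RDS (if using built-in metastore) | eu-central-1 | mdv2llxgl8lou0.ceptxxgorjrc.eu-central-1.rds.amazonaws.com | 3306 |
| RDS (if using built-in metastore) | eu-west-1 | md15cf9e1wmjgny.cxg30ia2wqgj.eu-west-1.rds.amazonaws.com | 3306 |
| RDS (if using built-in metastore) | eu-west-2 | mdio2468d9025m.c6fvhwk6cqca.eu-west-2.rds.amazonaws.com | 3306 |
| RDS (if using built-in metastore) | us-east-1 | mdb7sywh50xhpr.chkweekm4xjq.us-east-1.rds.amazonaws.com | 3306 |
| RDS (if using built-in metastore) | us-east-2 | md7wf1g369xf22.cluz8hwxjhb6.us-east-2.rds.amazonaws.com | 3306 |
| RDS (if using built-in metastore) | us-west-1 | mdzsbtnvk0rnce.c13weuwubexq.us-west-1.rds.amazonaws.com | 3306 |
| RDS (if using built-in metastore) | us-west-2 | mdpartyyphlhsp.caj77bnxuhme.us-west-2.rds.amazonaws.com | 3306 |
| Databricks control plane infrastructure | ap-northeast-1 | 35.72.28.0/28 | 443 |
| Databricks control plane infrastructure | ap-northeast-2 | 3.38.156.176/28 | 443 |
| Databricks control plane infrastructure | ap-south-1 | 65.0.37.64/28 | 443 |
| Databricks control plane infrastructure | ap-southeast-1 | 13.214.1.96/28 | 443 |
| Databricks control plane infrastructure | ap-southeast-2 | 3.26.4.0/28 | 443 |
| Databricks control plane infrastructure | ca-central-1 | 3.96.84.208/28 | 443 |
| Databricks control plane infrastructure | eu-west-1 | 3.250.244.112/28 | 443 |
| Databricks control plane infrastructure | eu-west-2 | 18.134.65.240/28 | 443 |
| Databricks control plane infrastructure | eu-central-1 | 18.159.44.32/28 | 443 |
| Databricks control plane infrastructure | us-east-1 | 3.237.73.224/28 | 443 |
| Databricks control plane infrastructure | us-east-2 | 3.128.237.208/28 | 443 |
| Databricks control plane infrastructure | us-west-1 and us-west-2 | 44.234.192.32/28 | 443 |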
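To show how the domain-based entries fit together for one region, here is a minimal sketch that assembles a host and port allow list. The function name and parameters are hypothetical. The webapp and metastore hosts are passed in because they must be looked up per region in the addresses listed above, and the control plane infrastructure CIDR ranges are not covered. The regional STS host follows the `sts.<region>.amazonaws.com` pattern used elsewhere in this article.

```python
def egress_allow_list(region, webapp_host, metastore_host=None):
    """Return (host, port) pairs to allow for a workspace VPC in `region`."""
    # us-west-1 and us-west-2 share the non-regional relay hostname.
    if region in ("us-west-1", "us-west-2"):
        relay = "tunnel.cloud.databricks.com"
    else:
        relay = f"tunnel.{region}.cloud.databricks.com"
    # Kinesis traffic for us-west-1 goes to us-west-2.
    kinesis_region = "us-west-2" if region == "us-west-1" else region
    hosts = [
        (webapp_host, 443),
        (relay, 443),
        ("s3.amazonaws.com", 443),            # S3 global, root bucket
        (f"s3.{region}.amazonaws.com", 443),  # S3 regional, other buckets
        ("sts.amazonaws.com", 443),           # STS global
        (f"sts.{region}.amazonaws.com", 443), # STS regional
        (f"kinesis.{kinesis_region}.amazonaws.com", 443),
    ]
    # Only needed when the workspace uses the default Hive metastore.
    if metastore_host:
        hosts.append((metastore_host, 3306))
    return hosts


if __name__ == "__main__":
    for host, port in egress_allow_list("us-west-1", "oregon.cloud.databricks.com"):
        print(f"{host}:{port}")
```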

Configure regional endpoints (Optional)

If you use a customer-managed VPC (optional) and secure cluster connectivity (the default as of September 1, 2020), you may prefer to configure your VPC to use only regional VPC endpoints to AWS services for more direct connections and reduced cost compared to AWS global endpoints. There are four AWS services that a Databricks workspace with a customer-managed VPC must reach: STS, S3, Kinesis, and RDS.

The connection from your VPC to the RDS service is required only if you use the default Databricks metastore. Although there is no VPC endpoint for RDS, instead of using the default Databricks metastore, you can configure your own external metastore. Implement an external metastore with Hive metastore or AWS Glue.

For the other three services, you can create VPC gateway or interface endpoints so that the relevant in-region traffic from clusters transits over the secure AWS backbone rather than the public network:

  • S3: Create a VPC gateway endpoint that is directly accessible from your Databricks cluster subnets. This causes workspace traffic to all in-region S3 buckets to use the endpoint route. To access any cross-region buckets, open up access to S3 global URL s3.amazonaws.com in your egress appliance, or route 0.0.0.0/0 to an AWS internet gateway.

    To use DBFS FUSE with regional endpoints enabled:

    • For Databricks Runtime 6.x and above, you must set up an environment variable in the cluster configuration to set AWS_REGION=<aws-region-code>. For example, if your workspace is deployed in the N. Virginia region, set AWS_REGION=us-east-1. To enforce it for all clusters, use cluster policies.

    • For Databricks Runtime 5.5 LTS, set both AWS_REGION=<aws-region-code> and DBFS_FUSE_VERSION=2.

  • STS: Create a VPC interface endpoint directly accessible from your Databricks cluster subnets. You can create this endpoint in your workspace subnets. Databricks recommends that you use the same security group that was created for your workspace VPC. This configuration causes workspace traffic to STS to use the endpoint route.

  • Kinesis: Create a VPC interface endpoint directly accessible from your Databricks cluster subnets. You can create this endpoint in your workspace subnets. Databricks recommends that you use the same security group that was created for your workspace VPC. This configuration causes workspace traffic to Kinesis to use the endpoint route. The only exception is workspaces in the AWS region us-west-1, where the target Kinesis streams are in region us-west-2 and the traffic is therefore cross-region.

Troubleshoot regional endpoints

If the VPC endpoints do not work as intended, for example if your data sources are inaccessible or if the traffic is bypassing the endpoints, use one of the following approaches:

  1. Add the environment variable AWS_REGION in the cluster configuration and set it to your AWS region. To enable it for all clusters, use cluster policies. You may have already configured this environment variable to use DBFS FUSE.

  2. Add the required Apache Spark configuration:

    • Either in each source notebook:

    %scala
    sc.hadoopConfiguration.set("fs.s3a.endpoint", "https://s3.<region>.amazonaws.com")
    sc.hadoopConfiguration.set("fs.s3a.stsAssumeRole.stsEndpoint", "https://sts.<region>.amazonaws.com")
    
    %python
    sc._jsc.hadoopConfiguration().set("fs.s3a.endpoint", "https://s3.<region>.amazonaws.com")
    sc._jsc.hadoopConfiguration().set("fs.s3a.stsAssumeRole.stsEndpoint", "https://sts.<region>.amazonaws.com")
    
    • Or in the Apache Spark config for the cluster:

    spark.hadoop.fs.s3a.endpoint https://s3.<region>.amazonaws.com
    spark.hadoop.fs.s3a.stsAssumeRole.stsEndpoint https://sts.<region>.amazonaws.com
    

To set these values for all clusters, configure the values as part of your cluster policy.
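One way to express these values in a cluster policy is sketched below. The attribute paths (`spark_env_vars.*`, `spark_conf.*`) and the `fixed` type reflect the cluster policy definition format as assumed here; verify them against the cluster policies reference. The function names are hypothetical. Note that pinning the S3 endpoint for all clusters blocks cross-region S3 access, so omit that entry if you need it.

```python
import json


def fixed(value):
    """A policy element that forces an attribute to a single value."""
    return {"type": "fixed", "value": value}


def regional_endpoint_policy(region, pin_s3_endpoint=True):
    """Build policy entries that pin AWS_REGION and the regional endpoints."""
    policy = {
        "spark_env_vars.AWS_REGION": fixed(region),
        "spark_conf.spark.hadoop.fs.s3a.stsAssumeRole.stsEndpoint":
            fixed(f"https://sts.{region}.amazonaws.com"),
    }
    if pin_s3_endpoint:
        policy["spark_conf.spark.hadoop.fs.s3a.endpoint"] = fixed(
            f"https://s3.{region}.amazonaws.com"
        )
    return policy


if __name__ == "__main__":
    print(json.dumps(regional_endpoint_policy("us-east-1"), indent=2))
```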

Access S3 using instance profiles (Optional)

To access S3 mounts using instance profiles, set the following Spark configurations:

  • Either in each source notebook:

    %scala
    sc.hadoopConfiguration.set("fs.s3a.endpoint", "https://s3.<region>.amazonaws.com")
    sc.hadoopConfiguration.set("fs.s3a.stsAssumeRole.stsEndpoint", "https://sts.<region>.amazonaws.com")
    
    %python
    sc._jsc.hadoopConfiguration().set("fs.s3a.endpoint", "https://s3.<region>.amazonaws.com")
    sc._jsc.hadoopConfiguration().set("fs.s3a.stsAssumeRole.stsEndpoint", "https://sts.<region>.amazonaws.com")
    
  • Or in the Apache Spark config for the cluster:

    spark.hadoop.fs.s3a.endpoint https://s3.<region>.amazonaws.com
    spark.hadoop.fs.s3a.stsAssumeRole.stsEndpoint https://sts.<region>.amazonaws.com
    

To set these values for all clusters, configure the values as part of your cluster policy.

Warning

For the S3 service, there are limitations to applying additional regional endpoint configurations at the notebook or cluster level. Notably, cross-region S3 access is blocked, even if the global S3 URL is allowed in your egress firewall or proxy. If your Databricks deployment might require cross-region S3 access, it is important that you not apply the Spark configuration at the notebook or cluster level.

Restrict access to S3 buckets (Optional)

Most reads from and writes to S3 are self-contained within the data plane. However, some management operations originate from the control plane, which is managed by Databricks. To limit access to S3 buckets to a specified set of source IP addresses, create an S3 bucket policy. In the bucket policy, include the IP addresses in the aws:SourceIp list. If you use a VPC Endpoint, allow access to it by adding it to the policy’s aws:sourceVpce.

For more information about S3 bucket policies, see Limiting access to specific IP addresses in the Amazon S3 documentation. Working example bucket policies are also included in this topic.

Requirements for bucket policies

Your bucket policy must meet these requirements, to ensure that your clusters start correctly and that you can connect to them:

Required IPs and storage buckets

This table includes information you need when using S3 bucket policies and VPC Endpoint policies to restrict access to your workspace’s S3 buckets.

| Region | Control plane NAT IP | Artifact storage bucket | Log storage bucket | Shared datasets bucket |
|---|---|---|---|---|
| ap-northeast-1 | 18.177.16.95/32 | databricks-prod-artifacts-ap-northeast-1 | databricks-prod-storage-tokyo | databricks-datasets-tokyo |
| ap-northeast-2 | 54.180.50.119/32 | databricks-prod-artifacts-ap-northeast-2 | databricks-prod-storage-seoul | databricks-datasets-seoul |
| ap-south-1 | 13.232.248.161/32 | databricks-prod-artifacts-ap-south-1 | databricks-prod-storage-mumbai | databricks-datasets-mumbai |
| ap-southeast-1 | 13.213.212.4/32 | databricks-prod-artifacts-ap-southeast-1 | databricks-prod-storage-singapore | databricks-datasets-singapore |
| ap-southeast-2 | 13.237.96.217/32 | databricks-prod-artifacts-ap-southeast-2 | databricks-prod-storage-sydney | databricks-datasets-sydney |
| ca-central-1 | 35.183.59.105/32 | databricks-prod-artifacts-ca-central-1 | databricks-prod-storage-montreal | databricks-datasets-montreal |
| eu-central-1 | 18.159.32.64/32 | databricks-prod-artifacts-eu-central-1 | databricks-prod-storage-frankfurt | databricks-datasets-frankfurt |
| eu-west-1 | 46.137.47.49/32 | databricks-prod-artifacts-eu-west-1 | databricks-prod-storage-ireland | databricks-datasets-ireland |
| eu-west-2 | 3.10.112.150/32 | databricks-prod-artifacts-eu-west-2 | databricks-prod-storage-london | databricks-datasets-london |
| us-east-1 | 54.156.226.103/32 | databricks-prod-artifacts-us-east-1 | databricks-prod-storage-virginia | databricks-datasets-virginia |
| us-east-2 | 18.221.200.169/32 | databricks-prod-artifacts-us-east-2 | databricks-prod-storage-ohio | databricks-datasets-ohio |
| us-west-1 | 52.27.216.188/32 | databricks-prod-artifacts-us-west-2 | databricks-prod-storage-oregon | databricks-datasets-oregon |
| us-west-2 | 52.27.216.188/32 | databricks-prod-artifacts-us-west-2 | databricks-prod-storage-oregon | databricks-datasets-oregon |

Example bucket policies

These examples use placeholder text to indicate where to specify recommended IP addresses and required storage buckets. Review the requirements to ensure that your clusters start correctly and that you can connect to them.

Restrict access to the Databricks control plane, data plane, and trusted IPs:

This S3 bucket policy uses a Deny condition to selectively allow access from the control plane, NAT gateway, and corporate VPN IP addresses you specify. Replace the placeholder text with values for your environment. You can add any number of IP addresses to the policy. Create one policy per S3 bucket you want to protect.

Important

If you use VPC Endpoints, this policy is not complete. See Restrict access to the Databricks control plane, VPC endpoints, and trusted IPs.

{
  "Sid": "IPAllow",
  "Effect": "Deny",
  "Principal": "*",
  "Action": "s3:*",
  "Resource": [
    "arn:aws:s3:::<S3-BUCKET>",
    "arn:aws:s3:::<S3-BUCKET>/*"
  ],
  "Condition": {
    "NotIpAddress": {
      "aws:SourceIp": [
        "<CONTROL-PLANE-NAT-IP>",
        "<DATA-PLANE-NAT-IP>",
        "<CORPORATE-VPN-IP>"
      ]
    }
  }
}

Restrict access to the Databricks control plane, VPC endpoints, and trusted IPs:

If you use a VPC Endpoint to access S3, you must add a second condition to the policy. This condition allows access from your VPC Endpoint by adding it to the aws:sourceVpce list.

This bucket policy selectively allows access from your VPC Endpoint, and from the control plane and corporate VPN IP addresses you specify.

When using VPC Endpoints, you can use a VPC Endpoint policy instead of an S3 bucket policy. A VPCE policy must allow access to your root S3 bucket and also to the required artifact, log, and shared datasets buckets for your region. You can learn about VPC Endpoint policies in the AWS documentation.

Replace the placeholder text with values for your environment.

{
  "Sid": "IPAllow",
  "Effect": "Deny",
  "Principal": "*",
  "Action": "s3:*",
  "Resource": [
    "arn:aws:s3:::<S3-BUCKET>",
    "arn:aws:s3:::<S3-BUCKET>/*"
  ],
  "Condition": {
    "NotIpAddressIfExists": {
      "aws:SourceIp": [
        "<CONTROL-PLANE-NAT-IP>",
        "<CORPORATE-VPN-IP>"
      ]
    },
    "StringNotEqualsIfExists": {
      "aws:sourceVpce": "<VPCE-ID>"
    }
  }
}
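Both example statements can be generated from the same inputs. The sketch below fills in the placeholders and wraps the statement in a complete policy document; the function name is hypothetical, and the `Version` and `Statement` wrapper is an assumption about how you attach the statement, since the examples above show only the statement itself.

```python
import json


def restricted_bucket_policy(bucket, allowed_ips, vpce_id=None):
    """Build a Deny-based bucket policy from the example statements."""
    if vpce_id is None:
        # No VPC Endpoint: deny every source IP that is not listed.
        condition = {"NotIpAddress": {"aws:SourceIp": list(allowed_ips)}}
    else:
        # With a VPC Endpoint: also let traffic from the endpoint through.
        condition = {
            "NotIpAddressIfExists": {"aws:SourceIp": list(allowed_ips)},
            "StringNotEqualsIfExists": {"aws:sourceVpce": vpce_id},
        }
    statement = {
        "Sid": "IPAllow",
        "Effect": "Deny",
        "Principal": "*",
        "Action": "s3:*",
        "Resource": [f"arn:aws:s3:::{bucket}", f"arn:aws:s3:::{bucket}/*"],
        "Condition": condition,
    }
    return {"Version": "2012-10-17", "Statement": [statement]}


if __name__ == "__main__":
    # The bucket name, VPN address, and endpoint ID are placeholders.
    policy = restricted_bucket_policy(
        "my-bucket",
        ["54.156.226.103/32", "203.0.113.10/32"],
        vpce_id="vpce-0123456789abcdef0",
    )
    print(json.dumps(policy, indent=2))
```

Because the statement uses a Deny effect, test it on a non-critical bucket first; a wrong IP list can lock out every caller, including you.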