Networking recommendations for Lakehouse Federation

This article provides guidance for setting up a viable network path between your Databricks clusters or SQL warehouses and the external database system that you are connecting to using Lakehouse Federation.

Bear the following important information in mind:

  • All network traffic flows directly between Databricks clusters (or SQL warehouses) and the external database system. Neither Unity Catalog nor the Databricks control plane is on the network path.

  • Databricks compute (that is, clusters and SQL warehouses) always deploys in the cloud, but the external database system can be on-premises or hosted on any cloud provider, as long as there’s a viable network path between your Databricks compute and the external database.

  • If you have inbound or outbound network restrictions on either Databricks compute or the external database system, refer to the following sections for general guidance to help you create a viable network path.

For more information on networking in Databricks workspaces, see Networking.

Database system and Databricks compute both accessible from internet

The connection should work without any additional network configuration.
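
To confirm that a network path exists, you can run a basic TCP check from a notebook attached to the cluster. The following is a minimal sketch that uses only the Python standard library. It is not a Databricks API. The function name `can_reach` and the host and port in the usage comment are placeholders chosen for illustration.

```python
import socket


def can_reach(host: str, port: int, timeout_s: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout_s."""
    try:
        # create_connection resolves the hostname and completes the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False


# Hypothetical usage (replace with your database hostname and port):
# can_reach("db.example.com", 5432)
```

A successful check shows only that the TCP port is reachable from the compute. It does not validate credentials or the database protocol.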

Database system has network access restrictions

If the external database system has inbound or outbound network access restrictions and the Databricks cluster or SQL warehouse is accessible from the internet, then perform the following configurations, depending on the type of compute:

Classic compute resources:

Configure one of the following network solutions:

  • Stable egress IP on Databricks compute.

    Set up a stable IP address alongside a load balancer, NAT gateway, internet gateway or equivalent, and connect it with the subnet where Databricks compute is deployed. This allows the compute to share a stable public IP address that can be allowlisted on the external database side.

    The external database system should allowlist the Databricks compute stable IP for both ingress and egress traffic.

  • PrivateLink (only when the external database is on the same cloud as Databricks compute)

    Configure a PrivateLink connection between the network where the database is deployed and the network where Databricks compute is deployed.

Serverless compute resources:

Contact your Databricks account team to learn about plans to support secure network access to external databases from serverless compute.
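
For the stable egress IP option on classic compute, it can help to confirm that the stable IP address is covered by the allowlist entries configured on the database side. The following is a minimal sketch that uses Python's `ipaddress` module. The function name `is_allowlisted` is hypothetical, and the addresses in the usage comment come from reserved documentation ranges, not from any real deployment.

```python
import ipaddress
from typing import Iterable


def is_allowlisted(ip: str, allowlist: Iterable[str]) -> bool:
    """Return True if ip falls inside any CIDR range or single address in allowlist."""
    address = ipaddress.ip_address(ip)
    for entry in allowlist:
        # strict=False accepts entries such as "203.0.113.7/24" with host bits set.
        network = ipaddress.ip_network(entry, strict=False)
        if address.version == network.version and address in network:
            return True
    return False


# Hypothetical usage with placeholder addresses:
# is_allowlisted("203.0.113.10", ["198.51.100.0/24", "203.0.113.0/28"])
```

This check models the allowlist only. The actual rule syntax depends on the firewall or security group that protects the external database.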

Databricks compute has network access restrictions

If the external database system is accessible from the internet and the Databricks compute has inbound or outbound network access restrictions (which is only possible if you are on a customer-managed network), then perform one of the following configurations:

  • Allowlist the hostname of the external database in the firewall rules of the subnet where Databricks compute is deployed.

    If you choose to allowlist the external database IP address rather than hostname, make sure that the external database has a stable IP address.

  • PrivateLink (only when the external database is on the same cloud as Databricks compute)

    Configure a PrivateLink connection between the network where the database is deployed and the network where Databricks compute is deployed.
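
If you allowlist the external database by IP address rather than by hostname, you can sample what the hostname resolves to before you write the firewall rule. The following is a rough heuristic sketch, assuming that repeated DNS lookups over a short window are a useful signal. The function names `resolve_ips` and `has_stable_ips` are hypothetical. A short sampling window cannot prove that an address is stable, and DNS caching can hide changes.

```python
import socket
import time
from typing import FrozenSet


def resolve_ips(hostname: str) -> FrozenSet[str]:
    """Return the set of IP addresses that hostname currently resolves to."""
    infos = socket.getaddrinfo(hostname, None, proto=socket.IPPROTO_TCP)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return frozenset(info[4][0] for info in infos)


def has_stable_ips(hostname: str, attempts: int = 3, delay_s: float = 1.0) -> bool:
    """Return True if repeated lookups return the same set of addresses."""
    first = resolve_ips(hostname)
    for _ in range(attempts - 1):
        time.sleep(delay_s)
        if resolve_ips(hostname) != first:
            return False
    return True
```

If the lookups return different address sets, allowlisting by hostname is likely the safer choice.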

Databricks compute has a custom DNS server

If the external database system is accessible from the internet and the Databricks compute has a custom DNS server (which is only possible if you are on a customer-managed network), add the database system’s hostname to your custom DNS server so that it can be resolved.
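
After you add the hostname to the custom DNS server, you can check from a notebook on the cluster that the name resolves. The following is a minimal sketch that uses the resolver configured for the compute. The function name `resolve_or_empty` is hypothetical, and the hostname in the usage comment is a placeholder.

```python
import socket
from typing import List


def resolve_or_empty(hostname: str) -> List[str]:
    """Return the sorted IP addresses hostname resolves to, or [] if resolution fails."""
    try:
        infos = socket.getaddrinfo(hostname, None)
    except socket.gaierror:
        # The resolver could not find the name (or could not be reached).
        return []
    # Deduplicate, because getaddrinfo returns one entry per socket type.
    return sorted({info[4][0] for info in infos})


# Hypothetical usage (replace with your database hostname):
# resolve_or_empty("db.example.com")
```

An empty result suggests that the custom DNS server does not yet have a record for the database hostname, or that the compute cannot reach the DNS server.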