Configure firewall settings for AWS SQL Database
The Microsoft SQL Server connector is in gated Public Preview. To participate in the preview, contact your Databricks account team.
AWS provides multiple options to control network access between services deployed within Virtual Private Clouds (VPCs). These include security group rules, VPC peering, AWS Transit Gateway, and AWS PrivateLink. The ingestion gateway for the SQL Server connector is deployed on classic compute and runs within the VPC associated with a Databricks workspace on AWS.
The following steps outline how to configure access between Databricks on AWS (with the classic compute of the ingestion gateway) and SQL Server hosted on AWS (RDS or EC2-based) or on-premises, using VPC peering, AWS Transit Gateway, or AWS PrivateLink.
Deploy Databricks workspace in your VPC
Ensure that your Databricks workspace is deployed using VPC deployment mode (classic compute), with custom VPC settings to allow control over network traffic. Databricks will launch compute resources (clusters) in specified subnets within your VPC, allowing you to securely connect to other AWS services like SQL Server using private IPs. For more information, see Create a workspace with custom AWS configurations.
Option 1: Set up VPC peering
If your compute resources and your SQL Server instance reside in the same VPC, they can connect directly without peering. Make sure that the following conditions are met:
- Both resources are in subnets that can route traffic between them.
- Security groups are configured to allow necessary traffic (for example, TCP port 1433, the default SQL Server port).
- Network ACLs (NACLs) are set to permit the required traffic flow.
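One way to confirm that routing, security groups, and NACLs all permit the traffic is to attempt a TCP connection from a machine in the compute subnet. The following is a minimal sketch using only the Python standard library. The function name `can_reach` and the address in the commented usage line are hypothetical, and port 1433 is assumed to be the SQL Server listener port.

```python
import socket


def can_reach(host: str, port: int = 1433, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection performs the full TCP handshake, so success means
        # routing, security groups, and NACLs all allowed the traffic.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable networks.
        return False


# Hypothetical usage: replace the address with your SQL Server's private IP.
# print(can_reach("10.0.1.25", 1433))
```

A `False` result does not identify which layer blocked the connection, only that the path is not open end to end.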
If your compute resources and SQL Server instance reside in separate VPCs, you must establish a peering connection between the VPCs. This allows them to communicate as they would if they were in the same network. Make sure that:
- The VPCs have non-overlapping CIDR ranges. VPC peering does not support overlapping IP address ranges.
- Route tables have been updated to direct traffic between the VPCs.
- Security groups and NACLs permit traffic from the peered VPC.
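Before you request a peering connection, it can help to check that the two address ranges do not overlap. The following sketch uses Python's standard `ipaddress` module. The function name `cidrs_overlap` and the CIDR values in the usage comment are illustrative assumptions, not values from your environment.

```python
import ipaddress


def cidrs_overlap(cidr_a: str, cidr_b: str) -> bool:
    """Return True if the two CIDR blocks share any addresses."""
    net_a = ipaddress.ip_network(cidr_a)
    net_b = ipaddress.ip_network(cidr_b)
    return net_a.overlaps(net_b)


# Hypothetical usage: a Databricks workspace VPC and a SQL Server VPC.
# cidrs_overlap("10.0.0.0/16", "10.1.0.0/16")    # ranges are disjoint
# cidrs_overlap("10.0.0.0/16", "10.0.128.0/20")  # second range is inside the first
```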
Option 2: Use AWS Transit Gateway
For more complex architectures involving multiple VPCs, AWS Transit Gateway acts as a hub to interconnect VPCs and on-premises networks. The following steps summarize the setup. For full instructions on setting up connectivity to on-premises networks, see Get started with using Amazon VPC Transit Gateways in the AWS documentation.
- Create the transit gateway.
- Attach your AWS VPC to your transit gateway.
- Configure on-premises connectivity with a Site-to-Site VPN or AWS Direct Connect.
- Update the route tables. See Transit gateway route tables in Amazon VPC Transit Gateways in the AWS documentation.
- Configure your security groups. See Controlling access with security groups in the AWS documentation.
- Make sure your on-premises SQL Server allows incoming connections from the Databricks VPC IP range.
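When you review the updated route tables, keep in mind that AWS picks the most specific matching route (longest prefix match) for each destination. The sketch below models that selection in plain Python so you can reason about which target a given SQL Server address would use. The function name `select_route`, the CIDR blocks, and the target labels are hypothetical. It does not call any AWS API.

```python
import ipaddress
from typing import Optional


def select_route(routes: dict[str, str], destination_ip: str) -> Optional[str]:
    """Return the target of the most specific route that matches destination_ip."""
    dest = ipaddress.ip_address(destination_ip)
    best_net = None
    best_target = None
    for cidr, target in routes.items():
        net = ipaddress.ip_network(cidr)
        # Keep the matching route with the longest prefix seen so far.
        if dest in net and (best_net is None or net.prefixlen > best_net.prefixlen):
            best_net, best_target = net, target
    return best_target


# Hypothetical route table for the Databricks VPC subnets.
example_routes = {
    "10.0.0.0/16": "local",              # the VPC's own range
    "192.168.0.0/16": "transit-gateway", # on-premises range via the transit gateway
    "0.0.0.0/0": "nat-gateway",          # default route
}
```

With this table, an on-premises SQL Server at `192.168.10.5` resolves to the transit gateway route rather than the default route, because `/16` is more specific than `/0`.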
Option 3: Use AWS PrivateLink with Network Load Balancer
To access Amazon RDS across VPCs or AWS accounts without VPC peering or Transit Gateway, you can use AWS PrivateLink with Network Load Balancer.
- AWS PrivateLink provides private connectivity between VPCs, AWS services, and on-premises networks without exposing traffic to the public internet.
- Network Load Balancer (NLB) directs database traffic to Amazon RDS.
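If the NLB target group registers the database by IP address (an assumption about your setup), you need the current private IP behind the RDS endpoint's DNS name. That IP can change, for example after a failover. The following sketch resolves a hostname to its IPv4 addresses with the Python standard library. The function name `resolve_ipv4` and the endpoint in the usage comment are hypothetical.

```python
import socket


def resolve_ipv4(hostname: str, port: int = 1433) -> list[str]:
    """Return the sorted, de-duplicated IPv4 addresses that hostname resolves to."""
    infos = socket.getaddrinfo(
        hostname, port, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})


# Hypothetical usage: run from inside the RDS VPC so the private address is returned.
# resolve_ipv4("mydb.example.us-east-1.rds.amazonaws.com")
```

Comparing this result against the registered NLB targets on a schedule is one way to notice when the target group has gone stale.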