
Share data behind a firewall with SecureConnect

Preview

This feature is in Public Preview.

This page describes how providers set up OpenSharing SecureConnect to share data from cloud storage that is behind a firewall or private endpoint, without needing to allowlist each recipient's network.

How SecureConnect works

Before enabling SecureConnect on a Databricks account, a provider performs a one-time configuration that allows Databricks recipients to access the provider's storage behind a firewall or private endpoint. Databricks then routes recipient requests through a managed proxy, so the provider does not need to update the storage firewall each time a recipient is added.

Recipients access shared data using their existing OpenSharing setup:

  • Databricks recipients on serverless compute access shares with no per-provider firewall changes.
  • Databricks recipients on classic compute and open recipients allowlist a single set of Databricks control plane IPs for the provider's region.

Without SecureConnect, a provider must add each recipient's network identifier to their storage firewall, coordinating with the recipient and a cloud platform administrator for every new recipient.

Requirements

Set up SecureConnect as a provider

Setting up SecureConnect involves configuring your storage firewall to allow access and enabling SecureConnect for your metastores and recipients.

Step 1: Configure your storage firewall

For the lowest networking costs, keep the region of your shared assets the same as your provider metastore region.

The following instructions assume that your shared assets and provider metastore are in the same region.

SecureConnect accesses your storage through the serverless data plane. Configure your S3 bucket policies to include the VPCE OrgPath. See S3 bucket access using VPCE OrgPath.

(Optional) Configure private connectivity with a network connectivity configuration (NCC)

If your shared storage is behind a private endpoint and is not reachable from the public network, an account admin must configure a network connectivity configuration (NCC) and attach it to the metastore that hosts your shared data. For more about NCCs, see What is a network connectivity configuration (NCC)?.

An NCC attached to a workspace can't be attached to a metastore. An NCC applied to a metastore for OpenSharing applies to all shares attached to the metastore.

warning

AWS PrivateLink to S3 is not compatible with FIPS endpoints, which Databricks uses by default in all US regions. If your provider metastore is in a US region, contact your Databricks account team.

For more information, see the AWS documentation.

Create an NCC and a private endpoint rule for your S3 bucket but do not attach the NCC to a workspace. See Configure private connectivity to AWS-managed resources for NCC and PrivateLink setup.

Attach the NCC to your OpenSharing metastore:

  1. As a Databricks account administrator, go to the account console.
  2. In the sidebar, click Catalog.
  3. Click the name of the OpenSharing metastore to open its details.
  4. Under OpenSharing Network connectivity configuration (NCC), click Edit.
  5. Search for and select the NCC you created for OpenSharing.
  6. Click Save.

important

If you are unable to attach an NCC to a metastore, contact your Databricks account team to enable private connectivity for OpenSharing SecureConnect using an NCC.

Step 2: Enable SecureConnect on a metastore

An account administrator or metastore administrator can configure the metastore so new recipients automatically use SecureConnect. By default, new and existing recipients are not enrolled in SecureConnect. You must configure existing recipients separately. See Step 3: Enable SecureConnect for individual recipients.

To configure SecureConnect on a metastore:

  1. Log in to the account console.
  2. In the sidebar, click Catalog.
  3. Click the name of a metastore to open its details.
  4. Toggle the SecureConnect setting:
    • On: New recipients in the metastore have SecureConnect enabled by default at creation time. Existing recipients are not affected.
    • Off (default): New recipients are not enrolled in SecureConnect. Enable SecureConnect for each recipient individually.

Step 3: Enable SecureConnect for individual recipients

Recipient owners and users with the USE_RECIPIENT privilege can toggle SecureConnect on or off for each recipient. SecureConnect is disabled on a recipient by default, unless the metastore was set to enable it for all new recipients when the recipient was created.
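The interaction between the metastore toggle and the per-recipient toggle can be modeled in a few lines. The following is a minimal sketch, not a Databricks API: `Metastore`, `Recipient`, and `create_recipient` are hypothetical names used only to illustrate that a recipient takes the metastore setting at creation time and is unaffected by later changes to it.

```python
from dataclasses import dataclass


@dataclass
class Metastore:
    # Hypothetical model of the metastore-level toggle from Step 2.
    secureconnect_default: bool = False  # Off is the documented default


@dataclass
class Recipient:
    name: str
    secureconnect_enabled: bool


def create_recipient(metastore: Metastore, name: str) -> Recipient:
    # A new recipient copies the metastore setting at creation time.
    return Recipient(name=name, secureconnect_enabled=metastore.secureconnect_default)


metastore = Metastore()
existing = create_recipient(metastore, "existing-recipient")

# Turning the metastore toggle on affects only recipients created afterwards.
metastore.secureconnect_default = True
new = create_recipient(metastore, "new-recipient")

# The existing recipient is still disabled until it is enabled individually.
existing_before = existing.secureconnect_enabled
existing.secureconnect_enabled = True
```

In this model, a recipient created before the toggle was turned on stays disabled until someone enables it directly, which is why existing recipients need the per-recipient procedure that follows.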

To configure SecureConnect on a recipient:

  1. In your Databricks workspace, click Catalog.

  2. At the top of the Catalog pane, click the gear icon and select OpenSharing.

    Alternatively, in the upper-right corner, click Share > OpenSharing.

  3. On the Shared by me tab, click the Recipients tab.

  4. Turn on SecureConnect for each desired recipient.

(Optional) Step 4: Restrict open recipient access with IP ACLs

For open recipients, you can restrict which client IP addresses are allowed to reach SecureConnect using IP access lists. IP ACLs apply only to open recipients.

With SecureConnect, IP ACLs apply to both OpenSharing endpoint access and storage access. Without SecureConnect, IP ACLs restrict only OpenSharing endpoint access; storage URLs remain reachable from any client IP.
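The difference in scope can be illustrated with a small model. This is a sketch under stated assumptions, not how Databricks implements IP access lists: `ip_allowed` and `access_decision` are hypothetical helpers, the addresses come from documentation-only ranges, and the model considers only what the IP ACL gates, not any separate storage firewall rules.

```python
import ipaddress


def ip_allowed(client_ip: str, allowed_cidrs: list[str]) -> bool:
    """Return True if client_ip falls inside any allowed CIDR range."""
    address = ipaddress.ip_address(client_ip)
    return any(address in ipaddress.ip_network(cidr) for cidr in allowed_cidrs)


def access_decision(client_ip: str, allowed_cidrs: list[str], secureconnect: bool) -> dict[str, bool]:
    """Model which access paths an IP ACL gates for an open recipient."""
    allowed = ip_allowed(client_ip, allowed_cidrs)
    return {
        # The OpenSharing endpoint is gated by the IP ACL in both cases.
        "endpoint": allowed,
        # Storage access is gated by the IP ACL only when SecureConnect is on.
        # Without it, the IP ACL does not restrict storage URLs at all.
        "storage": allowed if secureconnect else True,
    }


# Illustrative values only (RFC 5737 documentation ranges).
acl = ["203.0.113.0/24"]
outside = "198.51.100.7"
with_sc = access_decision(outside, acl, secureconnect=True)
without_sc = access_decision(outside, acl, secureconnect=False)
```

For a client outside the allowed range, the model blocks both paths when SecureConnect is on, and only the endpoint path when it is off.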

For setup instructions, see Restrict OpenSharing recipient access using IP access lists (Databricks-to-Open sharing).

note

IP ACL changes for SecureConnect-enabled open recipients can take up to 10 minutes to take effect.

Supported sharing scenarios

important

Any unsupported feature falls back to direct access from the recipient compute to the storage. The provider must manually grant access to recipient IPs in their storage firewall. See What is the OpenSharing Databricks-to-Databricks protocol? or What is the OpenSharing Databricks-to-Open sharing protocol?.

SecureConnect supports sharing to AWS, Azure, and GCP.

mTLS to SecureConnect is supported only for serverless recipient clusters.

Feature support

Feature

D2O (token)

D2O (OIDC)*

D2O (Iceberg)

D2D (serverless)

D2D (classic)

Tables with history and without partitions

✓ **

✓ **

Tables without history or with partitions

Views

Foreign tables

Materialized views

Streaming tables

Volumes

Notebooks

AI models

* OIDC sharing does not currently work when the recipient is also on Databricks.

** Cloud token optimization is not available for SecureConnect.

Limitations

  • Your assets can't be backed by Cloudflare R2 storage.
  • SecureConnect is not available on AWS GovCloud.
  • AWS PrivateLink to S3 is not compatible with FIPS endpoints, which Databricks uses by default in all US regions. If you use SecureConnect with PrivateLink in a US region, contact your Databricks account team.

For recipient-side limitations, such as mTLS support and Databricks-to-Open sharing restrictions, see Limitations.

Billing

Providers are billed for data transfer through SecureConnect. See Data transfer and connectivity pricing.

Per-recipient usage is attributed through the recipient_id field in the billing system table, so providers can break down billable SecureConnect usage by recipient. See Billable usage system table reference.

The following query returns the list cost of SecureConnect data egress for each recipient over the last 7 days:

SQL
SELECT
  usage_records.usage_metadata.recipient_id,
  SUM(usage_records.usage_quantity * list_prices.pricing.default) AS list_cost
FROM system.billing.usage usage_records
INNER JOIN system.billing.list_prices ON
  usage_records.cloud = list_prices.cloud AND
  usage_records.sku_name = list_prices.sku_name AND
  usage_records.usage_start_time >= list_prices.price_start_time AND
  (usage_records.usage_end_time <= list_prices.price_end_time OR list_prices.price_end_time IS NULL)
WHERE
  usage_records.billing_origin_product = 'NETWORKING'
  AND usage_records.usage_metadata.recipient_id IS NOT NULL
  AND usage_records.usage_date >= CURRENT_DATE() - INTERVAL 7 DAYS
GROUP BY
  usage_records.usage_metadata.recipient_id
ORDER BY
  list_cost DESC
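
To make the join logic easier to follow, the same calculation can be sketched in plain Python over in-memory rows. Everything here is illustrative: the rows, the `EXAMPLE_SKU` name, and the prices are made up, and the 7-day and `billing_origin_product` filters are omitted so the example can focus on how each usage record is matched to the list price whose validity window contains it.

```python
from collections import defaultdict
from datetime import datetime

# Made-up sample rows; real values come from system.billing.usage.
usage = [
    {"recipient_id": "r1", "cloud": "AWS", "sku_name": "EXAMPLE_SKU",
     "start": datetime(2025, 1, 2, 0), "end": datetime(2025, 1, 2, 1), "quantity": 10.0},
    {"recipient_id": "r1", "cloud": "AWS", "sku_name": "EXAMPLE_SKU",
     "start": datetime(2025, 1, 3, 0), "end": datetime(2025, 1, 3, 1), "quantity": 4.0},
    {"recipient_id": "r2", "cloud": "AWS", "sku_name": "EXAMPLE_SKU",
     "start": datetime(2024, 12, 30, 0), "end": datetime(2024, 12, 30, 1), "quantity": 8.0},
    # Rows without a recipient_id are excluded, as in the WHERE clause.
    {"recipient_id": None, "cloud": "AWS", "sku_name": "EXAMPLE_SKU",
     "start": datetime(2025, 1, 2, 0), "end": datetime(2025, 1, 2, 1), "quantity": 99.0},
]

# Made-up price history; a price_end of None marks the currently valid price.
prices = [
    {"cloud": "AWS", "sku_name": "EXAMPLE_SKU", "default": 0.25,
     "price_start": datetime(2024, 1, 1), "price_end": datetime(2025, 1, 1)},
    {"cloud": "AWS", "sku_name": "EXAMPLE_SKU", "default": 0.5,
     "price_start": datetime(2025, 1, 1), "price_end": None},
]


def list_cost_by_recipient(usage_rows, price_rows):
    """Sum quantity * list price per recipient, highest cost first."""
    totals = defaultdict(float)
    for u in usage_rows:
        if u["recipient_id"] is None:
            continue
        for p in price_rows:
            same_sku = u["cloud"] == p["cloud"] and u["sku_name"] == p["sku_name"]
            in_window = u["start"] >= p["price_start"] and (
                p["price_end"] is None or u["end"] <= p["price_end"]
            )
            if same_sku and in_window:
                totals[u["recipient_id"]] += u["quantity"] * p["default"]
    return dict(sorted(totals.items(), key=lambda item: item[1], reverse=True))


costs = list_cost_by_recipient(usage, prices)
```

With these sample rows, `r1` is priced at the current rate and `r2` at the earlier rate, because each usage record matches only the price row whose window contains it.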

Additional resources