Create a storage credential for connecting to Cloudflare R2

Preview

This feature is in Public Preview.

This article describes how to create a storage credential in Unity Catalog to connect to Cloudflare R2. Cloudflare R2 object storage incurs no egress fees. Replicating or migrating the data that you share to R2 lets you share that data across clouds and regions without incurring egress fees.

Note

Unity Catalog supports two cloud storage options for Databricks on AWS: AWS S3 buckets and Cloudflare R2 buckets. Cloudflare R2 is intended primarily for Delta Sharing use cases in which you want to avoid cloud provider data egress fees. S3 is appropriate for most other use cases. See Monitor and manage Delta Sharing egress costs (for providers) and Create a storage credential for connecting to AWS S3.

To use an R2 bucket as a storage location for data that is managed by Unity Catalog, you must create a storage credential that authorizes access to the R2 bucket and create an external location that references the storage credential and the bucket path:

  • Storage credentials encapsulate a long-term cloud credential that provides access to cloud storage.

  • External locations contain a reference to a storage credential and a cloud storage path.

This article focuses on creating a storage credential.

For more information, see Connect to cloud object storage using Unity Catalog.

Requirements

  • Databricks workspace enabled for Unity Catalog.

  • Databricks Runtime 14.3 or above, or SQL warehouse 2024.15 or above.

    If you encounter the error message No FileSystem for scheme "r2", your compute is probably on an unsupported version.

  • Cloudflare account. See https://dash.cloudflare.com/sign-up.

  • Cloudflare R2 Admin role. See the Cloudflare roles documentation.

  • CREATE STORAGE CREDENTIAL privilege on the Unity Catalog metastore attached to the workspace. Account admins and metastore admins have this privilege by default.
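
The runtime requirement above can be checked before you attempt to read from R2. The following is a minimal sketch, not an official utility: the helper names are hypothetical, and it assumes the runtime version is available as a string that starts with major.minor (for example 14.3.x-scala2.12). On Databricks compute this string is typically exposed through the DATABRICKS_RUNTIME_VERSION environment variable, but treat that as an assumption to verify in your environment.

```python
from __future__ import annotations

import re

# Minimum Databricks Runtime version stated in the requirements above.
MIN_RUNTIME = (14, 3)


def parse_runtime_version(version: str) -> tuple[int, int]:
    """Extract (major, minor) from a string such as '14.3.x-scala2.12'."""
    match = re.match(r"(\d+)\.(\d+)", version.strip())
    if match is None:
        raise ValueError(f"Unrecognized runtime version: {version!r}")
    return int(match.group(1)), int(match.group(2))


def supports_r2(version: str) -> bool:
    """Return True if the runtime version meets the documented minimum."""
    return parse_runtime_version(version) >= MIN_RUNTIME


if __name__ == "__main__":
    import os

    # Assumption: this environment variable is set on Databricks compute.
    current = os.environ.get("DATABRICKS_RUNTIME_VERSION")
    if current is None:
        print("Runtime version not found; not running on Databricks compute?")
    elif supports_r2(current):
        print(f"Runtime {current} meets the minimum for R2.")
    else:
        print(f"Runtime {current} is below 14.3; expect the 'No FileSystem' error.")
```

This check covers only the Databricks Runtime requirement. SQL warehouse versions use a different numbering scheme and are not handled here.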

Configure an R2 bucket

  1. Create a Cloudflare R2 bucket.

    You can use the Cloudflare dashboard or the Cloudflare Wrangler tool.

    See the Cloudflare R2 “Get started” documentation or the Wrangler documentation.

  2. Create an R2 API token and apply it to the bucket.

    See the Cloudflare R2 API authentication documentation.

    Set the following token properties:

    • Permissions: Object Read & Write.

      This permission grants read and write access, which is required when you use R2 storage as a replication target, as described in Use Cloudflare R2 replicas or migrate storage to R2.

      If you want to enforce read-only access from Databricks to the R2 bucket, you can instead create a token that grants read access only. However, this may be unnecessary: if you mark the storage credential as read-only, any write access that the token grants is ignored.

    • (Optional) TTL: The length of time that you want to share the bucket data with the data recipients.

    • (Optional) Client IP Address Filtering: Select if you want to limit network access to specified recipient IP addresses. If this option is enabled, you must specify your recipients’ IP addresses and you must allowlist the Databricks control plane NAT IP address for the workspace region.

    See Outbound from Databricks control plane.

  3. Copy the R2 API token values:

    • Access Key ID

    • Secret Access Key

    Important

    Token values are shown only once.

  4. On the R2 homepage, go to Account details and copy the R2 account ID.
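
The account ID that you copy in the last step identifies your R2 endpoint. The sketch below shows how the values fit together, under two assumptions that you should verify against the Cloudflare and Databricks documentation: that the S3-compatible endpoint has the form https://<account-id>.r2.cloudflarestorage.com, and that Databricks addresses R2 paths as r2://<bucket>@<account-id>.r2.cloudflarestorage.com. The function names and the sample values are hypothetical.

```python
def r2_endpoint(account_id: str) -> str:
    """Build the S3-compatible endpoint URL for an R2 account (assumed format)."""
    if not account_id:
        raise ValueError("account_id must not be empty")
    return f"https://{account_id}.r2.cloudflarestorage.com"


def r2_bucket_uri(bucket: str, account_id: str, path: str = "") -> str:
    """Build an r2:// URI for a bucket and optional sub-path (assumed format)."""
    if not bucket or not account_id:
        raise ValueError("bucket and account_id must not be empty")
    uri = f"r2://{bucket}@{account_id}.r2.cloudflarestorage.com"
    # Normalize the path so the result never contains a doubled or trailing slash.
    path = path.strip("/")
    return f"{uri}/{path}" if path else uri


if __name__ == "__main__":
    # Placeholder values for illustration only.
    print(r2_endpoint("my-account-id"))
    print(r2_bucket_uri("my-bucket", "my-account-id", "shared/tables"))
```

The r2:// URI is the kind of path that you supply later when you create the external location. The storage credential itself needs only the account ID, the access key ID, and the secret access key.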

Create the storage credential

  1. In Databricks, log in to your workspace.

  2. Click Catalog.

  3. Click the +Add button and select Add a storage credential from the menu.

    This option does not appear if you don’t have the CREATE STORAGE CREDENTIAL privilege.

  4. Select a Credential Type of Cloudflare API Token.

  5. Enter a name for the credential and the following values that you copied when you configured the R2 bucket:

    • Account ID

    • Access key ID

    • Secret access key

  6. (Optional) If you want users to have read-only access to the external locations that use this storage credential, in Advanced options select Read only.

    Do not select this option if you want to use the storage credential to access R2 storage that you are using as a replication target, as described in Use Cloudflare R2 replicas or migrate storage to R2.

    For more information, see Mark a storage credential as read-only.

  7. Click Create.

  8. In the Storage credential created dialog, copy the External ID.

  9. (Optional) Bind the storage credential to specific workspaces.

    By default, a storage credential can be used by any privileged user on any workspace attached to the metastore. If you want to allow access only from specific workspaces, go to the Workspaces tab and assign workspaces. See (Optional) Assign a storage credential to specific workspaces.
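
If you prefer to script the steps above rather than use Catalog Explorer, the same values can be assembled into a request body for the Unity Catalog storage credentials REST API. The following is a hedged sketch only. It assumes that the API accepts a cloudflare_api_token object with account_id, access_key_id, and secret_access_key fields, plus a top-level read_only flag, and that the request is sent as a POST to /api/2.1/unity-catalog/storage-credentials. Confirm these names against the current API reference before use. The function name is hypothetical, and the code builds the payload without sending it.

```python
from __future__ import annotations

import json


def build_r2_credential_payload(
    name: str,
    account_id: str,
    access_key_id: str,
    secret_access_key: str,
    read_only: bool = False,
    comment: str | None = None,
) -> dict:
    """Assemble a request body for creating an R2 storage credential.

    Field names are assumptions based on the UI fields described above.
    """
    for label, value in [
        ("name", name),
        ("account_id", account_id),
        ("access_key_id", access_key_id),
        ("secret_access_key", secret_access_key),
    ]:
        if not value:
            raise ValueError(f"{label} must not be empty")

    payload = {
        "name": name,
        "cloudflare_api_token": {
            "account_id": account_id,
            "access_key_id": access_key_id,
            "secret_access_key": secret_access_key,
        },
        # Leave read_only False if R2 is a replication target.
        "read_only": read_only,
    }
    if comment is not None:
        payload["comment"] = comment
    return payload


if __name__ == "__main__":
    # Placeholder values; load real secrets from a secret store, not source code.
    body = build_r2_credential_payload(
        name="r2_credential",
        account_id="my-account-id",
        access_key_id="my-access-key-id",
        secret_access_key="my-secret-access-key",
    )
    # Avoid printing the secret; show only the top-level keys.
    print(json.dumps(sorted(body.keys())))
```

Because the token values are shown only once in the Cloudflare dashboard, store them in a secret manager and pass them to a helper like this at run time instead of hard-coding them.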

Next step: create the external location

See Create an external location to connect cloud storage to Databricks.