Create an external location to connect cloud storage to Databricks
This article describes how to configure an external location in Unity Catalog to connect cloud storage to Databricks.
External locations associate Unity Catalog storage credentials with cloud object storage containers. External locations are used to define managed storage locations for catalogs and schemas, and to define locations for external tables and external volumes.
You can create an external location that references storage in an AWS S3 or Cloudflare R2 bucket.
You can create an external location using Catalog Explorer, the Databricks CLI, SQL commands in a notebook or Databricks SQL query, or Terraform.
For more information about the uses of external locations and the relationship between storage credentials and external locations, see Manage access to cloud storage using Unity Catalog.
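If you already have a storage credential, the Databricks CLI offers the quickest path. The following is a minimal sketch using the external-locations command group's create command, which takes the location name, URL, and credential name; the names my-location and my-credential and the bucket path are hypothetical:
# Create an external location from an existing storage credential (hypothetical names).
databricks external-locations create my-location s3://mybucket/path my-credential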
Before you begin
Prerequisites:
You must create the AWS S3 or Cloudflare R2 bucket that you want to use as an external location before you create the external location object in Databricks.
The AWS CloudFormation template supports only S3 buckets.
The name of an S3 bucket that you want users to read from and write to cannot use dot notation (for example, incorrect.bucket.name.notation). For more bucket naming guidance, see the AWS bucket naming rules.
Avoid using a path in S3 that is already defined as an external location in another Unity Catalog metastore. You can safely read data in a single external S3 location from more than one metastore, but concurrent writes to the same S3 location from multiple metastores can lead to consistency issues.
If you don’t use the AWS CloudFormation template to create the external location, you must first create a storage credential in Databricks that gives access to the cloud storage location path. See Create a storage credential for connecting to AWS S3 and Create a storage credential for connecting to Cloudflare R2.
If you use the AWS CloudFormation flow, that storage credential is created for you.
Permissions requirements:
You must have the CREATE EXTERNAL LOCATION privilege on both the metastore and the storage credential referenced in the external location. Metastore admins have CREATE EXTERNAL LOCATION on the metastore by default.
If you are creating an external location for the DBFS root storage location, the system can create the storage credential for you, but you must be a workspace admin. For details, see Create an external location for data in DBFS root.
If you are using the AWS CloudFormation template, you must also have the CREATE STORAGE CREDENTIAL privilege on the metastore. Metastore admins have CREATE STORAGE CREDENTIAL on the metastore by default.
Create an external location for an S3 bucket using an AWS CloudFormation template
If you create an external location using the AWS CloudFormation template, Databricks configures the external location and creates a storage credential for you. You also have the option to create the external location manually, which requires that you first create an IAM role that gives access to the S3 bucket that is referenced by the external location and a storage credential that references that IAM role. If you want to create an external location from an existing DBFS mount point, DBFS root, or volume, the manual approach is required.
To learn about storage credentials, see Create a storage credential for connecting to AWS S3.
Note
You cannot create external locations for Cloudflare R2 buckets using the AWS CloudFormation template. Instead use the manual flow in Catalog Explorer or SQL statements in a Databricks notebook or SQL query editor.
Permissions and prerequisites: see Before you begin.
To create the external location:
Log in to a workspace that is attached to the metastore.
Click Catalog to open Catalog Explorer.
On the Quick access page, click the External data > button, go to the External Locations tab, and click Create location.
On the Create a new external location dialog, select AWS Quickstart (Recommended), then click Next.
The AWS Quickstart configures the external location and creates a storage credential for you. If you choose to use the Manual option, you must manually create an IAM role that gives access to the S3 bucket and create the storage credential in Databricks yourself.
On the Create external location with Quickstart dialog, enter the path to the S3 bucket in the Bucket Name field.
Click Generate new token to generate the personal access token that you will use to authenticate between Databricks and your AWS account.
Copy the token and click Launch in Quickstart.
In the AWS CloudFormation template that launches (labeled Quick create stack), paste the token into the Databricks Account Credentials field.
Accept the terms at the bottom of the page (I acknowledge that AWS CloudFormation might create IAM resources with custom names).
Click Create stack.
It may take a few minutes for the CloudFormation template to finish creating the external location object in Databricks.
Return to your Databricks workspace and click Catalog to open Catalog Explorer.
On the Quick access page, click the External data > button to go to the External Locations tab.
Confirm that a new external location has been created. Automatically generated external locations use the naming syntax db_s3_external_databricks-S3-ingest-<id>.
(Optional) Bind the external location to specific workspaces.
By default, any privileged user can use the external location on any workspace attached to the metastore. If you want to allow access only from specific workspaces, go to the Workspaces tab and assign workspaces. See (Optional) Assign an external location to specific workspaces.
Grant permission to use the external location.
For anyone to use the external location, you must grant permissions:
To use the external location to add a managed storage location to a metastore, catalog, or schema, grant the CREATE MANAGED LOCATION privilege.
To create external tables or volumes, grant CREATE EXTERNAL TABLE or CREATE EXTERNAL VOLUME.
To use Catalog Explorer to grant permissions:
Click the external location name to open the details pane.
On the Permissions tab, click Grant.
On the Grant on <external location> dialog, select users, groups, or service principals in the Principals field, and select the privilege you want to grant.
Click Grant.
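You can also grant these privileges using SQL in a notebook or the SQL query editor. A minimal sketch, assuming a hypothetical external location named my_external_location and a hypothetical group named data_engineers:
-- Allow the group to create external tables that use this location.
GRANT CREATE EXTERNAL TABLE ON EXTERNAL LOCATION `my_external_location` TO `data_engineers`;
-- Allow the group to use this location as managed storage for catalogs and schemas.
GRANT CREATE MANAGED LOCATION ON EXTERNAL LOCATION `my_external_location` TO `data_engineers`;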
Create an external location manually using Catalog Explorer
You can create an external location manually using Catalog Explorer.
Permissions and prerequisites: see Before you begin.
If you are creating an external location for an S3 bucket, Databricks recommends that you use the AWS CloudFormation template rather than the procedure described here. If you use the AWS CloudFormation template, you do not need to create a storage credential. It is created for you.
If you want to create an external location from an existing DBFS mount point, DBFS root, or volume, the manual approach is required.
To create the external location:
Log in to a workspace that is attached to the metastore.
In the sidebar, click Catalog.
On the Quick access page, click the External data > button, go to the External Locations tab, and click Create location.
On the Create a new external location dialog, click Manual, then Next.
To learn about the AWS Quickstart option, see Create an external location for an S3 bucket using an AWS CloudFormation template.
In the Create a new external location manually dialog, enter an External location name.
Under URL, enter or select the path to the external location. You have three options:
To copy the container path from an existing DBFS mount point, click Copy from DBFS.
To copy the subpath to the DBFS root storage location, click Copy from DBFS and select Copy from DBFS root. If you are a workspace admin, the system also creates the storage credential for you.
If you aren’t copying from an existing mount point or DBFS root, use the URL field to enter the S3 or R2 bucket path that you want to use as the external location.
For example, s3://mybucket/<path> or r2://mybucket@my-account-id.r2.cloudflarestorage.com/<path>.
Select the storage credential that grants access to the external location.
Note
If your external location is for the DBFS root and you are a workspace admin, the system creates the storage credential for you, and you do not need to select one.
If you don’t have a storage credential, you can create one:
In the Storage credential drop-down list, select + Create new storage credential.
In the Credential type drop-down list, select the type of credential you want to use in the storage credential object: AWS IAM role or Cloudflare API token.
For IAM roles, provide the IAM role ARN that gives access to the storage location. For Cloudflare API tokens, enter the Cloudflare account ID, access key ID, and secret access key.
For more information, see Create a storage credential for connecting to AWS S3 or Create a storage credential for connecting to Cloudflare R2.
(Optional) If you want users to have read-only access to the external location, click Advanced Options and select Read only. For more information, see Mark an external location as read-only.
(Optional) If the external location is intended for legacy workload migration, click Advanced options and enable Fallback mode.
(Optional) If the S3 bucket requires SSE encryption, you can configure an encryption algorithm to allow external tables and volumes in Unity Catalog to access data in your S3 bucket.
For instructions, see Configure an encryption algorithm on an external location.
Click Create.
(Optional) Bind the external location to specific workspaces.
By default, any privileged user can use the external location on any workspace attached to the metastore. If you want to allow access only from specific workspaces, go to the Workspaces tab and assign workspaces. See (Optional) Assign an external location to specific workspaces.
Go to the Permissions tab to grant permission to use the external location.
For anyone to use the external location, you must grant permissions:
To use the external location to add a managed storage location to a metastore, catalog, or schema, grant the CREATE MANAGED LOCATION privilege.
To create external tables or volumes, grant CREATE EXTERNAL TABLE or CREATE EXTERNAL VOLUME.
Click Grant.
On the Grant on <external location> dialog, select users, groups, or service principals in the Principals field, and select the privilege you want to grant.
Click Grant.
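After these privileges are granted, users can reference paths under the external location when they create external tables and volumes. A brief sketch with hypothetical catalog, schema, and bucket names:
-- Create an external table whose data lives under the external location.
CREATE TABLE main.default.raw_events (id INT, event_time TIMESTAMP)
LOCATION 's3://mybucket/raw-events';
-- Create an external volume for non-tabular files under the same location.
CREATE EXTERNAL VOLUME main.default.landing_files
LOCATION 's3://mybucket/landing';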
Create an external location using SQL
To create an external location using SQL, run the following command in a notebook or the SQL query editor. Replace the placeholder values. For required permissions and prerequisites, see Before you begin.
<location-name>: A name for the external location. If <location-name> includes special characters, such as hyphens (-), it must be surrounded by backticks (` `). See Names.
<bucket-path>: The path in your cloud tenant that this external location grants access to. For example, s3://mybucket or r2://mybucket@my-account-id.r2.cloudflarestorage.com.
<storage-credential-name>: The name of the storage credential that authorizes reading from and writing to the bucket. If the storage credential name includes special characters, such as hyphens (-), it must be surrounded by backticks (` `).
CREATE EXTERNAL LOCATION [IF NOT EXISTS] `<location-name>`
URL '<bucket-path>'
WITH ([STORAGE] CREDENTIAL `<storage-credential-name>`)
[COMMENT '<comment-string>'];
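For example, the following creates and then inspects an external location, using a hypothetical bucket path and a hypothetical storage credential named my_credential:
-- Create the external location (hypothetical names and paths).
CREATE EXTERNAL LOCATION IF NOT EXISTS `sales-landing`
URL 's3://mybucket/sales-landing'
WITH (STORAGE CREDENTIAL `my_credential`)
COMMENT 'Landing zone for sales data';
-- Confirm the location exists and review its properties.
DESCRIBE EXTERNAL LOCATION `sales-landing`;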
If you want to limit external location access to specific workspaces in your account, also known as workspace binding or external location isolation, see (Optional) Assign an external location to specific workspaces.
(Optional) Assign an external location to specific workspaces
Preview
This feature is in Public Preview.
By default, an external location is accessible from all of the workspaces in the metastore. This means that if a user has been granted a privilege (such as READ FILES) on that external location, they can exercise that privilege from any workspace attached to the metastore. If you use workspaces to isolate user data access, you might want to allow access to an external location only from specific workspaces. This feature is known as workspace binding or external location isolation.
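For example, a user granted READ FILES on an external location can list the files under it from any attached workspace (hypothetical bucket path):
-- Requires READ FILES on the external location that covers this path.
LIST 's3://mybucket/path';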
Typical use cases for binding an external location to specific workspaces include:
Ensuring that data engineers who have the CREATE EXTERNAL TABLE privilege on an external location that contains production data can create external tables on that location only in a production workspace.
Ensuring that data engineers who have the READ FILES privilege on an external location that contains sensitive data can only use specific workspaces to access that data.
For more information about how to restrict other types of data access by workspace, see Limit catalog access to specific workspaces.
Important
Workspace bindings are referenced at the point when privileges against the external location are exercised. For example, if a user creates an external table by issuing the statement CREATE TABLE myCat.mySch.myTable LOCATION 's3://bucket/path/to/table' from the myWorkspace workspace, the following workspace binding checks are performed in addition to regular user privilege checks:
Is the external location covering 's3://bucket/path/to/table' bound to myWorkspace?
Is the catalog myCat bound to myWorkspace with access level Read & Write?
If the external location is subsequently unbound from myWorkspace, then the external table continues to function.
This feature also allows you to populate a catalog from a central workspace and make it available to other workspaces using catalog bindings, without also having to make the external location available in those other workspaces.
Bind an external location to one or more workspaces
To assign an external location to specific workspaces, you can use Catalog Explorer or the Databricks CLI.
Permissions required: Metastore admin, external location owner, or MANAGE on the external location.
Note
Metastore admins can see all external locations in a metastore using Catalog Explorer—and external location owners can see all external locations that they own in a metastore—regardless of whether the external location is assigned to the current workspace. External locations that are not assigned to the workspace appear grayed out.
Log in to a workspace that is linked to the metastore.
In the sidebar, click Catalog.
On the Quick access page, click the External data > button to go to the External Locations tab.
Select the external location and go to the Workspaces tab.
On the Workspaces tab, clear the All workspaces have access checkbox.
If your external location is already bound to one or more workspaces, this checkbox is already cleared.
Click Assign to workspaces and enter or find the workspaces you want to assign.
To revoke access, go to the Workspaces tab, select the workspace, and click Revoke. To allow access from all workspaces, select the All workspaces have access checkbox.
There are two Databricks CLI command groups and two steps required to assign an external location to a workspace.
In the following examples, replace <profile-name> with the name of your Databricks authentication configuration profile. It should include the value of a personal access token, in addition to the workspace instance name and workspace ID of the workspace where you generated the personal access token. See Databricks personal access token authentication.
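As a rough sketch, a configuration profile for personal access token authentication in the .databrickscfg file looks like the following; the profile name is hypothetical and the values are placeholders:
[my-profile]
host  = https://<workspace-instance-name>
token = <personal-access-token>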
Use the external-locations command group's update command to set the external location's isolation mode to ISOLATED:
databricks external-locations update <my-location> \
  --isolation-mode ISOLATED \
  --profile <profile-name>
The default isolation-mode is OPEN, which allows access from all workspaces attached to the metastore.
Use the workspace-bindings command group's update-bindings command to assign the workspaces to the external location:
databricks workspace-bindings update-bindings external-location <my-location> \
  --json '{
    "add": [{"workspace_id": <workspace-id>}...],
    "remove": [{"workspace_id": <workspace-id>}...]
  }' \
  --profile <profile-name>
Use the "add" and "remove" properties to add or remove workspace bindings.
Note
Read-only binding (BINDING_TYPE_READ_ONLY) is not available for external locations. Therefore there is no reason to set binding_type for the external locations binding.
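For example, to bind a hypothetical external location named my-location to a single workspace, passing a placeholder workspace ID:
databricks workspace-bindings update-bindings external-location my-location \
  --json '{"add": [{"workspace_id": <workspace-id>}]}' \
  --profile <profile-name>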
To list all workspace assignments for an external location, use the workspace-bindings command group's get-bindings command:
databricks workspace-bindings get-bindings external-location <my-location> \
--profile <profile-name>
See also Workspace Bindings in the REST API reference.
Unbind an external location from a workspace
Instructions for revoking workspace access to an external location using Catalog Explorer or the workspace-bindings CLI command group are included in Bind an external location to one or more workspaces.
Next steps
Grant other users permission to use external locations. See Manage external locations.
Define managed storage locations using external locations. See Specify a managed storage location in Unity Catalog.
Define external tables using external locations. See Work with external tables.
Define external volumes using external locations. See What are Unity Catalog volumes?.