Manage external locations
This page describes how to list, view, update, grant permissions on, enable file events for, and delete external locations.
Databricks recommends governing file access using volumes. See What are Unity Catalog volumes?.
Describe an external location
To see the properties of an external location, including permissions and workspace access, you can use Catalog Explorer or a SQL command.
- Catalog Explorer
- SQL
- In the sidebar, click Catalog.
- On the Quick access page, click the External data > button to go to the External Locations tab.
- Click the name of an external location to view its properties.
Run the following command in a notebook or the Databricks SQL editor. Replace <location-name> with the name of the external location.
DESCRIBE EXTERNAL LOCATION <location-name>;
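For example, with a hypothetical external location name:

-- Hypothetical location name; returns the location's properties
DESCRIBE EXTERNAL LOCATION my_external_location;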
Show grants on an external location
To view the grants on an external location, you can use Catalog Explorer or a SQL command.
- Catalog Explorer
- SQL
- In the sidebar, click Catalog.
- On the Quick access page, click the External data > button to go to the External Locations tab.
- Click the name of an external location.
- Click Permissions.
To show grants on an external location, use a command like the following. You can optionally filter the results to show only the grants for the specified principal.
SHOW GRANTS [<principal>] ON EXTERNAL LOCATION <location-name>;
Replace the placeholder values:
- <location-name>: The name of the external location that authorizes reading from and writing to the bucket in your cloud tenant.
- <principal>: The email address of an account-level user or the name of an account-level group. If a group or user name contains a space or @ symbol, use backticks (`) around it, not apostrophes.
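For example, to show only the grants held by a single principal, using a hypothetical location name and an email address that needs backticks because it contains an @ symbol:

-- Hypothetical principal and location names
SHOW GRANTS `alice@example.com` ON EXTERNAL LOCATION my_external_location;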
Grant permissions on an external location
This section describes how to grant and revoke permissions on an external location using Catalog Explorer and SQL commands in a notebook or SQL query. For information about using the Databricks CLI or Terraform instead, see the Databricks Terraform documentation and What is the Databricks CLI?.
Permissions required: The CREATE EXTERNAL LOCATION privilege on both the metastore and the storage credential referenced in the external location, or the MANAGE privilege on the external location. Metastore admins have CREATE EXTERNAL LOCATION on the metastore by default.
To grant permission to use an external location:
- Catalog Explorer
- SQL
- In the sidebar, click Catalog.
- On the Quick access page, click the External data > button to go to the External Locations tab.
- Click the name of an external location to open its properties.
- Click Permissions.
- To grant permission to users or groups, select each identity, then click Grant.
- To revoke permissions from users or groups, select each identity, then click Revoke.
Run the following SQL command in a notebook or SQL query editor. This example grants the ability to create an external table that references the external location:
GRANT CREATE EXTERNAL TABLE ON EXTERNAL LOCATION <location-name> TO <principal>;
Replace the placeholder values:
- <location-name>: The name of the external location that authorizes reading from and writing to the bucket in your cloud tenant.
- <principal>: The email address of an account-level user or the name of an account-level group.
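To revoke a privilege in SQL, use REVOKE in place of GRANT. For example, the following sketch, using hypothetical principal and location names, grants READ FILES and then revokes it:

-- Hypothetical principal and location names
GRANT READ FILES ON EXTERNAL LOCATION my_external_location TO `alice@example.com`;
REVOKE READ FILES ON EXTERNAL LOCATION my_external_location FROM `alice@example.com`;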
Assign an external location to specific workspaces
By default, an external location is accessible from all of the workspaces in the metastore. This means that if a user has been granted a privilege (such as READ FILES) on that external location, they can exercise that privilege from any workspace attached to the metastore. If you use workspaces to isolate user data access, you might want to allow access to an external location only from specific workspaces. This feature is known as workspace binding or external location isolation.
Typical use cases for binding an external location to specific workspaces include:
- Ensuring that data engineers who have the CREATE EXTERNAL TABLE privilege on an external location that contains production data can create external tables on that location only in a production workspace.
- Ensuring that data engineers who have the READ FILES privilege on an external location that contains sensitive data can only use specific workspaces to access that data.
For more information about how to restrict other types of data access by workspace, see Limit catalog access to specific workspaces.
Workspace bindings are checked at the point when privileges against the external location are exercised. For example, if a user creates an external table by issuing the statement CREATE TABLE myCat.mySch.myTable LOCATION 's3://bucket/path/to/table' from the myWorkspace workspace, the following workspace binding checks are performed in addition to regular user privilege checks:
- Is the external location covering 's3://bucket/path/to/table' bound to myWorkspace?
- Is the catalog myCat bound to myWorkspace with access level Read & Write?
If the external location is subsequently unbound from myWorkspace, then the external table continues to function.
This feature also allows you to populate a catalog from a central workspace and make it available to other workspaces using catalog bindings, without also having to make the external location available in those other workspaces.
Bind an external location to one or more workspaces
To assign an external location to specific workspaces, you can use Catalog Explorer or the Databricks CLI.
Permissions required: Metastore admin, external location owner, or MANAGE on the external location.
Metastore admins can see all external locations in a metastore using Catalog Explorer, and external location owners can see all external locations that they own in a metastore, regardless of whether the external location is assigned to the current workspace. External locations that are not assigned to the workspace appear grayed out.
- Catalog Explorer
- CLI
- Log in to a workspace that is linked to the metastore.
- In the sidebar, click Catalog.
- On the Quick access page, click the External data > button to go to the External Locations tab.
- Select the external location and go to the Workspaces tab.
- On the Workspaces tab, clear the All workspaces have access checkbox. If your external location is already bound to one or more workspaces, this checkbox is already cleared.
- Click Assign to workspaces and enter or find the workspaces you want to assign.
To revoke access, go to the Workspaces tab, select the workspace, and click Revoke. To allow access from all workspaces, select the All workspaces have access checkbox.
There are two Databricks CLI command groups and two steps required to assign an external location to a workspace.
In the following examples, replace <profile-name> with the name of your Databricks authentication configuration profile. It should include the value of a personal access token, in addition to the workspace instance name and workspace ID of the workspace where you generated the personal access token. See Personal access token authentication (deprecated).
- Use the external-locations command group's update command to set the external location's isolation mode to ISOLATED:

databricks external-locations update <my-location> \
  --isolation-mode ISOLATED \
  --profile <profile-name>

The default isolation mode is OPEN, which allows access from all workspaces attached to the metastore.

- Use the workspace-bindings command group's update-bindings command to assign the workspaces to the external location:

databricks workspace-bindings update-bindings external-location <my-location> \
  --json '{
    "add": [{"workspace_id": <workspace-id>}...],
    "remove": [{"workspace_id": <workspace-id>}...]
  }' --profile <profile-name>

Use the "add" and "remove" properties to add or remove workspace bindings.

Note: Read-only binding (BINDING_TYPE_READ_ONLY) is not available for external locations, so there is no reason to set binding_type for the external location binding.
To list all workspace assignments for an external location, use the workspace-bindings command group's get-bindings command:
databricks workspace-bindings get-bindings external-location <my-location> \
--profile <profile-name>
See also Workspace Bindings in the REST API reference.
Unbind an external location from a workspace
Instructions for revoking workspace access to an external location using Catalog Explorer or the workspace-bindings CLI command group are included in Bind an external location to one or more workspaces.
Change the owner of an external location
An external location's creator is its initial owner. To change the owner to a different account-level user or group, you can use Catalog Explorer or a SQL command.
Permissions required: External location owner or a user with the MANAGE privilege.
- Catalog Explorer
- SQL
- In the sidebar, click Catalog.
- On the Quick access page, click the External data > button to go to the External Locations tab.
- Click the name of an external location.
- Click the edit icon next to Owner.
- Type to search for a principal and select it.
- Click Save.
Run the following command in a notebook or the Databricks SQL editor. Replace the placeholder values:
- <location-name>: The name of the external location.
- <principal>: The email address of an account-level user or the name of an account-level group. If a group or user name contains a space or @ symbol, use backticks around it, not apostrophes.

ALTER EXTERNAL LOCATION <location-name> OWNER TO <principal>;
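For example, to transfer ownership to a hypothetical group whose name contains spaces and therefore needs backticks:

-- Hypothetical group name
ALTER EXTERNAL LOCATION my_external_location OWNER TO `data platform admins`;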
Mark an external location as read-only
If you want users to have read-only access to an external location, you can use Catalog Explorer to mark the external location as read-only.
Making external locations read-only:
- Prevents users from writing to files in those external locations, regardless of any write permissions granted by the IAM role that underlies the storage credential, and regardless of the Unity Catalog permissions granted on that external location.
- Prevents users from creating managed tables or volumes in those external locations.
- Blocks catalog creation using read-only external locations.
- Enables the system to validate the external location properly at creation time.
You can mark external locations as read-only when you create them.
You can also use Catalog Explorer to change read-only status after creating an external location:
- In the sidebar, click Catalog.
- On the Quick access page, click the External data > button to go to the External Locations tab.
- Select the external location, click the kebab menu next to the Test connection button, and select Edit.
- On the edit dialog, click Advanced Options and select the Limit to read-only use option.
- Click Update.
Configure an encryption algorithm on an external location (AWS S3 only)
AWS supports server-side encryption (SSE) with Amazon S3 managed keys (SSE-S3) or AWS KMS keys (SSE-KMS) for protecting data in S3. If your S3 bucket requires SSE encryption, you can configure an encryption algorithm in your external location to allow external tables and volumes in Unity Catalog to access data in your S3 bucket. SSE is not supported with external tables shared using Delta Sharing. For more information, see Configure encryption for S3 with KMS.
- In the sidebar, click Catalog.
- At the top of the Catalog pane, click the gear icon and select External Locations.
- Select the external location. The external location must use an IAM role for a storage credential.
- Click the kebab menu next to the Test connection button, and select Edit.
- On the edit dialog, click Advanced Options.
- Under Encryption Algorithm, select SSE-S3 or SSE-KMS, depending on your encryption key. For SSE-KMS, under Encryption KMS key arn, paste the ARN of the KMS key referenced by clients when accessing the S3 location.
- Click Update.
(Recommended) Enable file events for an external location
If you want to ingest change notifications that are pushed by the cloud provider, enabling file events for the external location has the following advantages:
- Makes it easier to set up file notifications for Auto Loader. Specifically, it enables incremental file discovery with notification-like performance in Auto Loader by setting cloudFiles.useManagedFileEvents to true. See Configure Auto Loader streams in file notification mode.
- Improves the efficiency and capacity of file arrival triggers for jobs. See Trigger jobs when new files arrive.
Before you begin
If you want Databricks to configure SQS queues on your behalf, your external location must reference a storage credential that grants adequate permissions to do so. See Step 1 for instructions.
If you want to create your own SQS queues, the identity represented by the storage credential must have the following permissions on those SQS queues:
- sqs:ReceiveMessage
- sqs:DeleteMessage
- sqs:PurgeQueue
Step 1: Confirm that Databricks has access to file events in S3
Before you can enable file events for the external location securable object, you must ensure that your AWS account is configured to give Databricks access to the file events that your S3 bucket emits. If you want Databricks to configure file events in S3 for you, you must also ensure that Databricks has the proper access.
Assigning this access is a recommended step when you configure storage credentials.
To verify that Databricks can configure and subscribe to your bucket's event notifications:
- Get the IAM role:
  - In the sidebar, click Catalog.
  - On the Quick access page, click the External data > button to go to the External Locations tab.
  - Select the external location.
  - On the Overview tab, click the Credential name.
  - On the storage credential Overview tab, copy the IAM role (ARN). You will use this in the next step.
- Log into your AWS account.
- Go to IAM and search for the role that you copied in step 1.
- Under Permissions policies, find the IAM policy or policies associated with the IAM role.
- Open the policy or policies and confirm that one of them includes the following properties.
This policy allows your Databricks account to update your bucket's event notification configuration, create an SNS topic, create an SQS queue, and subscribe the SQS queue to the SNS topic.
Replace <BUCKET> with the name of your S3 bucket.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "ManagedFileEventsSetupStatement",
      "Effect": "Allow",
      "Action": [
        "s3:GetBucketNotification",
        "s3:PutBucketNotification",
        "sns:ListSubscriptionsByTopic",
        "sns:GetTopicAttributes",
        "sns:SetTopicAttributes",
        "sns:CreateTopic",
        "sns:TagResource",
        "sns:Publish",
        "sns:Subscribe",
        "sqs:CreateQueue",
        "sqs:DeleteMessage",
        "sqs:ReceiveMessage",
        "sqs:SendMessage",
        "sqs:GetQueueUrl",
        "sqs:GetQueueAttributes",
        "sqs:SetQueueAttributes",
        "sqs:TagQueue",
        "sqs:ChangeMessageVisibility",
        "sqs:PurgeQueue"
      ],
      "Resource": ["arn:aws:s3:::<BUCKET>", "arn:aws:sqs:*:*:csms-*", "arn:aws:sns:*:*:csms-*"]
    },
    {
      "Sid": "ManagedFileEventsListStatement",
      "Effect": "Allow",
      "Action": ["sqs:ListQueues", "sqs:ListQueueTags", "sns:ListTopics"],
      "Resource": ["arn:aws:sqs:*:*:csms-*", "arn:aws:sns:*:*:csms-*"]
    },
    {
      "Sid": "ManagedFileEventsTeardownStatement",
      "Effect": "Allow",
      "Action": ["sns:Unsubscribe", "sns:DeleteTopic", "sqs:DeleteQueue"],
      "Resource": ["arn:aws:sqs:*:*:csms-*", "arn:aws:sns:*:*:csms-*"]
    }
  ]
}

See also Create a storage credential that accesses an AWS S3 bucket.
- If this policy is missing, add it to the IAM role.
Step 2: Enable file events for the external location using Catalog Explorer
To enable file events:
- In the sidebar, click Catalog.
- On the Quick access page, click the External data > button to go to the External Locations tab.
- Select the external location.
- Click the kebab menu next to the Test connection button, and select Edit.
- On the edit dialog, click Advanced Options.
- Select Enable file events.
- Select the File event type:
  - Automatic: (Recommended) Select this if you want Databricks to set up subscriptions and events for you.
  - Provided: Select this if you have already configured an SQS queue yourself.
- If you selected the Provided file event type, enter the Queue URL of the existing SQS queue: https://sqs.<region>.amazonaws.com/<account-ID>/<queue-name>.
- Click Update.
- Wait a few seconds, then click Test connection on the main external location edit page to confirm that file events were enabled successfully.
File events limitations
File events on external locations have the following limitations:
- Event throughput is limited to 2000 files ingested per second.
- You cannot tag cloud resources using the resourceTags option. Instead, tag resources using the cloud console after the Auto Loader service creates your queue and subscription resources.
- You can't set up file events on storage locations that have no external location object defined in Unity Catalog. In particular, you can't set up file events for the Unity Catalog metastore root storage location if no external location is defined for it. Some implementations of Unity Catalog include a metastore root storage location that is not associated with an external location. To determine whether this is the case for your metastore root storage location:
  - As an account admin, log in to the account console.
  - Click Catalog.
  - Click the metastore name.
  - Go to the Configuration tab.
  - If there is an S3 bucket path value, then your metastore root has no external location object defined for it. To update the metastore root storage to use an external location, click the Remove button. An external location will be created for you. For details, see Remove metastore-level storage.
Modify an external location
An external location's owner or a user with the MANAGE privilege can rename the external location, change the URL it points to, and change the storage credential it uses.
To rename an external location, do the following:
Run the following command in a notebook or the Databricks SQL editor. Replace the placeholder values:
- <location-name>: The name of the location.
- <new-location-name>: A new name for the location.
ALTER EXTERNAL LOCATION <location-name> RENAME TO <new-location-name>;
To change the URL that an external location points to in your cloud tenant, do the following:
Run the following command in a notebook or the Databricks SQL editor. Replace the placeholder values:
- <location-name>: The name of the external location.
- <url>: The new storage URL the location should authorize access to in your cloud tenant.
ALTER EXTERNAL LOCATION <location-name> SET URL '<url>' [FORCE];
The FORCE option changes the URL even if external tables depend upon the external location.
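For example, the following sketch, using a hypothetical location name and bucket path, repoints the location; FORCE applies the change even though external tables may reference the location:

-- Hypothetical location name and bucket path; FORCE overrides the dependency check
ALTER EXTERNAL LOCATION my_external_location SET URL 's3://my-new-bucket/path' FORCE;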
To change the storage credential that an external location uses, do the following:
Run the following command in a notebook or the Databricks SQL editor. Replace the placeholder values:
- <location-name>: The name of the external location.
- <credential-name>: The name of the storage credential that grants access to the location's URL in your cloud tenant.
ALTER EXTERNAL LOCATION <location-name> SET STORAGE CREDENTIAL <credential-name>;
Delete an external location
To delete (drop) an external location, you must be its owner or have the MANAGE privilege on the external location. To delete an external location, do the following:
Run the following command in a notebook or the Databricks SQL editor. Items in brackets are optional. Replace <location-name> with the name of the external location.
DROP EXTERNAL LOCATION [IF EXISTS] <location-name>;
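For example, with a hypothetical location name, the optional IF EXISTS clause suppresses the error if the location doesn't exist:

-- Hypothetical location name; IF EXISTS makes the drop a no-op if the location is absent
DROP EXTERNAL LOCATION IF EXISTS my_external_location;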