Manage external locations
This article describes how to list, view, update, grant permissions on, enable file events for, and delete external locations.
Databricks recommends governing file access using volumes. See What are Unity Catalog volumes?.
Describe an external location
To see the properties of an external location, including permissions and workspace access, you can use Catalog Explorer or a SQL command.
- Catalog Explorer
- SQL
- In the sidebar, click Catalog.
- On the Quick access page, click the External data > button to go to the External Locations tab.
- Click the name of an external location to view its properties.
Run the following command in a notebook or the Databricks SQL editor. Replace `<location-name>` with the name of the external location.

```sql
DESCRIBE EXTERNAL LOCATION <location-name>;
```
Show grants on an external location
To show grants on an external location, use a command like the following. You can optionally filter the results to show only the grants for the specified principal.
```sql
SHOW GRANTS [<principal>] ON EXTERNAL LOCATION <location-name>;
```
Replace the placeholder values:
- `<location-name>`: The name of the external location that authorizes reading from and writing to the GCS bucket in your cloud tenant.
- `<principal>`: The email address of an account-level user or the name of an account-level group.

If a group or user name contains a space or the `@` symbol, surround it with backticks, not apostrophes.
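The backtick rule above can be sketched as a small helper for composing these statements. The helper and the location name below are hypothetical examples, not part of any Databricks tooling:

```python
def quote_principal(principal: str) -> str:
    """Wrap a principal in backticks if it contains a space or '@'.

    Hypothetical helper for building Databricks SQL statements; any
    backtick inside the name is doubled, per SQL identifier quoting.
    """
    if " " in principal or "@" in principal:
        return "`" + principal.replace("`", "``") + "`"
    return principal

# Example: build a SHOW GRANTS statement filtered to one principal.
stmt = f"SHOW GRANTS {quote_principal('user@example.com')} ON EXTERNAL LOCATION my_location;"
```

A plain group name such as `admins` passes through unquoted, while `finance team` or an email address is backtick-quoted.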
Grant permissions on an external location
This section describes how to grant and revoke permissions on an external location using Catalog Explorer and SQL commands in a notebook or SQL query. For information about using the Databricks CLI or Terraform instead, see the Databricks Terraform documentation and What is the Databricks CLI?.
You can grant the following permissions on an external location:
- `CREATE EXTERNAL TABLE`
- `CREATE EXTERNAL VOLUME`
- `CREATE MANAGED STORAGE`
Permissions required: The `CREATE EXTERNAL LOCATION` privilege on both the metastore and the storage credential referenced in the external location, or the `MANAGE` privilege on the external location. Metastore admins have `CREATE EXTERNAL LOCATION` on the metastore by default.
To grant permission to use an external location:
- Catalog Explorer
- SQL
- In the sidebar, click Catalog.
- On the Quick access page, click the External data > button to go to the External Locations tab.
- Click the name of an external location to open its properties.
- Click Permissions.
- To grant permission to users or groups, select each identity, then click Grant.
- To revoke permissions from users or groups, select each identity, then click Revoke.
Run the following SQL command in a notebook or SQL query editor. This example grants the ability to create an external table that references the external location:
```sql
GRANT CREATE EXTERNAL TABLE ON EXTERNAL LOCATION <location-name> TO <principal>;
```
Replace the placeholder values:
- `<location-name>`: The name of the external location that authorizes reading from and writing to the GCS bucket in your cloud tenant.
- `<principal>`: The email address of an account-level user or the name of an account-level group.

If a group or user name contains a space or the `@` symbol, surround it with backticks, not apostrophes. For example, `finance team`.
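When granting the same privilege to several principals, one statement per principal is needed. As a sketch (the helper and names are hypothetical, not Databricks tooling), applying the backtick rule above while generating the statements:

```python
def grant_create_external_table(location: str, principals: list[str]) -> list[str]:
    """Build one GRANT statement per principal (hypothetical helper)."""
    def quote(p: str) -> str:
        # Backtick-quote principals containing a space or '@', per the note above.
        return "`" + p + "`" if (" " in p or "@" in p) else p
    return [
        f"GRANT CREATE EXTERNAL TABLE ON EXTERNAL LOCATION {location} TO {quote(p)};"
        for p in principals
    ]

stmts = grant_create_external_table("my_location", ["finance team", "etl_service"])
```

Each resulting statement can then be run in a notebook or the SQL editor.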
Change the owner of an external location
An external location's creator is its initial owner. To change the owner to a different account-level user or group, run the following command in a notebook or the Databricks SQL editor or use Catalog Explorer.
Permissions required: External location owner or a user with the `MANAGE` privilege.
Replace the placeholder values:
- `<location-name>`: The name of the external location.
- `<principal>`: The email address of an account-level user or the name of an account-level group.

```sql
ALTER EXTERNAL LOCATION <location-name> OWNER TO <principal>;
```
Mark an external location as read-only
If you want users to have read-only access to an external location, you can use Catalog Explorer to mark the external location as read-only.
Making external locations read-only:
- Prevents users from writing to files in those external locations, regardless of any write permissions granted by the service account that underlies the storage credential, and regardless of the Unity Catalog permissions granted on that external location.
- Prevents users from creating managed tables or volumes in those external locations.
- Enables the system to validate the external location properly at creation time.
You can mark external locations as read-only when you create them.
You can also use Catalog Explorer to change read-only status after creating an external location:
- In the sidebar, click Catalog.
- On the Quick access page, click the External data > button to go to the External Locations tab.
- Select the external location, click the kebab menu next to the Test connection button, and select Edit.
- On the edit dialog, click Advanced Options and select the Limit to read-only use option.
- Click Update.
(Recommended) Enable file events for an external location
This feature is in Public Preview.
If you want to ingest change notifications that are pushed by the cloud provider, enabling managed file events for the external location has the following advantages:
- Makes it easier to set up file notifications for Auto Loader. Specifically, it enables incremental file discovery with notification-like performance in Auto Loader by setting `cloudFiles.useManagedFileEvents` to `true`. See What is Auto Loader file notification mode?.
- Improves the efficiency and capacity of file arrival triggers for jobs. See Trigger jobs when new files arrive.
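On the Auto Loader side, the option named above is passed like any other `cloudFiles` option. A minimal sketch, in which the source format and bucket path are assumed examples:

```python
# Auto Loader (cloudFiles) options; cloudFiles.useManagedFileEvents is the
# option described above. The format and source path are example assumptions.
autoloader_options = {
    "cloudFiles.format": "json",
    "cloudFiles.useManagedFileEvents": "true",
}

# Typical use inside a Databricks notebook (requires a Spark session,
# shown here as commented illustration only):
# df = (spark.readStream.format("cloudFiles")
#       .options(**autoloader_options)
#       .load("gs://my-bucket/landing/"))
```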
Before you begin
If you want Databricks to configure Pub/Sub subscriptions for your GCS bucket on your behalf, your external location must reference a storage credential that grants adequate permissions to do so. See the next step for instructions.
If you want to create your own Pub/Sub subscriptions, the identity represented by the storage credential must have the following permission on those Pub/Sub subscriptions:
`pubsub.subscriptions.consume`
Step 1: Confirm that Databricks has access to file events in GCS
Before you can enable file events for the external location securable object, you must ensure that your GCS account is configured to give Databricks access to the file events that it emits. If you want Databricks to configure file events in GCS for you, you must also ensure that Databricks has the proper access.
Assigning this access is an optional step when you configure a service account that can access GCS buckets using Databricks and Unity Catalog. To verify that Databricks can configure and subscribe to your bucket's event notifications:
1. Get the service account email address:
   - In the sidebar, click Catalog.
   - On the Quick access page, click the External data > button to go to the External Locations tab.
   - Select the external location.
   - On the Overview tab, click the Credential name.
   - On the storage credential Overview tab, copy the service account Email address. You will use this in the next step.
2. Log into your GCP account.
3. Go to IAM & Admin > Roles.
4. Find the service account whose email address you copied in step 1.
5. Confirm that it has been assigned a custom role that includes the following permissions:
   - `pubsub.subscriptions.consume`
   - `pubsub.subscriptions.create`
   - `pubsub.subscriptions.delete`
   - `pubsub.subscriptions.get`
   - `pubsub.subscriptions.list`
   - `pubsub.subscriptions.update`
   - `pubsub.topics.attachSubscription`
   - `pubsub.topics.create`
   - `pubsub.topics.delete`
   - `pubsub.topics.get`
   - `pubsub.topics.list`
   - `pubsub.topics.update`
   - `storage.buckets.update`
6. If the service account does not have a role with these permissions, create a custom role that does, and assign it to the service account. For instructions, see (Recommended) Configure permissions for file events.
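The confirmation step above can be sketched offline: compare the permissions attached to your custom role against the required list (the names are taken verbatim from the list above; fetching the role definition, e.g. via `gcloud iam roles describe`, is out of scope here):

```python
# Permissions Databricks needs to configure and subscribe to bucket
# event notifications, as listed above.
REQUIRED_PERMISSIONS = {
    "pubsub.subscriptions.consume", "pubsub.subscriptions.create",
    "pubsub.subscriptions.delete", "pubsub.subscriptions.get",
    "pubsub.subscriptions.list", "pubsub.subscriptions.update",
    "pubsub.topics.attachSubscription", "pubsub.topics.create",
    "pubsub.topics.delete", "pubsub.topics.get",
    "pubsub.topics.list", "pubsub.topics.update",
    "storage.buckets.update",
}

def missing_permissions(role_permissions: set[str]) -> set[str]:
    """Return the required file-event permissions absent from the role."""
    return REQUIRED_PERMISSIONS - role_permissions

# Example: a role that has everything except the storage permission.
gaps = missing_permissions(REQUIRED_PERMISSIONS - {"storage.buckets.update"})
```

An empty result means the role covers everything required.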
Step 2: Enable file events for the external location using Catalog Explorer
To enable file events:
1. In the sidebar, click Catalog.
2. On the Quick access page, click the External data > button to go to the External Locations tab.
3. Select the external location.
4. Click the kebab menu next to the Test connection button, and select Edit.
5. On the edit dialog, click Advanced Options.
6. Select Enable file events.
7. Select the File event type:
   - Automatic: (Recommended) Select this if you want Databricks to set up subscriptions and events for you.
   - Provided: Select this if you have already configured a Google Cloud Pub/Sub subscription yourself.
8. If you selected the Provided file event type, enter the existing Pub/Sub Subscription name: `projects/<project-id>/subscriptions/<subscription-name>`.
9. Click Update.
10. Wait a few seconds and click Test connection on the main external location edit page to confirm that file events have been enabled successfully.
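If you use the Provided file event type, a quick sanity check of the subscription name format (`projects/<project-id>/subscriptions/<subscription-name>`) can be sketched as follows. The per-segment pattern is a loose assumption, not GCP's exact naming rules:

```python
import re

# Matches the fully qualified Pub/Sub subscription name format shown above.
# Segment contents are loosely matched; GCP enforces stricter rules.
_SUBSCRIPTION_RE = re.compile(r"^projects/[^/]+/subscriptions/[^/]+$")

def is_valid_subscription_name(name: str) -> bool:
    """Return True if name looks like projects/<project-id>/subscriptions/<name>."""
    return _SUBSCRIPTION_RE.fullmatch(name) is not None

ok = is_valid_subscription_name("projects/my-project/subscriptions/my-sub")
```

Entering only the short subscription name (without the `projects/...` prefix) would fail this check.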
File events limitations
Managed file events have the following limitations:
- Event throughput is limited to 2000 files ingested per second.
- With managed file events, you cannot tag cloud resources using the `resourceTags` option. Instead, you can tag resources using the cloud console after the Auto Loader service creates your queue and subscription resources.
- You can't set up file notifications for the Unity Catalog metastore root storage location. Specifically, you can't set up file notifications using file events on storage locations that have no external location object defined in Unity Catalog. Older implementations of Unity Catalog include a metastore root storage location that is not associated with an external location. In newer implementations, specifically those on workspaces that were enabled for Unity Catalog automatically, the metastore-level storage does have an external location defined for it, and managed file events are supported on that storage location. For more information, see Automatic enablement of Unity Catalog.
Modify an external location
An external location's owner or a user with the `MANAGE` privilege can rename the external location, change its URI, and change its storage credential.
To rename an external location, do the following:
Run the following command in a notebook or the Databricks SQL editor. Replace the placeholder values:
- `<location-name>`: The name of the location.
- `<new-location-name>`: A new name for the location.

```sql
ALTER EXTERNAL LOCATION <location-name> RENAME TO <new-location-name>;
```
To change the URI that an external location points to in your cloud tenant, do the following:
Run the following command in a notebook or the Databricks SQL editor. Replace the placeholder values:
- `<location-name>`: The name of the external location.
- `<url>`: The new storage URL the location should authorize access to in your cloud tenant.

```sql
ALTER EXTERNAL LOCATION <location-name> SET URL '<url>' [FORCE];
```

The `FORCE` option changes the URL even if external tables depend on the external location.
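When scripting this change, the optional FORCE clause can be toggled when composing the statement. A minimal sketch, in which the helper, location name, and bucket path are hypothetical examples:

```python
def alter_set_url(location: str, url: str, force: bool = False) -> str:
    """Build the ALTER ... SET URL statement (hypothetical helper).

    Pass force=True to change the URL even if external tables depend
    on the external location.
    """
    force_clause = " FORCE" if force else ""
    return f"ALTER EXTERNAL LOCATION {location} SET URL '{url}'{force_clause};"

stmt = alter_set_url("my_location", "gs://my-bucket/new-path", force=True)
```

Without `force=True`, the statement is emitted without the FORCE clause and fails if dependent external tables exist.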
To change the storage credential that an external location uses, do the following:
Run the following command in a notebook or the Databricks SQL editor. Replace the placeholder values:
- `<location-name>`: The name of the external location.
- `<credential-name>`: The name of the storage credential that grants access to the location's URL in your cloud tenant.

```sql
ALTER EXTERNAL LOCATION <location-name> SET STORAGE CREDENTIAL <credential-name>;
```
Delete an external location
To delete (drop) an external location, you must be its owner or have the `MANAGE` privilege on the external location. To delete an external location, do the following:
Run the following command in a notebook or the Databricks SQL editor. Items in brackets are optional. Replace `<location-name>` with the name of the external location.

```sql
DROP EXTERNAL LOCATION [IF EXISTS] <location-name>;
```