Create an external location for data in DBFS root
This article shows how to configure an external location in Unity Catalog to govern access to your DBFS root storage location. Although Databricks recommends against storing data in the DBFS root, your workspace might do so because of legacy practices.
External locations are Unity Catalog securable objects that associate storage credentials with cloud object storage containers. External locations are used to define managed storage locations for managed tables and volumes, and to govern access to the storage locations that contain external tables and external volumes.
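As a sketch of the concept, an external location can also be defined in Databricks SQL. The object names and S3 path below are placeholders, not values from your workspace:

```sql
-- Hypothetical example: an external location associates a storage
-- credential with a cloud storage path under Unity Catalog governance.
CREATE EXTERNAL LOCATION IF NOT EXISTS sales_data_location
  URL 's3://example-bucket/sales'
  WITH (STORAGE CREDENTIAL example_credential)
  COMMENT 'Example only; names and path are placeholders';
```

The rest of this article uses Catalog Explorer, which walks you through the same underlying steps.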
You must create an external location if your workspace-local, legacy Databricks Hive metastore stores data in the DBFS root and you want to federate your legacy Hive metastore so that your team can work with your Hive metastore tables using Unity Catalog. See Hive metastore federation: enable Unity Catalog to govern tables registered in a Hive metastore and Enable Hive metastore federation for a legacy workspace Hive metastore.
Before you begin
To create an external location for the DBFS root, you must have access to a Unity Catalog storage credential that grants access to the S3 bucket that contains the DBFS root. If you do not, you can instead provide the ARN of an IAM role that grants access to that bucket when you create the external location. That process is described in the instructions that follow. For information about creating an IAM role that grants access to a cloud storage location from Databricks, see Step 1: Create an IAM role.
The S3 bucket that contains the DBFS root must have object ownership set to Bucket Owner Enforced. If object ownership is set to Object Writer, the Unity Catalog storage credential cannot read data in the bucket.
Note
If you use customer-managed keys to encrypt the S3 bucket that contains the DBFS root, you can use the same key to enable encryption for the new external location. See the instructions below and Configure encryption for S3 with KMS.
Permissions requirements:

You must have the CREATE STORAGE CREDENTIAL and CREATE EXTERNAL LOCATION privileges on the metastore. Metastore admins have these privileges by default.

Note

If a storage credential for the DBFS root's storage location already exists, then the user who creates the external location does not need CREATE STORAGE CREDENTIAL, but does need CREATE EXTERNAL LOCATION on both the storage credential and the metastore.

You must be a workspace admin to have the system create the storage credential for you during external location creation. You do not have to be a workspace admin if a storage credential that grants access to the DBFS root storage location already exists and you have CREATE EXTERNAL LOCATION on both the storage credential and the metastore.
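If an admin is delegating this task, the privileges above can be granted in SQL. This is a sketch; the group name and credential name are hypothetical placeholders:

```sql
-- Grant the metastore-level privileges described above.
-- Run as a metastore admin or another sufficiently privileged user.
GRANT CREATE STORAGE CREDENTIAL ON METASTORE TO `data-platform-admins`;
GRANT CREATE EXTERNAL LOCATION ON METASTORE TO `data-platform-admins`;

-- If a credential for the DBFS root already exists, the user also needs
-- CREATE EXTERNAL LOCATION on that credential:
GRANT CREATE EXTERNAL LOCATION ON STORAGE CREDENTIAL dbfs_root_credential
  TO `data-platform-admins`;
```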
Create the external location
You can use Catalog Explorer to create an external location for the DBFS root.
In the sidebar, click Catalog.
Click External data, then click Create external location.
Click Manual, then Next.
You cannot use the AWS Quickstart option to create an external location for DBFS root.
Enter an External location name.
Under URL, click Copy from DBFS mount and select Copy from DBFS root.
The URL and subpath fields are populated with the cloud storage path to the DBFS root.
Important
When you create an external location for the DBFS root, you must use the subpath to the DBFS root location, not the path to the entire bucket. The URL and subpath are pre-populated with user/hive/warehouse, which is the default storage location for Hive metastore tables. If you want more fine-grained access control to the data in the DBFS root, you can create separate external locations for subpaths within the DBFS root.

Select a storage credential that grants access to the DBFS root cloud storage location or, if none has been defined, click + Create new storage credential.
To create the storage credential, select a Credential Type of AWS IAM Role and enter the ARN of an IAM role that grants access to the workspace-level DBFS root prefix on the S3 bucket. For information about creating an IAM role that grants access to a cloud storage location from Databricks, see Step 1: Create an IAM role.
Warning
The AWS IAM role policy must restrict access specifically to the DBFS root subpath (for example, s3://<bucket>/<shard-name>/<workspace-id>/). If you don't restrict access to the DBFS root location using the subpath, you expose the workspace's internal storage location and might compromise workspace-level access controls, or possibly another workspace's data if the bucket is shared.

(Optional) Add a comment.
(Optional) Click Advanced options and enable Fallback mode.
Fallback mode is intended for legacy workload migration scenarios. See Enable fallback mode on external locations.
(Optional) If the S3 bucket requires SSE encryption, you can configure an encryption algorithm to allow external tables and volumes in Unity Catalog to access data in your S3 bucket.
If your workspace storage bucket, which includes DBFS root, is encrypted using a Databricks encryption keys configuration, you can use the same key to enable encryption for the new external location. For instructions, see Configure an encryption algorithm on an external location.
Click Create.
Go to the Permissions tab to grant permission to use the external location.
Click Grant.
Select users, groups, or service principals in the Principals field, and select the privilege you want to grant.
Click Grant.
(Optional) Set the workspaces that can access this external location.
By default, users on any workspace that uses this Unity Catalog metastore can be granted access to the data in this location. You can limit that access to specific workspaces. Databricks recommends limiting access to the workspace that the DBFS root is in.
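The creation and grant steps above also have SQL equivalents (the workspace-binding step is performed in the UI or via the API). A minimal sketch, assuming a hypothetical existing credential dbfs_root_credential and a placeholder bucket path; substitute your own bucket, shard name, and workspace ID:

```sql
-- Create the external location for the DBFS root Hive warehouse subpath.
-- The URL is a placeholder following the pattern described above.
CREATE EXTERNAL LOCATION dbfs_root_location
  URL 's3://my-bucket/my-shard/1234567890/user/hive/warehouse'
  WITH (STORAGE CREDENTIAL dbfs_root_credential);

-- Grant read-only access on the location to a hypothetical group.
GRANT READ FILES ON EXTERNAL LOCATION dbfs_root_location TO `analysts`;

-- Verify the grants on the new location.
SHOW GRANTS ON EXTERNAL LOCATION dbfs_root_location;
```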