Create an external location for data in DBFS root

This article shows how to configure an external location in Unity Catalog to govern access to your DBFS root storage location. Although Databricks recommends against storing data in DBFS root storage, your workspace might do so because of legacy practices.

External locations are Unity Catalog securable objects that associate storage credentials with cloud object storage containers. External locations are used to define managed storage locations for managed tables and volumes, and to govern access to the storage locations that contain external tables and external volumes.

You must create an external location if your workspace-local, legacy Databricks Hive metastore stores data in the DBFS root and you want to federate your legacy Hive metastore so that your team can work with your Hive metastore tables using Unity Catalog. See Hive metastore federation: enable Unity Catalog to govern tables registered in a Hive metastore and Enable Hive metastore federation for a legacy workspace Hive metastore.

Before you begin

To create an external location for the DBFS root, you must have access to a Unity Catalog storage credential that grants access to the S3 bucket that contains the DBFS root. If you do not have one, you can provide the ARN of an IAM role that grants access to that bucket when you create the external location, as described in the instructions that follow. For information about creating an IAM role that grants access to a cloud storage location from Databricks, see Step 1: Create an IAM role.
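If you are not sure whether a suitable storage credential already exists, one way to check is to list the credentials defined in your metastore. The following is a minimal sketch using the Databricks SDK for Python; the `storage_credentials.list()` call and the `aws_iam_role` field reflect recent SDK versions, so verify them against the version you have installed.

```python
# Sketch: list existing Unity Catalog storage credentials and the IAM role each
# one uses, to check whether one already reaches the DBFS root bucket.
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()  # authenticates with your configured Databricks credentials
for cred in w.storage_credentials.list():
    role_arn = cred.aws_iam_role.role_arn if cred.aws_iam_role else None
    print(f"{cred.name}: {role_arn}")
```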

The S3 bucket that contains the DBFS root must have object ownership set to Bucket Owner Enforced. If object ownership is set to Object Writer, the Unity Catalog storage credential cannot read data in the S3 bucket.
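You can verify, and if necessary change, the bucket's object ownership setting with boto3, as in this sketch. The bucket name is a placeholder, and the first call assumes ownership controls are already configured on the bucket.

```python
# Sketch: verify that the S3 bucket holding the DBFS root uses
# Bucket Owner Enforced object ownership. Bucket name is a placeholder.
import boto3

s3 = boto3.client("s3")
bucket = "my-workspace-root-bucket"

# Raises an error if no ownership controls are configured on the bucket.
controls = s3.get_bucket_ownership_controls(Bucket=bucket)
ownership = controls["OwnershipControls"]["Rules"][0]["ObjectOwnership"]
print(f"Object ownership for {bucket}: {ownership}")

if ownership != "BucketOwnerEnforced":
    # Switching to Bucket Owner Enforced disables ACLs on the bucket.
    s3.put_bucket_ownership_controls(
        Bucket=bucket,
        OwnershipControls={"Rules": [{"ObjectOwnership": "BucketOwnerEnforced"}]},
    )
```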

Note

If you use customer-managed keys to encrypt the S3 bucket that contains the DBFS root, you can use the same key to enable encryption for the new external location. See the instructions below and Configure encryption for S3 with KMS.
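If you need to look up which KMS key currently encrypts the bucket, you can read its default encryption configuration with boto3, as in this sketch (the bucket name is a placeholder).

```python
# Sketch: find the KMS key that provides default encryption on the DBFS root
# bucket, so the same key can be reused for the external location.
import boto3

s3 = boto3.client("s3")
enc = s3.get_bucket_encryption(Bucket="my-workspace-root-bucket")
default = enc["ServerSideEncryptionConfiguration"]["Rules"][0][
    "ApplyServerSideEncryptionByDefault"
]
print(default.get("SSEAlgorithm"), default.get("KMSMasterKeyID"))
```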

Permissions requirements:

  • You must have the CREATE STORAGE CREDENTIAL and CREATE EXTERNAL LOCATION privileges on the metastore. Metastore admins have these privileges by default.

    Note

    If a storage credential for the DBFS root’s storage location already exists, then the user who creates the external location does not need CREATE STORAGE CREDENTIAL, but does need CREATE EXTERNAL LOCATION on both the storage credential and the metastore.

  • You must be a workspace admin to have the system create the storage credential for you during external location creation.

    You do not have to be a workspace admin if a storage credential that gives access to the DBFS root storage location already exists and you have CREATE EXTERNAL LOCATION on both the storage credential and the metastore.
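If a metastore admin needs to grant these metastore privileges to another principal, one programmatic option is the Databricks SDK for Python. The sketch below is an example under assumptions, not a definitive recipe: the group name is a placeholder, and the `grants.update` signature and enum names should be checked against your SDK version.

```python
# Sketch: grant the metastore-level privileges named above to a group so its
# members can create storage credentials and external locations.
from databricks.sdk import WorkspaceClient
from databricks.sdk.service import catalog

w = WorkspaceClient()
metastore_id = w.metastores.current().metastore_id  # metastore assigned to this workspace

w.grants.update(
    securable_type=catalog.SecurableType.METASTORE,
    full_name=metastore_id,
    changes=[
        catalog.PermissionsChange(
            principal="uc-setup-admins",  # placeholder group name
            add=[
                catalog.Privilege.CREATE_STORAGE_CREDENTIAL,
                catalog.Privilege.CREATE_EXTERNAL_LOCATION,
            ],
        )
    ],
)
```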

Create the external location

You can use Catalog Explorer to create an external location for the DBFS root.

  1. In the sidebar, click Catalog icon Catalog.

  2. Click External data > Create external location.

  3. Click Manual, then Next.

    You cannot use the AWS Quickstart option to create an external location for DBFS root.

  4. Enter an External location name.

  5. Under URL, click Copy from DBFS mount and select Copy from DBFS root.

    The URL and subpath fields are populated with the cloud storage path to the DBFS root.

    Important

    When you create an external location for the DBFS root, you must use the subpath to the DBFS root location, not the path to the entire bucket. The URL and subpath are pre-populated with user/hive/warehouse, which is the default storage location for Hive metastore tables. If you want more fine-grained access control to the data in DBFS root, you can create separate external locations for sub-paths within DBFS root.

  6. Select a storage credential that grants access to the DBFS root cloud storage location or, if none has been defined, click + Create new storage credential.

    To create the storage credential, select a Credential Type of AWS IAM Role and enter the ARN of an IAM role that grants access to the workspace-level DBFS root prefix on the S3 bucket. For information about creating an IAM role that grants access to a cloud storage location from Databricks, see Step 1: Create an IAM role.

    Warning

    The AWS IAM role policy must restrict access specifically to the DBFS root subpath (for example, s3://<bucket>/<shard-name>/<workspace-id>/). If you don't restrict access to the DBFS root subpath, you expose the workspace's internal storage location and can compromise workspace-level access controls, or potentially another workspace's data if the bucket is shared. For one way to scope the policy, see the sketch that follows these steps.

  7. (Optional) Add a comment.

  8. (Optional) Click Advanced options and enable Fallback mode.

    Fallback mode is intended for legacy workload migration scenarios. See Enable fallback mode on external locations.

  9. (Optional) If the S3 bucket requires SSE encryption, you can configure an encryption algorithm to allow external tables and volumes in Unity Catalog to access data in your S3 bucket.

    If your workspace storage bucket, which includes DBFS root, is encrypted using a Databricks encryption keys configuration, you can use the same key to enable encryption for the new external location. For instructions, see Configure an encryption algorithm on an external location.

  10. Click Create.

  11. Go to the Permissions tab to grant permission to use the external location.

    1. Click Grant.

    2. Select users, groups, or service principals in the Principals field, and select the privileges you want to grant.

    3. Click Grant.

  12. (Optional) Set the workspaces that can access this external location.

    By default, users on any workspace that uses this Unity Catalog metastore can be granted access to the data in this location. You can limit that access to specific workspaces. Databricks recommends limiting access to the workspace that the DBFS root is in.

    See Bind an external location to one or more workspaces.
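If you prefer to script this setup instead of using Catalog Explorer, the following sketch mirrors the steps above: it scopes an IAM role policy to the DBFS root subpath, creates the storage credential and external location, and grants READ FILES. Treat it as an outline under assumptions rather than a definitive recipe: the bucket, role, subpath, and principal names are placeholders, only the S3 portion of the IAM policy is shown (see Step 1: Create an IAM role for the full policy Databricks requires), and the boto3 and Databricks SDK for Python calls (including the `AwsIamRoleRequest` class) should be verified against the library versions you use.

```python
# Sketch only: mirrors the Catalog Explorer steps above.
# Names, ARNs, and the DBFS root subpath are placeholders.
import json

import boto3
from databricks.sdk import WorkspaceClient
from databricks.sdk.service import catalog

BUCKET = "my-workspace-root-bucket"                # placeholder bucket
DBFS_ROOT_PREFIX = "<shard-name>/<workspace-id>"   # workspace-specific DBFS root subpath
ROLE_NAME = "dbfs-root-access-role"                # placeholder IAM role name
ROLE_ARN = f"arn:aws:iam::123456789012:role/{ROLE_NAME}"

# Scope the role policy to the DBFS root subpath, not the whole bucket
# (see the warning in step 6). Only the S3 statements are shown here.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
            "Resource": [f"arn:aws:s3:::{BUCKET}/{DBFS_ROOT_PREFIX}/*"],
        },
        {
            "Effect": "Allow",
            "Action": ["s3:ListBucket"],
            "Resource": [f"arn:aws:s3:::{BUCKET}"],
            "Condition": {"StringLike": {"s3:prefix": [f"{DBFS_ROOT_PREFIX}/*"]}},
        },
        {
            "Effect": "Allow",
            "Action": ["s3:GetBucketLocation"],
            "Resource": [f"arn:aws:s3:::{BUCKET}"],
        },
    ],
}
boto3.client("iam").put_role_policy(
    RoleName=ROLE_NAME,
    PolicyName="dbfs-root-subpath-access",
    PolicyDocument=json.dumps(policy),
)

# Create the storage credential and the external location (steps 4-6).
w = WorkspaceClient()
cred = w.storage_credentials.create(
    name="dbfs_root_credential",
    aws_iam_role=catalog.AwsIamRoleRequest(role_arn=ROLE_ARN),
)
loc = w.external_locations.create(
    name="dbfs_root",
    url=f"s3://{BUCKET}/{DBFS_ROOT_PREFIX}/user/hive/warehouse",
    credential_name=cred.name,
    comment="External location for the DBFS root Hive warehouse directory",
)

# Grant access to the new external location (step 11).
w.grants.update(
    securable_type=catalog.SecurableType.EXTERNAL_LOCATION,
    full_name=loc.name,
    changes=[
        catalog.PermissionsChange(
            principal="data-engineers",  # placeholder group name
            add=[catalog.Privilege.READ_FILES],
        )
    ],
)
```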