What is the Databricks File System (DBFS)?
The Databricks File System (DBFS) is a distributed file system mounted into a Databricks workspace and available on Databricks clusters. DBFS is an abstraction on top of scalable object storage that maps Unix-like filesystem calls to native cloud storage API calls.
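To make the idea of filesystem semantics over object storage concrete, here is a toy sketch. It is not how DBFS is implemented: a plain dictionary stands in for a flat object store, and the hypothetical helper `list_dir` and the sample keys are assumptions made purely for illustration. It shows how a directory listing can be emulated by listing keys under a prefix.

```python
from typing import Dict, List


def list_dir(store: Dict[str, bytes], directory: str) -> List[str]:
    """Emulate `ls` on a flat key space by listing keys under a prefix."""
    prefix = directory.strip("/")
    prefix = prefix + "/" if prefix else ""
    children = set()
    for key in store:
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        if not rest:
            continue
        # Keep only the first path segment; deeper keys show up as a subdirectory.
        head, sep, _ = rest.partition("/")
        children.add(head + "/" if sep else head)
    return sorted(children)


# Hypothetical object keys; object stores have no real directories, only keys.
store = {
    "raw/2024/a.csv": b"",
    "raw/2024/b.csv": b"",
    "raw/readme.txt": b"",
    "tmp/x.bin": b"",
}

print(list_dir(store, "/raw"))
print(list_dir(store, "/"))
```

The point of the sketch is only that "directories" are derived from key prefixes, which is the kind of translation a filesystem abstraction over object storage has to perform.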
Databricks workspaces deploy with a DBFS root volume, accessible to all users by default. Databricks recommends against storing production data in this location.
What can you do with DBFS?
DBFS provides convenience by mapping cloud object storage URIs to relative paths.
Allows you to interact with object storage using directory and file semantics instead of cloud-specific API commands.
Allows you to mount cloud object storage locations so that you can map storage credentials to paths in the Databricks workspace.
Simplifies the process of persisting files to object storage, allowing virtual machines and attached volume storage to be safely deleted on cluster termination.
Provides a convenient location for storing init scripts, JARs, libraries, and configurations for cluster initialization.
Provides a convenient location for checkpoint files created during model training with OSS deep learning libraries.
On cluster nodes, DBFS is also exposed through a FUSE mount, so local file APIs can read and write DBFS paths. See How to work with files on Databricks.
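In practice, the same DBFS file is commonly addressed in two ways: with the `dbfs:/` scheme (for example, from Spark or Databricks Utilities) and under the `/dbfs` local path on cluster nodes. The helpers below are a minimal sketch of converting between the two forms. The function names are hypothetical, and the sketch assumes the `/dbfs` mount is available on the cluster, which depends on the cluster configuration.

```python
def to_fuse_path(uri: str) -> str:
    """Convert a 'dbfs:/...' URI to the corresponding '/dbfs/...' local path."""
    scheme = "dbfs:/"
    if not uri.startswith(scheme):
        raise ValueError(f"not a dbfs:/ URI: {uri!r}")
    return "/dbfs/" + uri[len(scheme):].lstrip("/")


def to_dbfs_uri(path: str) -> str:
    """Convert a '/dbfs/...' local path to the corresponding 'dbfs:/...' URI."""
    root = "/dbfs"
    if path != root and not path.startswith(root + "/"):
        raise ValueError(f"not a /dbfs path: {path!r}")
    return "dbfs:/" + path[len(root):].lstrip("/")


# Hypothetical file location, used only to show the round trip.
local = to_fuse_path("dbfs:/mnt/data/file.csv")
print(local)
print(to_dbfs_uri(local))
```

A local path produced this way can be passed to ordinary Python file APIs such as `open()`, while the `dbfs:/` form is what Spark readers and writers expect.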
Interact with files in cloud-based object storage
DBFS provides many options for interacting with files in cloud object storage.
Mount object storage
Mounting object storage to DBFS allows you to access objects in object storage as if they were on the local file system. Mounts store Hadoop configurations necessary for accessing storage, so you do not need to specify these settings in code or during cluster configuration.
For more information, see Mounting cloud object storage on Databricks.
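As a rough sketch of what mounting looks like from a notebook, the helper below wraps `dbutils.fs.mounts()` and `dbutils.fs.mount()` so that the mount is created only if nothing is mounted at that path yet. The helper name `ensure_mounted` is hypothetical, and the `dbutils` object is passed in explicitly because it exists only inside a Databricks runtime. Which `extra_configs` entries are required depends on the cloud provider and authentication method, so treat the empty default as a placeholder rather than a working configuration.

```python
from typing import Any, Dict, Optional


def ensure_mounted(
    dbutils: Any,
    source: str,
    mount_point: str,
    extra_configs: Optional[Dict[str, str]] = None,
) -> bool:
    """Mount `source` at `mount_point` unless that mount point already exists.

    Returns True if a new mount was created, False if it was already present.
    """
    existing = {m.mountPoint for m in dbutils.fs.mounts()}
    if mount_point in existing:
        return False
    dbutils.fs.mount(
        source=source,
        mount_point=mount_point,
        extra_configs=extra_configs or {},
    )
    return True
```

In a notebook this might be called as `ensure_mounted(dbutils, "s3a://<bucket-name>", "/mnt/<mount-name>")`, where both placeholders are stand-ins for your own values. After the mount exists, code can refer to `/mnt/<mount-name>` without repeating storage settings.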
What is the DBFS root?
The DBFS root is the default storage location for a Databricks workspace, provisioned as part of workspace creation in the cloud account containing the Databricks workspace. For details on DBFS root configuration and deployment, see Configure AWS storage. For best practices around securing data in the DBFS root, see Recommendations for working with DBFS root.
Some Databricks users refer to the DBFS root as “DBFS” or “the DBFS”, but the two are distinct. DBFS is a file system used for interacting with data in cloud object storage, whereas the DBFS root is a cloud object storage location. You use DBFS to interact with the DBFS root, and DBFS has many applications beyond the DBFS root.
The DBFS root contains a number of special locations that serve as defaults for various actions performed by users in the workspace. For details, see What directories are in DBFS root by default?.
How does DBFS work with Unity Catalog?
Unity Catalog adds the concepts of external locations and managed storage credentials to help organizations provide least-privilege access to data in cloud object storage. Unity Catalog also provides a new default storage location for managed tables. Some security configurations provide direct access to both Unity Catalog-managed resources and DBFS. Databricks has compiled recommendations for using DBFS and Unity Catalog.