Databricks File System (DBFS)
Databricks File System (DBFS) is a distributed file system mounted into a Databricks workspace and available on Databricks clusters. DBFS is an abstraction on top of scalable object storage and offers the following benefits:
- Allows you to mount storage objects so that you can seamlessly access data without requiring credentials.
- Allows you to interact with object storage using directory and file semantics instead of storage URLs.
- Persists files to object storage, so you won’t lose data after you terminate a cluster.
DBFS root
The default storage location in DBFS is known as the DBFS root. Several types of data are stored in the following DBFS root locations:
- /FileStore: Imported data files, generated plots, and uploaded libraries. See Special DBFS root locations.
- /databricks-datasets: Sample public datasets. See Special DBFS root locations.
- /databricks-results: Files generated by downloading the full results of a query.
- /databricks/init: Global and cluster-named (deprecated) init scripts.
- /user/hive/warehouse: Data and metadata for non-external Hive tables.
In a new workspace, the DBFS root has the following default folders:

The DBFS root also contains data—including mount point metadata and credentials and certain types of logs—that is not visible and cannot be directly accessed.
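For example, a minimal sketch of exploring one of these locations from a Python notebook attached to a running cluster (the output is a listing of the sample datasets):
# List the contents of the sample datasets folder in the DBFS root
display(dbutils.fs.ls("/databricks-datasets"))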
Important
Data written to mount point paths (/mnt) is stored outside of the DBFS root. Even though the DBFS root is writeable, we recommend that you store data in mounted object storage rather than in the DBFS root.
Note
For some time DBFS used an S3 bucket in the Databricks account to store data that is not stored on a DBFS mount point. If your Databricks workspace still uses this S3 bucket, we recommend that you contact Databricks support to have the data moved to an S3 bucket in your own account.
Browse DBFS using the UI
You can browse and search for DBFS objects using the DBFS file browser.
Note
An admin user must enable the DBFS browser interface before you can use it. See Manage the DBFS file browser.
- Click Data in the sidebar.
- Click the DBFS button at the top of the page.
The browser displays DBFS objects in a hierarchy of vertical swimlanes. Select an object to expand the hierarchy. Use Prefix search in any swimlane to find a DBFS object.

You can also list DBFS objects using the DBFS CLI, DBFS API, Databricks file system utilities (dbutils.fs), Spark APIs, and local file APIs. See Access DBFS.
Mount object storage to DBFS
Mounting object storage to DBFS allows you to access objects in object storage as if they were on the local file system.
For information on how to mount and unmount AWS S3 buckets, see Access S3 buckets through DBFS. For information on encrypting data when writing to S3 through DBFS, see Encrypt data in S3 buckets.
For information on how to mount and unmount Azure Blob storage containers and Azure Data Lake Storage accounts, see Mount Azure Blob storage containers to DBFS, Mount Azure Data Lake Storage Gen1 resource using a service principal and OAuth 2.0, and Mount ADLS Gen2 storage.
Important
All users have read and write access to the objects in object storage mounted to DBFS.
However, if a mount is created using instance profiles, users only have the access that the IAM role allows and only from clusters configured to use that instance profile. This also means that mounts created using instance profiles are not accessible through the DBFS CLI.
Nested mounts are not supported. For example, the following structure is not supported:
- storage1 mounted as /mnt/storage1
- storage2 mounted as /mnt/storage1/storage2
We recommend creating separate mount entries for each storage object:
- storage1 mounted as /mnt/storage1
- storage2 mounted as /mnt/storage2
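As an illustration, the following is a minimal sketch of mounting and unmounting an S3 bucket with dbutils.fs.mount; the bucket name and mount point are hypothetical, and the cluster must already have access to the bucket (for example, through an instance profile):
# Mount a hypothetical S3 bucket at /mnt/my-data (names are illustrative)
dbutils.fs.mount(
  source = "s3a://my-example-bucket",
  mount_point = "/mnt/my-data"
)

# Unmount it when it is no longer needed
dbutils.fs.unmount("/mnt/my-data")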
Access DBFS
You can upload data to DBFS using the file upload interface, and can upload and access DBFS objects using the DBFS CLI, DBFS API, Databricks file system utilities (dbutils.fs), Spark APIs, and local file APIs.
In a Databricks cluster you access DBFS objects using Databricks file system utilities, Spark APIs, or local file APIs. On a local computer you access DBFS objects using the Databricks CLI or DBFS API.
DBFS and local driver node paths
You can work with files on DBFS or on the local driver node of the cluster. You can access the file system using magic commands such as %fs or %sh. You can also use Databricks file system utilities (dbutils.fs).
Databricks uses a FUSE mount to provide local access to files stored in the cloud. A FUSE mount is a secure, virtual filesystem.
Access files on DBFS
The path to the default blob storage (root) is dbfs:/.
The default location for %fs and dbutils.fs is root. Thus, to read from or write to root or an external bucket:
%fs <command> /<path>
dbutils.fs.<command> ("/<path>/")
%sh reads from the local filesystem by default. To access root or mounted paths in root with %sh, preface the path with /dbfs/. A typical use case is working with single-node libraries such as TensorFlow or scikit-learn that need to read and write data in cloud storage.
%sh <command> /dbfs/<path>/
You can also use single-node filesystem APIs:
import os
os.<command>('/dbfs/tmp')
Examples
# Default location for %fs is root
%fs ls /tmp/
%fs mkdirs /tmp/my_cloud_dir
%fs cp /tmp/test_dbfs.txt /tmp/file_b.txt
# Default location for dbutils.fs is root
dbutils.fs.ls ("/tmp/")
dbutils.fs.put("/tmp/my_new_file", "This is a file in cloud storage.")
# Default location for %sh is the local filesystem
%sh ls /dbfs/tmp/
# Default location for os commands is the local filesystem
import os
os.listdir('/dbfs/tmp')
Access files on the local filesystem
%fs and dbutils.fs read by default from root (dbfs:/). To read from the local filesystem, you must use file:/.
%fs <command> file:/<path>
dbutils.fs.<command> ("file:/<path>/")
%sh reads from the local filesystem by default, so do not use file:/:
%sh <command> /<path>
Examples
# With %fs and dbutils.fs, you must use file:/ to read from local filesystem
%fs ls file:/tmp
%fs mkdirs file:/tmp/my_local_dir
dbutils.fs.ls ("file:/tmp/")
dbutils.fs.put("file:/tmp/my_new_file", "This is a file on the local driver node.")
# %sh reads from the local filesystem by default
%sh ls /tmp
Access files on mounted object storage
Mounting object storage to DBFS allows you to access objects in object storage as if they were on the local file system.
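For example, once storage is mounted you can use the same path-based commands that work against the DBFS root; the mount name and file below are hypothetical:
# List a hypothetical mount and read one of its files through the FUSE path
display(dbutils.fs.ls("/mnt/my-data"))
with open("/dbfs/mnt/my-data/example.csv") as f:  # hypothetical file
  print(f.readline())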
Summary table and diagram
The following table summarizes the commands described in this section and when to use each syntax.
Command | Default location | To read from root | To read from local filesystem |
---|---|---|---|
%fs | Root | | Add file:/ to path |
%sh | Local driver node | Add /dbfs to path | |
dbutils.fs | Root | | Add file:/ to path |
os.<command> | Local driver node | Add /dbfs to path | |

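The following sketch shows the same example directory accessed through dbutils.fs and os, matching the defaults in the table (the /tmp path is illustrative):
import os

# dbutils.fs defaults to the DBFS root; add file:/ for the local filesystem
dbutils.fs.ls("/tmp")         # DBFS root
dbutils.fs.ls("file:/tmp")    # local driver node

# os commands default to the local filesystem; add /dbfs for the DBFS root
os.listdir("/dbfs/tmp")       # DBFS root via the FUSE mount
os.listdir("/tmp")            # local driver node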
File upload interface
If you have small data files on your local machine that you want to analyze with Databricks, you can easily import them to Databricks File System (DBFS) using one of the two file upload interfaces: from the DBFS file browser or from a notebook.
Files are uploaded to the FileStore directory.
Upload data to DBFS from the file browser
Note
This feature is disabled by default. An administrator must enable the DBFS browser interface before you can use it. See Manage the DBFS file browser.
- Click Data in the sidebar.
- Click the DBFS button at the top of the page.
- Click the Upload button at the top of the page.
- On the Upload Data to DBFS dialog, optionally select a target directory or enter a new one.
- In the Files box, drag and drop or use the file browser to select the local file to upload.
Uploaded files are accessible by everyone who has access to the workspace.
Upload data to DBFS from a notebook
Note
This feature is enabled by default. If an administrator has disabled this feature, you will not have the option to upload files.
To create a table using the UI, see Create a table using the UI.
To upload data for use in a notebook, follow these steps.
- Create a new notebook or open an existing one, then click File > Upload Data.
- Select a target directory in DBFS to store the uploaded file. The target directory defaults to /shared_uploads/<your-email-address>/. Uploaded files are accessible by everyone who has access to the workspace.
- Either drag files onto the drop target or click Browse to locate files in your local filesystem.
- When you have finished uploading the files, click Next.
- If you’ve uploaded CSV, TSV, or JSON files, Databricks generates code showing how to load the data into a DataFrame. To save the text to your clipboard, click Copy.
- Click Done to return to the notebook.
Databricks CLI
The DBFS command-line interface (CLI) uses the DBFS API to expose an easy-to-use command-line interface to DBFS. Using this client, you can interact with DBFS using commands similar to those you use on a Unix command line. For example:
# List files in DBFS
dbfs ls
# Put local file ./apple.txt to dbfs:/apple.txt
dbfs cp ./apple.txt dbfs:/apple.txt
# Get dbfs:/apple.txt and save to local file ./apple.txt
dbfs cp dbfs:/apple.txt ./apple.txt
# Recursively put local dir ./banana to dbfs:/banana
dbfs cp -r ./banana dbfs:/banana
For more information about the DBFS command-line interface, see Databricks CLI.
dbutils
dbutils.fs provides file-system-like commands to access files in DBFS.
This section has several examples of how to write files to and read files from DBFS using dbutils.fs commands.
Tip
To access the help menu for DBFS, use the dbutils.fs.help() command.
Write files to and read files from the DBFS root as if it were a local filesystem.
dbutils.fs.mkdirs("/foobar/")
dbutils.fs.put("/foobar/baz.txt", "Hello, World!")
dbutils.fs.head("/foobar/baz.txt")
dbutils.fs.rm("/foobar/baz.txt")
Use dbfs:/ to access a DBFS path.
display(dbutils.fs.ls("dbfs:/foobar"))
Notebooks support a shorthand, %fs magic commands, for accessing the dbutils filesystem module. Most dbutils.fs commands are available using %fs magic commands.
# List the DBFS root
%fs ls
# Recursively remove the files under foobar
%fs rm -r foobar
# Overwrite the file "/mnt/my-file" with the string "Hello world!"
%fs put -f "/mnt/my-file" "Hello world!"
DBFS API
See DBFS API and Upload a big file into DBFS.
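As a hedged sketch, you can also call the DBFS REST API directly from Python; the workspace URL and token below are placeholders, and this assumes the requests library is available:
import requests

# Placeholder workspace URL and personal access token
host = "https://<your-workspace>.cloud.databricks.com"
token = "<personal-access-token>"

# List the contents of /tmp through the DBFS API
resp = requests.get(
    f"{host}/api/2.0/dbfs/list",
    headers={"Authorization": f"Bearer {token}"},
    params={"path": "/tmp"},
)
print(resp.json())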
Spark APIs
When you’re using Spark APIs, you reference files with "/mnt/training/file.csv" or "dbfs:/mnt/training/file.csv". The following example writes the file foo.txt to the DBFS /tmp directory.
df.write.text("/tmp/foo.txt")
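Reading the data back works with either form of the path; this minimal sketch assumes the file written above:
# Both references resolve to the same location in DBFS
df1 = spark.read.text("/tmp/foo.txt")
df2 = spark.read.text("dbfs:/tmp/foo.txt")
display(df1)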
Local file APIs
You can use local file APIs to read and write to DBFS paths. Databricks configures each cluster node with a FUSE mount /dbfs that allows processes running on cluster nodes to read and write to the underlying distributed storage layer with local file APIs. When using local file APIs, you must provide the path under /dbfs. For example:
# Write a file to DBFS using Python I/O APIs
with open("/dbfs/tmp/test_dbfs.txt", 'w') as f:
  f.write("Apache Spark is awesome!\n")
  f.write("End of example!")

# Read the file
with open("/dbfs/tmp/test_dbfs.txt", "r") as f_read:
  for line in f_read:
    print(line)
import scala.io.Source

val filename = "/dbfs/tmp/test_dbfs.txt"
for (line <- Source.fromFile(filename).getLines()) {
  println(line)
}
Local file API limitations
The following enumerates the limitations of local file API usage that apply to each FUSE version and the corresponding Databricks Runtime versions.
All: Do not support AWS S3 mounts with client-side encryption enabled.
FUSE V2 (default for Databricks Runtime 6.x and 7.x)
Does not support random writes. For workloads that require random writes, perform the I/O on local disk first and then copy the result to /dbfs. For example:
# python
import xlsxwriter
from shutil import copyfile

workbook = xlsxwriter.Workbook('/local_disk0/tmp/excel.xlsx')
worksheet = workbook.add_worksheet()
worksheet.write(0, 0, "Key")
worksheet.write(0, 1, "Value")
workbook.close()

copyfile('/local_disk0/tmp/excel.xlsx', '/dbfs/tmp/excel.xlsx')
Does not support sparse files. To copy sparse files, use cp --sparse=never:
$ cp sparse.file /dbfs/sparse.file
error writing '/dbfs/sparse.file': Operation not supported
$ cp --sparse=never sparse.file /dbfs/sparse.file
FUSE V1 (default for Databricks Runtime 5.5 LTS)
Important
If you experience issues with FUSE V1 on Databricks Runtime 5.5 LTS, Databricks recommends that you use FUSE V2 instead. You can override the default FUSE version in Databricks Runtime 5.5 LTS by setting the environment variable DBFS_FUSE_VERSION=2.
Supports only files less than 2GB in size. If you use local file I/O APIs to read or write files larger than 2GB you might see corrupted files. Instead, access files larger than 2GB using the DBFS CLI, dbutils.fs, or Spark APIs, or use the /dbfs/ml folder described in Local file APIs for deep learning.
folder described in Local file APIs for deep learning.If you write a file using the local file I/O APIs and then immediately try to access it using the DBFS CLI, dbutils.fs, or Spark APIs, you might encounter a
FileNotFoundException
, a file of size 0, or stale file contents. That is expected because the OS caches writes by default. To force those writes to be flushed to persistent storage (in our case DBFS), use the standard Unix system call sync). For example:// scala import scala.sys.process._ // Write a file using the local file API (over the FUSE mount). dbutils.fs.put("file:/dbfs/tmp/test", "test-contents") // Flush to persistent storage. "sync /dbfs/tmp/test" ! // Read the file using "dbfs:/" instead of the FUSE mount. dbutils.fs.head("dbfs:/tmp/test")
Local file APIs for deep learning
For distributed deep learning applications, which require DBFS access for loading, checkpointing, and logging data, Databricks Runtime 6.0 and above provide a high-performance /dbfs mount that’s optimized for deep learning workloads.
In Databricks Runtime 5.5 LTS, only /dbfs/ml is optimized. In this version Databricks recommends saving data under /dbfs/ml, which maps to dbfs:/ml.
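For example, a minimal sketch of writing a checkpoint-style artifact under the optimized path on Databricks Runtime 5.5 LTS (the directory and file names are illustrative):
import os

# /dbfs/ml maps to dbfs:/ml and is the optimized FUSE path on 5.5 LTS
os.makedirs("/dbfs/ml/example-run", exist_ok=True)
with open("/dbfs/ml/example-run/checkpoint-0.txt", "w") as f:
  f.write("epoch=0")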