Databricks File System - DBFS

Databricks File System (DBFS) is a distributed file system installed on Databricks clusters. Files in DBFS persist to S3, so you won’t lose data even after you terminate a cluster. DBFS allows you to mount S3 buckets so that you can seamlessly access data without requiring credentials.

You can access files in DBFS using the Databricks CLI, DBFS API, Databricks Utilities, Spark APIs, and local file APIs. In a Spark cluster, you access DBFS using Databricks Utilities, Spark APIs, or local file APIs. On your local computer, you access DBFS using the Databricks CLI or DBFS API.
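
For example, from your local computer you can list a DBFS directory through the DBFS API. A minimal sketch in Python, assuming your workspace hostname and a personal access token are available in the (illustrative) DATABRICKS_HOST and DATABRICKS_TOKEN environment variables:

# List the DBFS root from a local machine via the DBFS REST API.
import os
import requests

resp = requests.get(
    "https://%s/api/2.0/dbfs/list" % os.environ["DATABRICKS_HOST"],
    headers={"Authorization": "Bearer %s" % os.environ["DATABRICKS_TOKEN"]},
    params={"path": "/"},
)
resp.raise_for_status()
for entry in resp.json().get("files", []):
    print(entry["path"], entry["is_dir"], entry["file_size"])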

DBFS formerly used an S3 bucket created in the Databricks account to store data that is not stored on a DBFS mount point. If your Databricks workspace still uses this S3 bucket, we recommend that you contact Databricks support to have the data moved to an S3 bucket in your own account.

Access DBFS with the Databricks CLI

The DBFS command-line interface (CLI) uses the DBFS API to expose an easy-to-use command-line interface to DBFS. Using this client, you can interact with DBFS using commands similar to those you use on a Unix command line. For example:

# List files in DBFS
dbfs ls
# Put local file ./apple.txt to dbfs:/apple.txt
dbfs cp ./apple.txt dbfs:/apple.txt
# Get dbfs:/apple.txt and save to local file ./apple.txt
dbfs cp dbfs:/apple.txt ./apple.txt
# Recursively put local dir ./banana to dbfs:/banana
dbfs cp -r ./banana dbfs:/banana

For more information about the DBFS command-line interface, see Databricks CLI.

Access DBFS with dbutils

Databricks Utilities (dbutils) provides filesystem-like commands to access files in DBFS. This section has several examples of how to write files to and read files from DBFS using dbutils.


To access the help menu for DBFS, use the dbutils.fs.help() command.

  • Write files to and read files from DBFS as if it were a local filesystem.

    dbutils.fs.put("/foobar/baz.txt", "Hello, World!")

  • Use dbfs:/ to access a DBFS path.

    dbutils.fs.head("dbfs:/foobar/baz.txt")

  • Use file:/ to access the local disk.

    dbutils.fs.ls("file:/foobar")
  • Filesystem cells provide a shorthand for accessing the dbutils filesystem module. Most dbutils.fs commands are available using the %fs magic command as well.

    # Recursively remove the files under foobar
    %fs rm -r foobar
    # Overwrite the file "/mnt/my-file" with the string "Hello world!"
    %fs put -f "/mnt/my-file" "Hello world!"

For more information about Databricks Utilities, see Databricks Utilities.

Access DBFS using Spark APIs
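
With Spark APIs, you read and write files using DBFS paths directly (no /dbfs prefix). A minimal sketch in Python, assuming the SparkContext sc that Databricks notebooks provide and a hypothetical path /tmp/foo.txt:

# Write a text file to DBFS with Spark, then read it back.
sc.parallelize(range(0, 100)).saveAsTextFile("/tmp/foo.txt")
sc.textFile("/tmp/foo.txt").collect()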


Access DBFS using local file APIs

You can use local file APIs to read and write to DBFS paths. Databricks configures each cluster node with a FUSE mount that allows processes running on cluster nodes to read and write to the underlying distributed storage layer with local file APIs. For example:

# Write a file to DBFS using Python I/O APIs.
with open("/dbfs/tmp/test_dbfs.txt", 'w') as f:
  f.write("Apache Spark is awesome!\n")
  f.write("End of example!")

# Read the file back.
with open("/dbfs/tmp/test_dbfs.txt", "r") as f_read:
  for line in f_read:
    print(line)

// scala
import scala.io.Source

val filename = "/dbfs/tmp/test_dbfs.txt"
for (line <- Source.fromFile(filename).getLines()) {
  println(line)
}


  • When you’re using Spark APIs, you reference files with "/mnt/training/file.csv" or "dbfs:/mnt/training/file.csv". If you’re using local file APIs, you must provide the path under /dbfs, for example: "/dbfs/mnt/training/file.csv". You cannot use a path under /dbfs when using Spark APIs.

  • Local file I/O APIs only support files less than 2GB in size. If you use local file I/O APIs to read or write files larger than 2GB you might see corrupted files. Instead, access files larger than 2GB using the DBFS CLI, dbutils.fs, or Spark APIs.

  • If you write a file using the local file I/O APIs and then immediately try to access it using the DBFS CLI, dbutils.fs, or Spark APIs, you might encounter a FileNotFoundException, a file of size 0, or stale file contents. That is expected because the OS caches writes by default. To force those writes to be flushed to persistent storage (in our case DBFS), use the standard Unix system call sync.

    // scala
    import scala.sys.process._
    // Write a file using the local file API (over the FUSE mount).
    dbutils.fs.put("file:/dbfs/tmp/test", "test-contents")
    // Flush to persistent storage.
    "sync /dbfs/tmp/test" !
    // Read the file using "dbfs:/" instead of the FUSE mount.
    dbutils.fs.head("dbfs:/tmp/test")

Access DBFS using high-performance local APIs

For distributed deep learning applications, which require DBFS access for loading, checkpointing, and logging data, Databricks Runtime ML provides a special DBFS folder that offers high-performance I/O for deep learning workloads.

In Databricks Runtime 5.4 ML and above, dbfs:/ml maps to file:/dbfs/ml on driver and worker nodes. Databricks recommends using Databricks Runtime 5.4 ML (or above) and saving data under /dbfs/ml.
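
For example, a training job could write checkpoints under the FUSE path. A minimal sketch in Python, assuming Databricks Runtime 5.4 ML or above; the directory name is hypothetical:

# Save a checkpoint file under the high-performance /dbfs/ml FUSE mount.
import os

ckpt_dir = "/dbfs/ml/my_experiment/checkpoints"  # hypothetical location
os.makedirs(ckpt_dir, exist_ok=True)
with open(os.path.join(ckpt_dir, "epoch_0001.ckpt"), "wb") as f:
    f.write(b"model-bytes")  # placeholder for real checkpoint contents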

For details, see Prepare Storage for Data Loading and Model Checkpointing.

Mount S3 buckets to DBFS

Mounting S3 buckets to DBFS allows you to access files as if they were on the local file system.
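
For example, creating and removing a mount with dbutils looks like the following. A minimal sketch, assuming the cluster already has IAM access to a hypothetical bucket named my-bucket:

# Mount an S3 bucket to DBFS, list its contents, then unmount it.
dbutils.fs.mount("s3a://my-bucket", "/mnt/my-bucket")  # bucket name is hypothetical
display(dbutils.fs.ls("/mnt/my-bucket"))
dbutils.fs.unmount("/mnt/my-bucket")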

For information on how to mount and unmount AWS S3 buckets, see Access S3 with DBFS.