Databricks File System - DBFS

The Databricks File System (DBFS) is a distributed file system that comes installed on Databricks Runtime clusters.

DBFS is a layer over S3 that allows you to mount S3 buckets and make them available to users in your workspace.

The Databricks File System is available in both Python and Scala. By default, DBFS uses an S3 bucket created in the Databricks account to store data that is not stored on a DBFS mount point. Databricks can switch this over to an S3 bucket in your own account at your request. Mounting other S3 buckets in DBFS gives your Databricks users access to specific data without requiring them to have your S3 keys. In addition,

  • Files in DBFS persist to S3, so you won’t lose data even after you terminate the clusters.
  • dbutils makes it easy for you to use DBFS and is automatically available (no import necessary) in every Databricks notebook.

You can access DBFS through Databricks Utilities (dbutils) on a Spark cluster, or through the DBFS command line interface on your local computer.
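Before the CLI can talk to your workspace, it needs connection settings, which it reads from a profile in ~/.databrickscfg. A minimal sketch (the host and token values are placeholders you would replace with your workspace URL and a personal access token):

```ini
[DEFAULT]
host = https://<your-workspace>.cloud.databricks.com
token = <your-personal-access-token>
```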

DBFS Command Line Interface

The DBFS command line interface leverages the DBFS API to expose an easy-to-use command line interface to DBFS. Using this client, interacting with DBFS is as easy as running:

# List files in DBFS
dbfs ls
# Put local file ./apple.txt to dbfs:/apple.txt
dbfs cp ./apple.txt dbfs:/apple.txt
# Get dbfs:/apple.txt and save to local file ./apple.txt
dbfs cp dbfs:/apple.txt ./apple.txt
# Recursively put local dir ./banana to dbfs:/banana
dbfs cp -r ./banana dbfs:/banana

For more information about the DBFS command line interface, see the Databricks CLI documentation.
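Under the hood, each of these commands is an HTTP call against the DBFS REST API; for example, a small `dbfs cp` upload maps to a single request to the /api/2.0/dbfs/put endpoint, which expects file contents as base64 text. A sketch of how that request body could be built (the helper function is illustrative, not part of the CLI):

```python
import base64
import json

# DBFS REST API endpoint for single-shot uploads (files small enough
# to send in one request); contents must be base64-encoded.
API_PUT = "/api/2.0/dbfs/put"

def put_request(dbfs_path, data, overwrite=False):
    """Build the JSON body for a single-shot DBFS put request."""
    return json.dumps({
        "path": dbfs_path,
        "contents": base64.b64encode(data).decode("ascii"),
        "overwrite": overwrite,
    })

body = put_request("/apple.txt", b"hello")
print(body)
```

Larger files are uploaded in chunks via separate create/add-block/close calls, which the CLI handles for you.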

Saving Files to DBFS with dbutils

Read and write files to DBFS as if it were a local filesystem.

dbutils.fs.put("/foobar/baz.txt", "Hello, World!")

Use Spark to write to DBFS

# python
sc.parallelize(range(0, 100)).saveAsTextFile("/tmp/foo.txt")
// scala
sc.parallelize(0 until 100).saveAsTextFile("/tmp/bar.txt")

Use no prefix or dbfs:/ to access a DBFS path:

dbutils.fs.ls("dbfs:/foobar")

Use file:/ to access the local disk:

dbutils.fs.ls("file:/foobar")
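As a rough sketch of how these prefixes are interpreted (the helper below is illustrative, not a Databricks API):

```python
def storage_layer(path):
    """Classify a path the way dbutils.fs interprets prefixes:
    no scheme or dbfs:/ means distributed DBFS storage,
    file:/ means the driver's local disk."""
    if path.startswith("file:/"):
        return "local"
    if path.startswith("dbfs:/") or path.startswith("/"):
        return "dbfs"
    raise ValueError("unrecognized path: %r" % path)

print(storage_layer("dbfs:/foobar"))  # dbfs
print(storage_layer("/foobar"))       # dbfs
print(storage_layer("file:/foobar"))  # local
```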

Filesystem cells provide a shorthand for accessing the dbutils filesystem module. Most dbutils.fs commands are available via the %fs magic command as well.

%fs rm -r foobar

Using Local File I/O APIs

Users can use local APIs to read and write to DBFS paths. Databricks configures each node with a fuse mount that allows processes to read / write to the underlying distributed storage layer.

# write a file to DBFS using python i/o apis
with open("/dbfs/tmp/test_dbfs.txt", 'w') as f:
  f.write("Apache Spark is awesome!\n")
  f.write("End of example!")

# read the file
with open("/dbfs/tmp/test_dbfs.txt", "r") as f_read:
  for line in f_read:
    print(line)
// scala
import scala.io.Source

val filename = "/dbfs/tmp/test_dbfs.txt"
for (line <- Source.fromFile(filename).getLines()) {
  println(line)
}

Local file I/O APIs only support files less than 2 GB in size. You might see corrupted files if you use local file I/O APIs to read or write files larger than 2 GB. To access large files, use the DBFS command line interface, dbutils.fs, or the Hadoop FileSystem APIs instead.
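One way to guard against that limit is to check the file size before choosing an I/O path. A minimal sketch (the helper is illustrative; the temp file below stands in for a real /dbfs/... path):

```python
import os
import tempfile

TWO_GB = 2 * 1024 ** 3  # local file I/O APIs are limited to files under 2 GB

def safe_for_local_io(path):
    """Return True if the file is small enough for local (fuse) I/O.
    Larger files should go through dbutils.fs or Hadoop FileSystem APIs."""
    return os.path.getsize(path) < TWO_GB

# Demo against a small temp file (stand-in for a /dbfs/... path).
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"small file")
print(safe_for_local_io(tmp.name))  # True
```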

Mounting an S3 Bucket

Mounting an S3 bucket directly to DBFS allows you to access files in S3 as if they were on the local file system.


One common issue is picking bucket names that are not valid URIs; check the S3 bucket naming limitations before mounting.

We recommend using secure access to S3 buckets with IAM roles when mounting your buckets. IAM roles allow you to mount a bucket as a path. You can also mount a bucket using keys, although we do not recommend doing so.

Replace the values in the following cell with your S3 credentials.

# python
# Replace with your values
ACCESS_KEY = "YOUR_ACCESS_KEY"
# Encode the Secret Key, as it can contain "/"
SECRET_KEY = "YOUR_SECRET_KEY".replace("/", "%2F")
AWS_BUCKET_NAME = "MY_BUCKET"
MOUNT_NAME = "MOUNT_NAME"

dbutils.fs.mount("s3a://%s:%s@%s" % (ACCESS_KEY, SECRET_KEY, AWS_BUCKET_NAME), "/mnt/%s" % MOUNT_NAME)
display(dbutils.fs.ls("/mnt/%s" % MOUNT_NAME))
// scala
// Replace with your values
val AccessKey = "YOUR_ACCESS_KEY"
// Encode the Secret Key as that can contain "/"
val SecretKey = "YOUR_SECRET_KEY".replace("/", "%2F")
val AwsBucketName = "MY_BUCKET"
val MountName = "MOUNT_NAME"

dbutils.fs.mount(s"s3a://$AccessKey:$SecretKey@$AwsBucketName", s"/mnt/$MountName")
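The manual .replace("/", "%2F") above is a special case of percent-encoding; the standard library can do this more generally, covering any reserved character in the key. A sketch (the helper and the key values are illustrative placeholders, not a Databricks API):

```python
from urllib.parse import quote

def s3a_mount_uri(access_key, secret_key, bucket):
    """Build the s3a:// URI passed to dbutils.fs.mount, percent-encoding
    the secret key so a "/" (or other reserved character) in it does not
    break the URI."""
    return "s3a://%s:%s@%s" % (access_key, quote(secret_key, safe=""), bucket)

uri = s3a_mount_uri("AKIA_EXAMPLE", "abc/def", "MY_BUCKET")
print(uri)  # s3a://AKIA_EXAMPLE:abc%2Fdef@MY_BUCKET
```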

Now you can access files in your S3 bucket as if they were local files, for example:

# python
rdd = sc.textFile("/mnt/%s/...." % MOUNT_NAME)
rdd = sc.textFile("dbfs:/mnt/%s/...." % MOUNT_NAME)
// scala
val rdd = sc.textFile(s"/mnt/$MountName/....")
val rdd = sc.textFile(s"dbfs:/mnt/$MountName/....")

Note: You can use the fuse mounts to access mounted S3 buckets by referring to /dbfs/mnt/myMount/.
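The mapping from DBFS paths to fuse-mount paths is a simple prefix rewrite. A sketch (the helper below is illustrative, not part of dbutils):

```python
def fuse_path(dbfs_path):
    """Map a DBFS URI or absolute DBFS path to its local fuse-mount
    location under /dbfs, where ordinary file I/O can reach it."""
    if dbfs_path.startswith("dbfs:/"):
        dbfs_path = dbfs_path[len("dbfs:"):]
    if not dbfs_path.startswith("/"):
        raise ValueError("expected an absolute DBFS path")
    return "/dbfs" + dbfs_path

print(fuse_path("dbfs:/mnt/myMount/data.csv"))  # /dbfs/mnt/myMount/data.csv
print(fuse_path("/mnt/myMount/data.csv"))       # /dbfs/mnt/myMount/data.csv
```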

Getting Help

Run dbutils.fs.help() at any time to access the help menu for DBFS.