Databricks Utilities

Databricks Utilities (DBUtils) make it easy to perform powerful combinations of tasks. You can use the utilities to work with blob storage efficiently, to chain and parameterize notebooks, and to work with secrets.

All dbutils utilities are available in Python and Scala notebooks. Only the widget utilities are available in R notebooks; however, you can use a language magic command to invoke other dbutils methods in R and SQL notebooks. For example, to list the Databricks Datasets DBFS folder in an R or SQL notebook, run the command:

%fs ls /databricks-datasets

This topic includes the following sections:

  • File system utilities
  • Notebook workflow utilities
  • Widget utilities
  • Secrets utilities
  • Library utilities
  • Databricks Utilities API library

File system utilities

The file system utilities access Databricks File System (DBFS), making it easier to use Databricks as a file system. Learn more by running:

dbutils.fs.help()
cp(from: String, to: String, recurse: boolean = false): boolean -> Copies a file or directory, possibly across FileSystems
head(file: String, maxBytes: int = 65536): String -> Returns up to the first 'maxBytes' bytes of the given file as a String encoded in UTF-8
ls(dir: String): Seq -> Lists the contents of a directory
mkdirs(dir: String): boolean -> Creates the given directory if it does not exist, also creating any necessary parent directories
mv(from: String, to: String, recurse: boolean = false): boolean -> Moves a file or directory, possibly across FileSystems
put(file: String, contents: String, overwrite: boolean = false): boolean -> Writes the given String out to a file, encoded in UTF-8
rm(dir: String, recurse: boolean = false): boolean -> Removes a file or directory

mount(source: String, mountPoint: String, encryptionType: String = "", owner: String = null, extraConfigs: Map = Map.empty[String, String]): boolean -> Mounts the given source directory into DBFS at the given mount point
mounts: Seq -> Displays information about what is mounted within DBFS
refreshMounts: boolean -> Forces all machines in this cluster to refresh their mount cache, ensuring they receive the most recent information
unmount(mountPoint: String): boolean -> Deletes a DBFS mount point

The sequence returned by the ls command contains the following attributes:

Attribute   Type         Description
path        string       The path of the file or directory.
name        string       The name of the file or directory.
isDir()     boolean      True if the path is a directory.
size        long/int64   The length of the file in bytes, or zero if the path is a directory.
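As a sketch of working with this sequence, the following filters a listing down to regular files and totals their sizes. Because dbutils is only available inside a Databricks notebook, the records are mocked here; note that on the real objects isDir is a method, so you would write f.isDir() in a notebook.

```python
from collections import namedtuple

# Mocked stand-in for the records dbutils.fs.ls returns. On the real
# objects, isDir is a method (f.isDir()) rather than a plain attribute.
FileInfo = namedtuple("FileInfo", ["path", "name", "isDir", "size"])

listing = [
    FileInfo("dbfs:/data/", "data/", True, 0),
    FileInfo("dbfs:/data/events.json", "events.json", False, 1024),
]
# In a notebook this would be: listing = dbutils.fs.ls("dbfs:/data")

files = [f for f in listing if not f.isDir]   # keep regular files only
total_bytes = sum(f.size for f in files)
```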


You can get detailed information about each command by using help, for example:

"ls")

Notebook workflow utilities

Notebook workflows allow you to chain together notebooks and act on their results. See Notebook Workflows. Learn more by running:

dbutils.notebook.help()
exit(value: String): void -> This method lets you exit a notebook with a value
run(path: String, timeoutSeconds: int, arguments: Map): String -> This method runs a notebook and returns its exit value.


The maximum length of the string value returned from run is 5 MB. See Runs Get Output.
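Because run returns a plain string, a common pattern is to serialize structured results to JSON in the child notebook and decode them in the parent. A minimal sketch; the dbutils calls, shown in comments, work only inside Databricks, and the notebook path and values are hypothetical:

```python
import json

# Child notebook: serialize structured results, since exit/run only pass
# strings (capped at 5 MB). In Databricks the final step would be
# dbutils.notebook.exit(payload).
result = {"status": "ok", "rows_written": 42}
payload = json.dumps(result)
assert len(payload.encode("utf-8")) < 5 * 1024 * 1024  # stay under the limit

# Parent notebook: the same string comes back from
#   payload = "/path/to/Child", 60)
decoded = json.loads(payload)
```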


You can get detailed information about each command by using help, for example:

"exit")

Widget utilities

Widgets allow you to parameterize notebooks. See Widgets. Learn more by running:

dbutils.widgets.help()
combobox(name: String, defaultValue: String, choices: Seq, label: String): void -> Creates a combobox input widget with a given name, default value and choices
dropdown(name: String, defaultValue: String, choices: Seq, label: String): void -> Creates a dropdown input widget with a given name, default value and choices
get(name: String): String -> Retrieves current value of an input widget
getArgument(name: String, optional: String): String -> (DEPRECATED) Equivalent to get
multiselect(name: String, defaultValue: String, choices: Seq, label: String): void -> Creates a multiselect input widget with a given name, default value and choices
remove(name: String): void -> Removes an input widget from the notebook
removeAll: void -> Removes all widgets in the notebook
text(name: String, defaultValue: String, label: String): void -> Creates a text input widget with a given name and default value
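One way to keep widget-driven notebooks runnable outside Databricks (for local testing) is a small fallback lookup. This is a hypothetical helper, not part of the dbutils API:

```python
def get_widget(name, default):
    """Hypothetical helper: return a widget's value inside Databricks,
    or a default when dbutils is not defined, so the same code also
    runs during local testing."""
    try:
        return dbutils.widgets.get(name)  # noqa: F821 (notebook-only global)
    except NameError:
        return default

env = get_widget("environment", "dev")
```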


You can get detailed information about each command by using help, for example:

"combobox")

Secrets utilities

Secrets allow you to store and access sensitive credential information without making it visible in notebooks. See Secrets and Use the secrets in a notebook. Learn more by running:

dbutils.secrets.help()


Secrets utilities are available on clusters running Databricks Runtime 4.0 and above.
get(scope: String, key: String): String -> Gets the string representation of a secret value with scope and key
getBytes(scope: String, key: String): byte[] -> Gets the bytes representation of a secret value with scope and key
list(scope: String): Seq -> Lists secret metadata for secrets within a scope
listScopes: Seq -> Lists secret scopes


You can get detailed information about each command by using help, for example:

"get")
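A sketch of consuming a secret without hard-coding it in the notebook, using a hypothetical wrapper that falls back to a placeholder outside Databricks; the scope and key names are examples:

```python
def get_secret(scope, key, fallback):
    """Hypothetical wrapper: fetch a secret inside Databricks, or fall
    back to a placeholder when dbutils is not defined (local testing)."""
    try:
        return dbutils.secrets.get(scope=scope, key=key)  # noqa: F821
    except NameError:
        return fallback

# Build a JDBC connection string without embedding the password in code.
password = get_secret("jdbc-scope", "db-password", "local-test-password")
url = "jdbc:postgresql://host:5432/db?user=app&password=" + password
```

Inside Databricks, a secret value fetched this way is redacted if you display it in notebook output.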

Library utilities


This feature is in Public Preview.


Library utilities are not available on Databricks Runtime for Machine Learning.

Library utilities allow you to install Python libraries and create an environment scoped to a notebook session. The libraries are available both on the driver and on the executors, so you can reference them in UDFs. This enables:

  • Library dependencies of a notebook to be organized within the notebook itself.
  • Notebook users with different library dependencies to share a cluster without interference.

Detaching a notebook destroys this environment; however, you can recreate it by rerunning the library install API commands in the notebook. See the restartPython API for how you can reset your notebook state without losing your environment.

Library utilities are enabled by default on clusters running Databricks Runtime 5.1 and above. By default, therefore, the Python environment for each notebook is isolated: when a notebook is attached, a separate Python executable is created for it that inherits the cluster's default Python environment. Libraries installed through an init script into the Databricks Python environment are still available. You can disable this feature by setting spark.databricks.libraryIsolation.enabled to false.
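For example, to turn the feature off for a cluster, this setting can go in the cluster's Spark configuration (a sketch; Spark configuration changes take effect after a cluster restart):

```
spark.databricks.libraryIsolation.enabled false
```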

This API is designed to be the preferred way to install libraries. It is compatible with the existing cluster-wide library installation through the UI and REST API; however, libraries installed through this API have higher priority than cluster-wide libraries.
install(path: String): boolean -> Install the library within the current notebook session
installPyPI(pypiPackage: String, version: String = "", repo: String = "", extras: String = ""): boolean -> Install the PyPI library within the current notebook session
list: List -> List the isolated libraries added for the current notebook session via dbutils
restartPython: void -> Restart python process for the current notebook session


You can get detailed information about each command by using help, for example:

"install")


  • Install a .egg or .whl library in a notebook.

    dbutils.library.install("dbfs:/path/to/your/library.egg")

    The accepted library sources are dbfs and s3.

  • Install a PyPI library in a notebook. version, repo, and extras are optional. Use the extras argument to specify the Extras feature (extra requirements).

    dbutils.library.installPyPI("pypipackage", version="version", repo="repo", extras="extras")


    The version and extras keys cannot be part of the PyPI package string. For example: dbutils.library.installPyPI("azureml-sdk[databricks]==1.0.8") is not valid. Use the version and extras arguments to specify the version and extras information as follows:

    dbutils.library.installPyPI("azureml-sdk", version="1.0.8", extras="databricks")
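The separate arguments correspond one-to-one to the parts of a pip requirement string; a quick sketch of the equivalence for the azureml-sdk example:

```python
# The arguments to installPyPI map onto the parts of a pip requirement
# string. dbutils.library.installPyPI(package, version=version,
# extras=extras) resolves the same requirement pip would write as
# "package[extras]==version".
package, version, extras = "azureml-sdk", "1.0.8", "databricks"

requirement = "{}[{}]=={}".format(package, extras, version)
```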
  • Specify your library requirements in one notebook and install them through %run in the other.

    • Define the libraries to install in a notebook called InstallDependencies.

      dbutils.library.installPyPI("torch")
      dbutils.library.installPyPI("scikit-learn", version="0.19.1")
      dbutils.library.installPyPI("azureml-sdk", extras="databricks")
    • Install them in the notebook that needs those dependencies.

      %run /path/to/InstallDependencies    # install the dependencies in first cell
      import torch
      from sklearn.linear_model import LinearRegression
      import azureml
      # do the actual work
  • List the libraries installed in a notebook.

    dbutils.library.list()

  • Reset the Python notebook state while maintaining the environment. This API is available only in Python notebooks. This can be used to:

    • Reload libraries Databricks preinstalled with a different version. For example:

      dbutils.library.installPyPI("numpy", version="1.15.4")
      dbutils.library.restartPython()
      # Make sure you start using the library in another cell.
      import numpy
    • Install libraries like tensorflow that need to be loaded on process start up. For example:

      dbutils.library.installPyPI("tensorflow")
      dbutils.library.restartPython()
      # Use the library in another cell.
      import tensorflow


    The Python notebook state is reset after the notebook cell containing restartPython is run. After running the cell that calls restartPython, the notebook loses all state held in the Python interpreter, including but not limited to local variables, imported libraries, and other ephemeral state. Therefore, we recommend that you install libraries and reset the notebook state in the first notebook cell.

Databricks Utilities API library

To accelerate application development, it can be helpful to compile, build, and test applications before you deploy them as production jobs. To enable you to compile against Databricks Utilities, Databricks provides the dbutils-api library. You can download the dbutils-api library or include the library by adding a dependency to your build file:

  • SBT

    libraryDependencies += "com.databricks" % "dbutils-api_2.11" % "0.0.3"
  • Maven

    <dependency>
      <groupId>com.databricks</groupId>
      <artifactId>dbutils-api_2.11</artifactId>
      <version>0.0.3</version>
    </dependency>

  • Gradle

    compile 'com.databricks:dbutils-api_2.11:0.0.3'

Once you build your application against this library, you can deploy the application on a cluster running Databricks Runtime 4.0 or above.


The dbutils-api library allows you to locally compile an application that uses dbutils, but not to run it. To run the application, you must deploy it in Databricks.

Example projects

Here is an example archive containing minimal example projects that show you how to compile using the dbutils-api library for three common build tools:

  • sbt: sbt package
  • Maven: mvn install
  • Gradle: gradle build

These commands create output JARs in the locations:

  • sbt: target/scala-2.11/dbutils-api-example_2.11-0.0.1-SNAPSHOT.jar
  • Maven: target/dbutils-api-example-0.0.1-SNAPSHOT.jar
  • Gradle: build/libs/dbutils-api-example-0.0.1-SNAPSHOT.jar

You can attach this JAR to your cluster as a library, restart the cluster (which must be running Databricks Runtime 4.0 or above), and then run:

dbutils.widgets.text("x", "World", "Hello:")

This statement creates a text input widget with the label Hello: and the initial value World. (The widget name, x here, is arbitrary.)

You can use all the other dbutils APIs the same way.

To test an application that uses the dbutils object outside Databricks, you can mock up the dbutils object by providing your own implementation of the interface, for example:

  new com.databricks.dbutils_v1.DBUtilsV1 {
    // implement the interface methods you need
  }

Substitute your own DBUtilsV1 instance in which you implement the interface methods however you like, for example providing a local filesystem mockup for dbutils.fs.