Databricks Utilities (dbutils) reference
This article is a reference for Databricks Utilities (dbutils). The dbutils utilities are available in Python, R, and Scala notebooks. You can use the utilities to:
Work with files and object storage efficiently.
Work with secrets.
Note
dbutils only supports compute environments that use DBFS.
How to: List utilities, list commands, display command help
Utilities: credentials, data, fs, jobs, library, notebook, secrets, widgets, Utilities API library
List available utilities
To list available utilities along with a short description for each utility, run dbutils.help() for Python or Scala.
This example lists the available utilities.
dbutils.help()
This module provides various utilities for users to interact with the rest of Databricks.
credentials: DatabricksCredentialUtils -> Utilities for interacting with credentials within notebooks
data: DataUtils -> Utilities for understanding and interacting with datasets (EXPERIMENTAL)
fs: DbfsUtils -> Manipulates the Databricks filesystem (DBFS) from the console
jobs: JobsUtils -> Utilities for leveraging jobs features
library: LibraryUtils -> Utilities for session isolated libraries
meta: MetaUtils -> Methods to hook into the compiler (EXPERIMENTAL)
notebook: NotebookUtils -> Utilities for the control flow of a notebook (EXPERIMENTAL)
preview: Preview -> Utilities under preview category
secrets: SecretUtils -> Provides utilities for leveraging secrets within notebooks
widgets: WidgetsUtils -> Methods to create and get bound value of input widgets inside notebooks
List available commands for a utility
To list available commands for a utility along with a short description of each command, run .help() after the programmatic name for the utility.
This example lists available commands for the Databricks File System (DBFS) utility.
dbutils.fs.help()
dbutils.fs provides utilities for working with FileSystems. Most methods in this package can take either a DBFS path (e.g., "/foo" or "dbfs:/foo"), or another FileSystem URI. For more info about a method, use dbutils.fs.help("methodName"). In notebooks, you can also use the %fs shorthand to access DBFS. The %fs shorthand maps straightforwardly onto dbutils calls. For example, "%fs head --maxBytes=10000 /file/path" translates into "dbutils.fs.head("/file/path", maxBytes = 10000)".
fsutils
cp(from: String, to: String, recurse: boolean = false): boolean -> Copies a file or directory, possibly across FileSystems
head(file: String, maxBytes: int = 65536): String -> Returns up to the first 'maxBytes' bytes of the given file as a String encoded in UTF-8
ls(dir: String): Seq -> Lists the contents of a directory
mkdirs(dir: String): boolean -> Creates the given directory if it does not exist, also creating any necessary parent directories
mv(from: String, to: String, recurse: boolean = false): boolean -> Moves a file or directory, possibly across FileSystems
put(file: String, contents: String, overwrite: boolean = false): boolean -> Writes the given String out to a file, encoded in UTF-8
rm(dir: String, recurse: boolean = false): boolean -> Removes a file or directory
mount
mount(source: String, mountPoint: String, encryptionType: String = "", owner: String = null, extraConfigs: Map = Map.empty[String, String]): boolean -> Mounts the given source directory into DBFS at the given mount point
mounts: Seq -> Displays information about what is mounted within DBFS
refreshMounts: boolean -> Forces all machines in this cluster to refresh their mount cache, ensuring they receive the most recent information
unmount(mountPoint: String): boolean -> Deletes a DBFS mount point
updateMount(source: String, mountPoint: String, encryptionType: String = "", owner: String = null, extraConfigs: Map = Map.empty[String, String]): boolean -> Similar to mount(), but updates an existing mount point instead of creating a new one
Display help for a command
To display help for a command, run .help("<command-name>") after the programmatic name for the utility.
This example displays help for the DBFS copy command.
dbutils.fs.help("cp")
/**
* Copies a file or directory, possibly across FileSystems.
*
* Example: cp("/mnt/my-folder/a", "dbfs:/a/b")
*
* @param from FileSystem URI of the source file or directory
* @param to FileSystem URI of the destination file or directory
* @param recurse if true, all files and directories will be recursively copied
* @return true if all files were successfully copied
*/
cp(from: java.lang.String, to: java.lang.String, recurse: boolean = false): boolean
Credentials utility (dbutils.credentials)
Commands: assumeRole, showCurrentRole, showRoles
The credentials utility allows you to interact with credentials within notebooks. This utility is usable only on clusters with credential passthrough enabled. To list the available commands, run dbutils.credentials.help().
assumeRole(role: String): boolean -> Sets the role ARN to assume when looking for credentials to authenticate with S3
showCurrentRole: List -> Shows the currently set role
showRoles: List -> Shows the set of possible assumed roles
assumeRole command (dbutils.credentials.assumeRole)
Sets the Amazon Resource Name (ARN) for the AWS Identity and Access Management (IAM) role to assume when looking for credentials to authenticate with Amazon S3. After you run this command, you can run S3 access commands, such as sc.textFile("s3a://my-bucket/my-file.csv"), to access an object.
To display help for this command, run dbutils.credentials.help("assumeRole").
dbutils.credentials.assumeRole("arn:aws:iam::123456789012:role/my-role")
# Out[1]: True
dbutils.credentials.assumeRole("arn:aws:iam::123456789012:role/my-role")
# TRUE
dbutils.credentials.assumeRole("arn:aws:iam::123456789012:role/my-role")
// res0: Boolean = true
showCurrentRole command (dbutils.credentials.showCurrentRole)
Lists the currently set AWS Identity and Access Management (IAM) role.
To display help for this command, run dbutils.credentials.help("showCurrentRole").
dbutils.credentials.showCurrentRole()
# Out[1]: ['arn:aws:iam::123456789012:role/my-role-a']
dbutils.credentials.showCurrentRole()
# [[1]]
# [1] "arn:aws:iam::123456789012:role/my-role-a"
dbutils.credentials.showCurrentRole()
// res0: java.util.List[String] = [arn:aws:iam::123456789012:role/my-role-a]
showRoles command (dbutils.credentials.showRoles)
Lists the set of possible assumed AWS Identity and Access Management (IAM) roles.
To display help for this command, run dbutils.credentials.help("showRoles").
dbutils.credentials.showRoles()
# Out[1]: ['arn:aws:iam::123456789012:role/my-role-a', 'arn:aws:iam::123456789012:role/my-role-b']
dbutils.credentials.showRoles()
# [[1]]
# [1] "arn:aws:iam::123456789012:role/my-role-a"
#
# [[2]]
# [1] "arn:aws:iam::123456789012:role/my-role-b"
dbutils.credentials.showRoles()
// res0: java.util.List[String] = [arn:aws:iam::123456789012:role/my-role-a, arn:aws:iam::123456789012:role/my-role-b]
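The two listing commands and assumeRole can be combined so that a role is assumed only when it is actually available. The following is a minimal sketch rather than an official pattern: the helper name assume_role_if_available is hypothetical, and dbutils is passed in explicitly so the function can also be exercised outside a notebook.

```python
def assume_role_if_available(dbutils, role_arn: str) -> bool:
    """Assume role_arn only if showRoles() lists it.

    Returns True when the role was assumed, False when it is not available.
    Hypothetical helper; it only calls the commands documented above.
    """
    available = list(dbutils.credentials.showRoles())
    if role_arn not in available:
        # The role cannot be assumed from this cluster, so leave the current role as is.
        return False
    return bool(dbutils.credentials.assumeRole(role_arn))
```

In a notebook on a cluster with credential passthrough enabled, you would call assume_role_if_available(dbutils, "arn:aws:iam::123456789012:role/my-role-a") before reading from S3.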
Data utility (dbutils.data)
Preview
This feature is in Public Preview.
Note
Available in Databricks Runtime 9.0 and above.
Commands: summarize
The data utility allows you to understand and interpret datasets. To list the available commands, run dbutils.data.help().
dbutils.data provides utilities for understanding and interpreting datasets. This module is currently in preview and may be unstable. For more info about a method, use dbutils.data.help("methodName").
summarize(df: Object, precise: boolean): void -> Summarize a Spark DataFrame and visualize the statistics to get quick insights
summarize command (dbutils.data.summarize)
Calculates and displays summary statistics of an Apache Spark DataFrame or pandas DataFrame. This command is available for Python, Scala and R.
Caution
This command analyzes the complete contents of the DataFrame. Running this command for very large DataFrames can be very expensive.
To display help for this command, run dbutils.data.help("summarize").
In Databricks Runtime 10.4 LTS and above, you can use the additional precise parameter to adjust the precision of the computed statistics.
Note
This feature is in Public Preview.
When precise is set to false (the default), some returned statistics include approximations to reduce run time.
The number of distinct values for categorical columns may have ~5% relative error for high-cardinality columns.
The frequent value counts may have an error of up to 0.01% when the number of distinct values is greater than 10000.
The histograms and percentile estimates may have an error of up to 0.01% relative to the total number of rows.
When precise is set to true, the statistics are computed with higher precision. All statistics except for the histograms and percentiles for numeric columns are exact.
The histograms and percentile estimates may have an error of up to 0.0001% relative to the total number of rows.
The tooltip at the top of the data summary output indicates the mode of the current run.
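To make the percentages above concrete, the sketch below converts the stated histogram and percentile error bounds into a number of rows. It is plain arithmetic on the figures quoted in this section, not part of dbutils; the function name max_estimate_error_rows is hypothetical.

```python
def max_estimate_error_rows(total_rows: int, precise: bool = False) -> float:
    """Upper bound, in rows, on the histogram/percentile error stated above."""
    # 0.01% of the rows when precise is False, 0.0001% when precise is True.
    divisor = 1_000_000 if precise else 10_000
    return total_rows / divisor


print(max_estimate_error_rows(1_000_000))                # 100.0
print(max_estimate_error_rows(1_000_000, precise=True))  # 1.0
```

For a DataFrame with one million rows, the default mode may therefore be off by up to about 100 rows in a percentile estimate, and the precise mode by about one row.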
This example displays summary statistics for an Apache Spark DataFrame with approximations enabled by default. To see the results, run this command in a notebook. This example is based on Sample datasets.
df = spark.read.format('csv').load(
'/databricks-datasets/Rdatasets/data-001/csv/ggplot2/diamonds.csv',
header=True,
inferSchema=True
)
dbutils.data.summarize(df)
df <- read.df("/databricks-datasets/Rdatasets/data-001/csv/ggplot2/diamonds.csv", source = "csv", header="true", inferSchema = "true")
dbutils.data.summarize(df)
val df = spark.read.format("csv")
.option("inferSchema", "true")
.option("header", "true")
.load("/databricks-datasets/Rdatasets/data-001/csv/ggplot2/diamonds.csv")
dbutils.data.summarize(df)
The visualization uses SI notation to concisely render numerical values smaller than 0.01 or larger than 10000. As an example, the numerical value 1.25e-15 will be rendered as 1.25f. One exception: the visualization uses “B” for 1.0e9 (giga) instead of “G”.
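The rendering rule can be approximated in a few lines of Python. This is an illustrative sketch, not the visualization's actual implementation: only the "f" and "B" prefixes are confirmed by the text above, the rest of the prefix table is assumed to follow standard SI prefixes, and the three-significant-digit rounding is also an assumption.

```python
# Assumed prefix table: only "f" and "B" are confirmed by the documentation;
# the other entries are the standard SI prefixes.
SI_PREFIXES = {-15: "f", -12: "p", -9: "n", -6: "µ", -3: "m",
               3: "k", 6: "M", 9: "B", 12: "T"}


def format_si(value: float) -> str:
    """Render value with an SI-style suffix when it is < 0.01 or > 10000."""
    magnitude = abs(value)
    if value == 0 or 0.01 <= magnitude <= 10000:
        return f"{value:g}"
    # Read the decimal exponent from scientific notation, e.g. "1.250000e-15".
    exponent = int(f"{magnitude:e}".split("e")[1])
    exponent3 = (exponent // 3) * 3  # round down to a multiple of three
    prefix = SI_PREFIXES.get(exponent3)
    if prefix is None:
        return f"{value:g}"  # outside the assumed table: plain formatting
    scaled = value / (10.0 ** exponent3)
    return f"{scaled:.3g}{prefix}"


print(format_si(1.25e-15))  # 1.25f
print(format_si(1.0e9))     # 1B
```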
File system utility (dbutils.fs)
Warning
The Python implementation of all dbutils.fs methods uses snake_case rather than camelCase for keyword formatting.
For example, dbutils.fs.help() displays the option extraConfigs for dbutils.fs.mount(). However, in Python you would use the keyword extra_configs.
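As a quick way to derive the Python keyword from a name shown in the help output, the conversion can be sketched as follows. The helper to_snake_case is hypothetical and only mirrors the naming rule described in the warning; the documentation confirms the extraConfigs to extra_configs mapping, and other names are assumed to follow the same rule.

```python
import re


def to_snake_case(name: str) -> str:
    """Convert a camelCase option name to snake_case."""
    # Insert "_" before every uppercase letter except a leading one, then lowercase.
    return re.sub(r"(?<!^)(?=[A-Z])", "_", name).lower()


print(to_snake_case("extraConfigs"))  # extra_configs
```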
Commands: cp, head, ls, mkdirs, mount, mounts, mv, put, refreshMounts, rm, unmount, updateMount
The file system utility allows you to access DBFS (see What is DBFS?), making it easier to use Databricks as a file system.
In notebooks, you can use the %fs magic command to access DBFS. For example, %fs ls /Volumes/main/default/my-volume/ is the same as dbutils.fs.ls("/Volumes/main/default/my-volume/"). See magic commands.
To list the available commands, run dbutils.fs.help().
dbutils.fs provides utilities for working with FileSystems. Most methods in this package can take either a DBFS path (e.g., "/foo" or "dbfs:/foo"), or another FileSystem URI. For more info about a method, use dbutils.fs.help("methodName"). In notebooks, you can also use the %fs shorthand to access DBFS. The %fs shorthand maps straightforwardly onto dbutils calls. For example, "%fs head --maxBytes=10000 /file/path" translates into "dbutils.fs.head("/file/path", maxBytes = 10000)".
fsutils
cp(from: String, to: String, recurse: boolean = false): boolean -> Copies a file or directory, possibly across FileSystems
head(file: String, maxBytes: int = 65536): String -> Returns up to the first 'maxBytes' bytes of the given file as a String encoded in UTF-8
ls(dir: String): Seq -> Lists the contents of a directory
mkdirs(dir: String): boolean -> Creates the given directory if it does not exist, also creating any necessary parent directories
mv(from: String, to: String, recurse: boolean = false): boolean -> Moves a file or directory, possibly across FileSystems
put(file: String, contents: String, overwrite: boolean = false): boolean -> Writes the given String out to a file, encoded in UTF-8
rm(dir: String, recurse: boolean = false): boolean -> Removes a file or directory
mount
mount(source: String, mountPoint: String, encryptionType: String = "", owner: String = null, extraConfigs: Map = Map.empty[String, String]): boolean -> Mounts the given source directory into DBFS at the given mount point
mounts: Seq -> Displays information about what is mounted within DBFS
refreshMounts: boolean -> Forces all machines in this cluster to refresh their mount cache, ensuring they receive the most recent information
unmount(mountPoint: String): boolean -> Deletes a DBFS mount point
updateMount(source: String, mountPoint: String, encryptionType: String = "", owner: String = null, extraConfigs: Map = Map.empty[String, String]): boolean -> Similar to mount(), but updates an existing mount point instead of creating a new one
cp command (dbutils.fs.cp)
Copies a file or directory, possibly across filesystems.
To display help for this command, run dbutils.fs.help("cp").
This example copies the file named data.csv from /Volumes/main/default/my-volume/ to new-data.csv in the same volume.
dbutils.fs.cp("/Volumes/main/default/my-volume/data.csv", "/Volumes/main/default/my-volume/new-data.csv")
# Out[4]: True
dbutils.fs.cp("/Volumes/main/default/my-volume/data.csv", "/Volumes/main/default/my-volume/new-data.csv")
# [1] TRUE
dbutils.fs.cp("/Volumes/main/default/my-volume/data.csv", "/Volumes/main/default/my-volume/new-data.csv")
// res3: Boolean = true
head command (dbutils.fs.head)
Returns up to the specified maximum number of bytes in the given file. The bytes are returned as a UTF-8 encoded string.
To display help for this command, run dbutils.fs.help("head").
This example displays the first 25 bytes of the file data.csv located in /Volumes/main/default/my-volume/.
dbutils.fs.head("/Volumes/main/default/my-volume/data.csv", 25)
# [Truncated to first 25 bytes]
# Out[12]: 'Year,First Name,County,Se'
dbutils.fs.head("/Volumes/main/default/my-volume/data.csv", 25)
# [1] "Year,First Name,County,Se"
dbutils.fs.head("/Volumes/main/default/my-volume/data.csv", 25)
// [Truncated to first 25 bytes]
// res4: String =
// "Year,First Name,County,Se"
ls command (dbutils.fs.ls)
Lists the contents of a directory.
To display help for this command, run dbutils.fs.help("ls").
This example displays information about the contents of /Volumes/main/default/my-volume/. The modificationTime field is available in Databricks Runtime 10.4 LTS and above. In R, modificationTime is returned as a string.
dbutils.fs.ls("/Volumes/main/default/my-volume/")
# Out[13]: [FileInfo(path='dbfs:/Volumes/main/default/my-volume/data.csv', name='data.csv', size=2258987, modificationTime=1711357839000)]
dbutils.fs.ls("/Volumes/main/default/my-volume/")
# For prettier results from dbutils.fs.ls(<dir>), please use `%fs ls <dir>`
# [[1]]
# [[1]]$path
# [1] "/Volumes/main/default/my-volume/data.csv"
# [[1]]$name
# [1] "data.csv"
# [[1]]$size
# [1] 2258987
# [[1]]$isDir
# [1] FALSE
# [[1]]$isFile
# [1] TRUE
# [[1]]$modificationTime
# [1] "1711357839000"
dbutils.fs.ls("/Volumes/main/default/my-volume/")
// res6: Seq[com.databricks.backend.daemon.dbutils.FileInfo] = WrappedArray(FileInfo(/Volumes/main/default/my-volume/data.csv, 2258987, 1711357839000))
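The modificationTime values in the output above are large integers. The sketch below converts one to a readable timestamp, assuming the value is the number of milliseconds since the Unix epoch (this unit is an assumption based on the magnitude of the sample value, not something stated in this article). The helper name is hypothetical, and it accepts a string as well because R returns the field as a string.

```python
from datetime import datetime, timezone


def modification_time_to_datetime(modification_time_ms) -> datetime:
    """Convert a modificationTime value to a timezone-aware UTC datetime."""
    # Assumption: the value is milliseconds since the Unix epoch.
    return datetime.fromtimestamp(int(modification_time_ms) / 1000, tz=timezone.utc)


print(modification_time_to_datetime(1711357839000).isoformat())
# 2024-03-25T09:10:39+00:00
```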
mkdirs command (dbutils.fs.mkdirs)
Creates the given directory if it does not exist. Also creates any necessary parent directories.
To display help for this command, run dbutils.fs.help("mkdirs").
This example creates the directory my-data within /Volumes/main/default/my-volume/.
dbutils.fs.mkdirs("/Volumes/main/default/my-volume/my-data")
# Out[15]: True
dbutils.fs.mkdirs("/Volumes/main/default/my-volume/my-data")
# [1] TRUE
dbutils.fs.mkdirs("/Volumes/main/default/my-volume/my-data")
// res7: Boolean = true
mount command (dbutils.fs.mount)
Mounts the specified source directory into DBFS at the specified mount point.
To display help for this command, run dbutils.fs.help("mount").
aws_bucket_name = "my-bucket"
mount_name = "s3-my-bucket"
dbutils.fs.mount("s3a://%s" % aws_bucket_name, "/mnt/%s" % mount_name)
val AwsBucketName = "my-bucket"
val MountName = "s3-my-bucket"
dbutils.fs.mount(s"s3a://$AwsBucketName", s"/mnt/$MountName")
For additional code examples, see Connect to Amazon S3.
mounts command (dbutils.fs.mounts)
Displays information about what is currently mounted within DBFS.
To display help for this command, run dbutils.fs.help("mounts").
Warning
Call dbutils.fs.refreshMounts() on all other running clusters to propagate the new mount. See refreshMounts command (dbutils.fs.refreshMounts).
dbutils.fs.mounts()
# Out[11]: [MountInfo(mountPoint='/mnt/databricks-results', source='databricks-results', encryptionType='sse-s3')]
dbutils.fs.mounts()
For additional code examples, see Connect to Amazon S3.
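Because mounts returns the current mount points, it can be used to avoid mounting the same location twice. The following is a minimal sketch under stated assumptions: the helper mount_if_needed is hypothetical, it relies on the mountPoint attribute shown in the MountInfo output above, and dbutils is passed in explicitly.

```python
def mount_if_needed(dbutils, source: str, mount_point: str) -> bool:
    """Mount source at mount_point unless that mount point already exists.

    Returns True if a new mount was created, False if it was already present.
    """
    existing = {mount.mountPoint for mount in dbutils.fs.mounts()}
    if mount_point in existing:
        return False
    dbutils.fs.mount(source, mount_point)
    return True
```

For example, mount_if_needed(dbutils, "s3a://my-bucket", "/mnt/s3-my-bucket") can be run repeatedly at the top of a notebook without attempting a second mount.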
mv command (dbutils.fs.mv)
Moves a file or directory, possibly across filesystems. A move is a copy followed by a delete, even for moves within filesystems.
To display help for this command, run dbutils.fs.help("mv").
This example moves the file rows.csv from /Volumes/main/default/my-volume/ to /Volumes/main/default/my-volume/my-data/.
dbutils.fs.mv("/Volumes/main/default/my-volume/rows.csv", "/Volumes/main/default/my-volume/my-data/")
# Out[2]: True
dbutils.fs.mv("/Volumes/main/default/my-volume/rows.csv", "/Volumes/main/default/my-volume/my-data/")
# [1] TRUE
dbutils.fs.mv("/Volumes/main/default/my-volume/rows.csv", "/Volumes/main/default/my-volume/my-data/")
// res1: Boolean = true
put command (dbutils.fs.put)
Writes the specified string to a file. The string is UTF-8 encoded.
To display help for this command, run dbutils.fs.help("put").
This example writes the string Hello, Databricks! to a file named hello.txt in /Volumes/main/default/my-volume/. If the file exists, it will be overwritten.
dbutils.fs.put("/Volumes/main/default/my-volume/hello.txt", "Hello, Databricks!", True)
# Wrote 18 bytes.
# Out[6]: True
dbutils.fs.put("/Volumes/main/default/my-volume/hello.txt", "Hello, Databricks!", TRUE)
# [1] TRUE
dbutils.fs.put("/Volumes/main/default/my-volume/hello.txt", "Hello, Databricks!", true)
// Wrote 18 bytes.
// res2: Boolean = true
refreshMounts command (dbutils.fs.refreshMounts)
Forces all machines in the cluster to refresh their mount cache, ensuring they receive the most recent information.
To display help for this command, run dbutils.fs.help("refreshMounts").
dbutils.fs.refreshMounts()
dbutils.fs.refreshMounts()
For additional code examples, see Connect to Amazon S3.
rm command (dbutils.fs.rm)
Removes a file or directory and, optionally, all of its contents. If a file is specified, the recurse parameter is ignored. If a directory is specified, an error occurs when recurse is disabled and the directory is not empty.
To display help for this command, run dbutils.fs.help("rm").
This example removes the entire directory /Volumes/main/default/my-volume/my-data/ including its contents.
dbutils.fs.rm("/Volumes/main/default/my-volume/my-data/", True)
# Out[8]: True
dbutils.fs.rm("/Volumes/main/default/my-volume/my-data/", TRUE)
# [1] TRUE
dbutils.fs.rm("/Volumes/main/default/my-volume/my-data/", true)
// res6: Boolean = true
unmount command (dbutils.fs.unmount)
Deletes a DBFS mount point.
Warning
To avoid errors, never modify a mount point while other jobs are reading or writing to it. After modifying a mount, always run dbutils.fs.refreshMounts() on all other running clusters to propagate any mount updates. See refreshMounts command (dbutils.fs.refreshMounts).
To display help for this command, run dbutils.fs.help("unmount").
dbutils.fs.unmount("/mnt/<mount-name>")
For additional code examples, see Connect to Amazon S3.
updateMount command (dbutils.fs.updateMount)
Similar to the dbutils.fs.mount command, but updates an existing mount point instead of creating a new one. Returns an error if the mount point is not present.
To display help for this command, run dbutils.fs.help("updateMount").
Warning
To avoid errors, never modify a mount point while other jobs are reading or writing to it. After modifying a mount, always run dbutils.fs.refreshMounts() on all other running clusters to propagate any mount updates. See refreshMounts command (dbutils.fs.refreshMounts).
This command is available in Databricks Runtime 10.4 LTS and above.
aws_bucket_name = "my-bucket"
mount_name = "s3-my-bucket"
dbutils.fs.updateMount("s3a://%s" % aws_bucket_name, "/mnt/%s" % mount_name)
val AwsBucketName = "my-bucket"
val MountName = "s3-my-bucket"
dbutils.fs.updateMount(s"s3a://$AwsBucketName", s"/mnt/$MountName")
Jobs utility (dbutils.jobs)
Subutilities: taskValues
Note
This utility is available only for Python.
The jobs utility allows you to leverage jobs features. To display help for this utility, run dbutils.jobs.help().
Provides utilities for leveraging jobs features.
taskValues: TaskValuesUtils -> Provides utilities for leveraging job task values
taskValues subutility (dbutils.jobs.taskValues)
Note
This subutility is available only for Python.
Provides commands for leveraging job task values.
Use this subutility to set and get arbitrary values during a job run. These values are called task values. Any task can get values set by upstream tasks and set values for downstream tasks to use.
Each task value has a unique key within the same task. This unique key is known as the task value’s key. A task value is accessed with the task name and the task value’s key. You can use this to pass information downstream from task to task within the same job run. For example, you can pass identifiers or metrics, such as information about the evaluation of a machine learning model, between different tasks within a job run.
To display help for this subutility, run dbutils.jobs.taskValues.help().
get command (dbutils.jobs.taskValues.get)
Note
This command is available only for Python.
On Databricks Runtime 10.4 and earlier, if get cannot find the task, a Py4JJavaError is raised instead of a ValueError.
Gets the contents of the specified task value for the specified task in the current job run.
To display help for this command, run dbutils.jobs.taskValues.help("get").
For example:
dbutils.jobs.taskValues.get(
    taskKey="my-task",
    key="my-key",
    default=7,
    debugValue=42
)
In the preceding example:
taskKey is the name of the task that sets the task value. If the command cannot find this task, a ValueError is raised.
key is the name of the task value’s key that you set with the set command (dbutils.jobs.taskValues.set). If the command cannot find this task value’s key, a ValueError is raised (unless default is specified).
default is an optional value that is returned if key cannot be found. default cannot be None.
debugValue is an optional value that is returned if you try to get the task value from within a notebook that is running outside of a job. This can be useful during debugging when you want to run your notebook manually and return some value instead of raising a TypeError by default. debugValue cannot be None.
If you try to get a task value from within a notebook that is running outside of a job, this command raises a TypeError by default. However, if the debugValue argument is specified in the command, the value of debugValue is returned instead of raising a TypeError.
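The lookup rules described for get can be summarized as a small, self-contained model. This is only a sketch of the documented behavior, not the actual implementation: the function resolve_task_value, its in_job flag, and the dictionary used to hold task values are all hypothetical stand-ins.

```python
_MISSING = object()  # sentinel meaning "argument not supplied"


def resolve_task_value(task_values, task_key, key,
                       default=_MISSING, debug_value=_MISSING, in_job=True):
    """Model the documented get rules.

    task_values maps a task name to a dict of that task's {key: value} pairs.
    """
    if not in_job:
        # Outside a job run: return debugValue if given, otherwise raise TypeError.
        if debug_value is _MISSING:
            raise TypeError("not running in a job and no debugValue was specified")
        return debug_value
    if task_key not in task_values:
        raise ValueError(f"task {task_key!r} not found")
    values = task_values[task_key]
    if key in values:
        return values[key]
    if default is _MISSING:
        raise ValueError(f"key {key!r} not found for task {task_key!r}")
    return default


upstream = {"my-task": {"my-key": 5}}
print(resolve_task_value(upstream, "my-task", "my-key"))                # 5
print(resolve_task_value(upstream, "my-task", "missing", default=7))   # 7
print(resolve_task_value(upstream, "my-task", "my-key",
                         debug_value=42, in_job=False))                 # 42
```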
set command (dbutils.jobs.taskValues.set)
Note
This command is available only for Python.
Sets or updates a task value. You can set up to 250 task values for a job run.
To display help for this command, run dbutils.jobs.taskValues.help("set").
Some examples include:
dbutils.jobs.taskValues.set(key="my-key", value=5)
dbutils.jobs.taskValues.set(key="my-other-key", value="my other value")
In the preceding examples:
key is the task value’s key. This key must be unique to the task. That is, if two different tasks each set a task value with key K, these are two different task values that have the same key K.
value is the value for this task value’s key. This command must be able to represent the value internally in JSON format. The size of the JSON representation of the value cannot exceed 48 KiB.
If you try to set a task value from within a notebook that is running outside of a job, this command does nothing.
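Before calling set, it can be useful to check that a value is JSON-serializable and fits within the 48 KiB limit. The sketch below does this with the standard json module. The helper name is hypothetical, and measuring the UTF-8 size of the default json.dumps output is an assumption about how the limit is applied, so treat the result as an estimate.

```python
import json

MAX_TASK_VALUE_BYTES = 48 * 1024  # 48 KiB, the limit stated above


def task_value_json_size(value) -> int:
    """Return the UTF-8 size of the JSON form of value, or raise ValueError."""
    try:
        encoded = json.dumps(value).encode("utf-8")
    except (TypeError, ValueError) as exc:
        raise ValueError(f"value is not JSON-serializable: {exc}") from exc
    if len(encoded) > MAX_TASK_VALUE_BYTES:
        raise ValueError(
            f"JSON form is {len(encoded)} bytes; the limit is {MAX_TASK_VALUE_BYTES}"
        )
    return len(encoded)


print(task_value_json_size(5))                 # 1
print(task_value_json_size("my other value"))  # 16
```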
Library utility (dbutils.library)
Most methods in the dbutils.library submodule are deprecated. See Library utility (dbutils.library) (legacy).
You might need to programmatically restart the Python process on Databricks to ensure that locally installed or upgraded libraries function correctly in the Python kernel for your current SparkSession. To do this, run the dbutils.library.restartPython command. See Restart the Python process on Databricks.
Notebook utility (dbutils.notebook)
The notebook utility allows you to chain together notebooks and act on their results. See Run a Databricks notebook from another notebook.
To list the available commands, run dbutils.notebook.help().
exit(value: String): void -> This method lets you exit a notebook with a value
run(path: String, timeoutSeconds: int, arguments: Map): String -> This method runs a notebook and returns its exit value.
exit command (dbutils.notebook.exit)
Exits a notebook with a value.
To display help for this command, run dbutils.notebook.help("exit").
This example exits the notebook with the value Exiting from My Other Notebook.
dbutils.notebook.exit("Exiting from My Other Notebook")
# Notebook exited: Exiting from My Other Notebook
dbutils.notebook.exit("Exiting from My Other Notebook")
# Notebook exited: Exiting from My Other Notebook
dbutils.notebook.exit("Exiting from My Other Notebook")
// Notebook exited: Exiting from My Other Notebook
Note
If the run has a query with structured streaming running in the background, calling dbutils.notebook.exit() does not terminate the run. The run will continue to execute for as long as the query is executing in the background. You can stop the query running in the background by clicking Cancel in the cell of the query or by running query.stop(). When the query stops, you can terminate the run with dbutils.notebook.exit().
run command (dbutils.notebook.run)
Runs a notebook and returns its exit value. The notebook will run in the current cluster by default.
Note
The maximum length of the string value returned from the run command is 5 MB. See Get the output for a single run (GET /jobs/runs/get-output).
To display help for this command, run dbutils.notebook.help("run").
This example runs a notebook named My Other Notebook in the same location as the calling notebook. The called notebook ends with the line of code dbutils.notebook.exit("Exiting from My Other Notebook"). If the called notebook does not finish running within 60 seconds, an exception is thrown.
dbutils.notebook.run("My Other Notebook", 60)
# Out[14]: 'Exiting from My Other Notebook'
dbutils.notebook.run("My Other Notebook", 60)
// res2: String = Exiting from My Other Notebook
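Because run throws an exception when the called notebook fails or exceeds its timeout, a caller may want to retry a few times before giving up. The following is a minimal sketch of that idea: the helper run_with_retry and its max_retries parameter are hypothetical, and dbutils is passed in explicitly.

```python
def run_with_retry(dbutils, path: str, timeout_seconds: int, max_retries: int = 2) -> str:
    """Run a notebook and return its exit value, retrying on any exception.

    The last exception is re-raised once max_retries retries have failed.
    """
    attempts = 0
    while True:
        try:
            return dbutils.notebook.run(path, timeout_seconds)
        except Exception:
            attempts += 1
            if attempts > max_retries:
                raise
```

In a notebook, run_with_retry(dbutils, "My Other Notebook", 60) would make up to three attempts in total with the default setting.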
Secrets utility (dbutils.secrets)
Commands: get, getBytes, list, listScopes
The secrets utility allows you to store and access sensitive credential information without making it visible in notebooks. See Secret management and Use the secrets in a notebook. To list the available commands, run dbutils.secrets.help().
get(scope: String, key: String): String -> Gets the string representation of a secret value with scope and key
getBytes(scope: String, key: String): byte[] -> Gets the bytes representation of a secret value with scope and key
list(scope: String): Seq -> Lists secret metadata for secrets within a scope
listScopes: Seq -> Lists secret scopes
get command (dbutils.secrets.get)
Gets the string representation of a secret value for the specified secrets scope and key.
Warning
Administrators, secret creators, and users granted permission can read Databricks secrets. While Databricks makes an effort to redact secret values that might be displayed in notebooks, it is not possible to prevent such users from reading secrets. For more information, see Secret redaction.
To display help for this command, run dbutils.secrets.help("get").
This example gets the string representation of the secret value for the scope named my-scope and the key named my-key.
dbutils.secrets.get(scope="my-scope", key="my-key")
# Out[14]: '[REDACTED]'
dbutils.secrets.get(scope="my-scope", key="my-key")
# [1] "[REDACTED]"
dbutils.secrets.get(scope="my-scope", key="my-key")
// res0: String = [REDACTED]
getBytes command (dbutils.secrets.getBytes)
Gets the bytes representation of a secret value for the specified scope and key.
To display help for this command, run dbutils.secrets.help("getBytes").
This example gets the byte representation of the secret value (in this example, a1!b2@c3#) for the scope named my-scope and the key named my-key.
dbutils.secrets.getBytes(scope="my-scope", key="my-key")
# Out[1]: b'a1!b2@c3#'
dbutils.secrets.getBytes(scope="my-scope", key="my-key")
# [1] 61 31 21 62 32 40 63 33 23
dbutils.secrets.getBytes(scope="my-scope", key="my-key")
// res1: Array[Byte] = Array(97, 49, 33, 98, 50, 64, 99, 51, 35)
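The three outputs above look different but describe the same nine bytes: Python prints a bytes literal, R prints hexadecimal values, and Scala prints decimal values. The short sketch below reproduces the hexadecimal and decimal forms from the example secret using plain Python, without calling dbutils.

```python
secret = b"a1!b2@c3#"  # the example secret value used in this section

decimal_values = list(secret)                      # decimal byte values, as in the Scala output
hex_values = " ".join(f"{b:02x}" for b in secret)  # hexadecimal byte values, as in the R output

print(decimal_values)  # [97, 49, 33, 98, 50, 64, 99, 51, 35]
print(hex_values)      # 61 31 21 62 32 40 63 33 23
```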
list command (dbutils.secrets.list)
Lists the metadata for secrets within the specified scope.
To display help for this command, run dbutils.secrets.help("list").
This example lists the metadata for secrets within the scope named my-scope.
dbutils.secrets.list("my-scope")
# Out[10]: [SecretMetadata(key='my-key')]
dbutils.secrets.list("my-scope")
# [[1]]
# [[1]]$key
# [1] "my-key"
dbutils.secrets.list("my-scope")
// res2: Seq[com.databricks.dbutils_v1.SecretMetadata] = ArrayBuffer(SecretMetadata(my-key))
listScopes command (dbutils.secrets.listScopes)
Lists the available scopes.
To display help for this command, run dbutils.secrets.help("listScopes").
This example lists the available scopes.
dbutils.secrets.listScopes()
# Out[14]: [SecretScope(name='my-scope')]
dbutils.secrets.listScopes()
# [[1]]
# [[1]]$name
# [1] "my-scope"
dbutils.secrets.listScopes()
// res3: Seq[com.databricks.dbutils_v1.SecretScope] = ArrayBuffer(SecretScope(my-scope))
Widgets utility (dbutils.widgets)
Commands: combobox, dropdown, get, getArgument, multiselect, remove, removeAll, text
The widgets utility allows you to parameterize notebooks. See Databricks widgets.
To list the available commands, run dbutils.widgets.help().
combobox(name: String, defaultValue: String, choices: Seq, label: String): void -> Creates a combobox input widget with a given name, default value, and choices
dropdown(name: String, defaultValue: String, choices: Seq, label: String): void -> Creates a dropdown input widget with a given name, default value, and choices
get(name: String): String -> Retrieves current value of an input widget
getAll: map -> Retrieves a map of all widget names and their values
getArgument(name: String, optional: String): String -> (DEPRECATED) Equivalent to get
multiselect(name: String, defaultValue: String, choices: Seq, label: String): void -> Creates a multiselect input widget with a given name, default value, and choices
remove(name: String): void -> Removes an input widget from the notebook
removeAll: void -> Removes all widgets in the notebook
text(name: String, defaultValue: String, label: String): void -> Creates a text input widget with a given name and default value
combobox command (dbutils.widgets.combobox)
Creates and displays a combobox widget with the specified programmatic name, default value, choices, and optional label.
To display help for this command, run dbutils.widgets.help("combobox").
This example creates and displays a combobox widget with the programmatic name fruits_combobox. It offers the choices apple, banana, coconut, and dragon fruit and is set to the initial value of banana. This combobox widget has an accompanying label Fruits. This example ends by printing the initial value of the combobox widget, banana.
Python
dbutils.widgets.combobox(
    name='fruits_combobox',
    defaultValue='banana',
    choices=['apple', 'banana', 'coconut', 'dragon fruit'],
    label='Fruits'
)
print(dbutils.widgets.get("fruits_combobox"))
# banana
R
dbutils.widgets.combobox(
  name='fruits_combobox',
  defaultValue='banana',
  choices=list('apple', 'banana', 'coconut', 'dragon fruit'),
  label='Fruits'
)
print(dbutils.widgets.get("fruits_combobox"))
# [1] "banana"
Scala
dbutils.widgets.combobox(
  "fruits_combobox",
  "banana",
  Array("apple", "banana", "coconut", "dragon fruit"),
  "Fruits"
)
print(dbutils.widgets.get("fruits_combobox"))
// banana
SQL
CREATE WIDGET COMBOBOX fruits_combobox DEFAULT "banana" CHOICES SELECT * FROM (VALUES ("apple"), ("banana"), ("coconut"), ("dragon fruit"))
SELECT :fruits_combobox
-- banana
dropdown command (dbutils.widgets.dropdown)
Creates and displays a dropdown widget with the specified programmatic name, default value, choices, and optional label.
To display help for this command, run dbutils.widgets.help("dropdown").
This example creates and displays a dropdown widget with the programmatic name toys_dropdown. It offers the choices alphabet blocks, basketball, cape, and doll and is set to the initial value of basketball. This dropdown widget has an accompanying label Toys. This example ends by printing the initial value of the dropdown widget, basketball.
Python
dbutils.widgets.dropdown(
    name='toys_dropdown',
    defaultValue='basketball',
    choices=['alphabet blocks', 'basketball', 'cape', 'doll'],
    label='Toys'
)
print(dbutils.widgets.get("toys_dropdown"))
# basketball
R
dbutils.widgets.dropdown(
  name='toys_dropdown',
  defaultValue='basketball',
  choices=list('alphabet blocks', 'basketball', 'cape', 'doll'),
  label='Toys'
)
print(dbutils.widgets.get("toys_dropdown"))
# [1] "basketball"
Scala
dbutils.widgets.dropdown(
  "toys_dropdown",
  "basketball",
  Array("alphabet blocks", "basketball", "cape", "doll"),
  "Toys"
)
print(dbutils.widgets.get("toys_dropdown"))
// basketball
SQL
CREATE WIDGET DROPDOWN toys_dropdown DEFAULT "basketball" CHOICES SELECT * FROM (VALUES ("alphabet blocks"), ("basketball"), ("cape"), ("doll"))
SELECT :toys_dropdown
-- basketball
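Because a dropdown only offers the listed choices, it can help to check that the default value is one of them before creating the widget. The following Python sketch is illustrative only: dropdown_args is a hypothetical helper, not part of dbutils, and it does nothing more than build the keyword arguments used in the Python example above.

```python
from typing import Dict, List, Union


def dropdown_args(
    name: str, default: str, choices: List[str], label: str
) -> Dict[str, Union[str, List[str]]]:
    """Build keyword arguments for dbutils.widgets.dropdown (hypothetical helper)."""
    # Fail early, with a readable message, if the default is not selectable.
    if default not in choices:
        raise ValueError(f"default {default!r} is not one of the choices {choices!r}")
    return {"name": name, "defaultValue": default, "choices": choices, "label": label}


args = dropdown_args(
    "toys_dropdown",
    "basketball",
    ["alphabet blocks", "basketball", "cape", "doll"],
    "Toys",
)
# In a notebook you could then call: dbutils.widgets.dropdown(**args)
```

Keeping the check separate from the dbutils call means it can be exercised outside a notebook.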
get command (dbutils.widgets.get)
Gets the current value of the widget with the specified programmatic name. This programmatic name can be either:
The name of a custom widget in the notebook, for example, fruits_combobox or toys_dropdown.
The name of a custom parameter passed to the notebook as part of a notebook task, for example, name or age. For more information, see the coverage of parameters for notebook tasks in the jobs UI or the notebook_params field in the Trigger a new job run (POST /jobs/run-now) operation in the Jobs API.
To display help for this command, run dbutils.widgets.help("get").
This example gets the value of the widget that has the programmatic name fruits_combobox.
Python
dbutils.widgets.get('fruits_combobox')
# banana
R
dbutils.widgets.get('fruits_combobox')
# [1] "banana"
Scala
dbutils.widgets.get("fruits_combobox")
// res6: String = banana
SQL
SELECT :fruits_combobox
-- banana
This example gets the value of the notebook task parameter that has the programmatic name age. This parameter was set to 35 when the related notebook task was run.
Python
dbutils.widgets.get('age')
# 35
R
dbutils.widgets.get('age')
# [1] "35"
Scala
dbutils.widgets.get("age")
// res6: String = 35
SQL
SELECT :age
-- 35
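As the R and Scala output above shows, the value comes back as a string, so a numeric parameter such as age needs converting before it is used in arithmetic. A minimal Python sketch, assuming a hypothetical helper named as_int and using a literal string in place of the dbutils.widgets.get('age') call so that it runs outside a notebook:

```python
def as_int(raw: str, name: str) -> int:
    """Convert a widget value (a string) to an int, with a clear error on bad input."""
    try:
        return int(raw.strip())
    except ValueError:
        raise ValueError(f"parameter {name!r} must be an integer, got {raw!r}") from None


raw_age = "35"  # stand-in for dbutils.widgets.get('age')
age = as_int(raw_age, "age")
print(age + 1)
# 36
```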
getAll command (dbutils.widgets.getAll)
Gets a mapping of all current widget names and values. This can be especially useful to quickly pass widget values to a spark.sql() query.
This command is available in Databricks Runtime 13.3 LTS and above. It is only available for Python and Scala.
To display help for this command, run dbutils.widgets.help("getAll").
This example gets the map of widget values and passes it as parameter arguments in a Spark SQL query.
Python
df = spark.sql("SELECT * FROM table where col1 = :param", dbutils.widgets.getAll())
df.show()
# Query output
Scala
val df = spark.sql("SELECT * FROM table where col1 = :param", dbutils.widgets.getAll())
df.show()
// res6: Query output
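Before passing the map to spark.sql(), it can be useful to confirm that every named parameter marker in the query has a matching widget. The Python sketch below is an illustration under stated assumptions: missing_params is a hypothetical helper, its regular expression only recognizes simple :name markers (it does not skip markers inside string literals), and a plain dict stands in for the result of dbutils.widgets.getAll().

```python
import re
from typing import Dict, List


def missing_params(query: str, params: Dict[str, str]) -> List[str]:
    """Return named markers (:name) used in the query that have no value in params."""
    # The lookbehind skips '::' casts and colons that follow an identifier character.
    markers = re.findall(r"(?<![:\w]):([A-Za-z_]\w*)", query)
    missing: List[str] = []
    for marker in markers:
        if marker not in params and marker not in missing:
            missing.append(marker)
    return missing


widget_values = {"param": "banana"}  # stand-in for dbutils.widgets.getAll()
query = "SELECT * FROM table where col1 = :param AND col2 = :other"
print(missing_params(query, widget_values))
# ['other']
```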
getArgument command (dbutils.widgets.getArgument)
Gets the current value of the widget with the specified programmatic name. If the widget does not exist, an optional message can be returned.
Note
This command is deprecated. Use dbutils.widgets.get instead.
To display help for this command, run dbutils.widgets.help("getArgument").
This example gets the value of the widget that has the programmatic name fruits_combobox. If this widget does not exist, the message Error: Cannot find fruits combobox is returned.
Python
dbutils.widgets.getArgument('fruits_combobox', 'Error: Cannot find fruits combobox')
# Deprecation warning: Use dbutils.widgets.text() or dbutils.widgets.dropdown() to create a widget and dbutils.widgets.get() to get its bound value.
# Out[3]: 'banana'
R
dbutils.widgets.getArgument('fruits_combobox', 'Error: Cannot find fruits combobox')
# Deprecation warning: Use dbutils.widgets.text() or dbutils.widgets.dropdown() to create a widget and dbutils.widgets.get() to get its bound value.
# [1] "banana"
Scala
dbutils.widgets.getArgument("fruits_combobox", "Error: Cannot find fruits combobox")
// command-1234567890123456:1: warning: method getArgument in trait WidgetsUtils is deprecated: Use dbutils.widgets.text() or dbutils.widgets.dropdown() to create a widget and dbutils.widgets.get() to get its bound value.
// dbutils.widgets.getArgument("fruits_combobox", "Error: Cannot find fruits combobox")
// ^
// res7: String = banana
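When moving from getArgument to get, the fallback message has to be handled by the caller. One possible pattern is sketched below in Python. The helper get_or_default is hypothetical, the getter is injected so that the snippet runs outside a notebook, and catching a broad Exception is an assumption, because the exact error raised for a missing widget is not specified in this article.

```python
from typing import Callable


def get_or_default(getter: Callable[[str], str], name: str, default: str) -> str:
    """Return getter(name), or default if the getter raises (assumed for a missing widget)."""
    try:
        return getter(name)
    except Exception:
        return default


# Stand-in for dbutils.widgets.get, backed by a dict so the sketch runs anywhere.
fake_widgets = {"fruits_combobox": "banana"}
fake_get = fake_widgets.__getitem__

print(get_or_default(fake_get, "fruits_combobox", "Error: Cannot find fruits combobox"))
# banana
print(get_or_default(fake_get, "missing_widget", "Error: Cannot find widget"))
# Error: Cannot find widget
```

In a notebook, the getter argument would be dbutils.widgets.get itself.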
multiselect command (dbutils.widgets.multiselect)
Creates and displays a multiselect widget with the specified programmatic name, default value, choices, and optional label.
To display help for this command, run dbutils.widgets.help("multiselect").
This example creates and displays a multiselect widget with the programmatic name days_multiselect. It offers the choices Monday through Sunday and is set to the initial value of Tuesday. This multiselect widget has an accompanying label Days of the Week. This example ends by printing the initial value of the multiselect widget, Tuesday.
Python
dbutils.widgets.multiselect(
    name='days_multiselect',
    defaultValue='Tuesday',
    choices=['Monday', 'Tuesday', 'Wednesday', 'Thursday',
             'Friday', 'Saturday', 'Sunday'],
    label='Days of the Week'
)
print(dbutils.widgets.get("days_multiselect"))
# Tuesday
R
dbutils.widgets.multiselect(
  name='days_multiselect',
  defaultValue='Tuesday',
  choices=list('Monday', 'Tuesday', 'Wednesday', 'Thursday',
               'Friday', 'Saturday', 'Sunday'),
  label='Days of the Week'
)
print(dbutils.widgets.get("days_multiselect"))
# [1] "Tuesday"
Scala
dbutils.widgets.multiselect(
  "days_multiselect",
  "Tuesday",
  Array("Monday", "Tuesday", "Wednesday", "Thursday",
    "Friday", "Saturday", "Sunday"),
  "Days of the Week"
)
print(dbutils.widgets.get("days_multiselect"))
// Tuesday
SQL
CREATE WIDGET MULTISELECT days_multiselect DEFAULT "Tuesday" CHOICES SELECT * FROM (VALUES ("Monday"), ("Tuesday"), ("Wednesday"), ("Thursday"), ("Friday"), ("Saturday"), ("Sunday"))
SELECT :days_multiselect
-- Tuesday
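When more than one option is selected, get still returns a single string. The Python sketch below assumes that the selected values are joined with commas. It uses a hypothetical helper, split_selection, and a literal string in place of dbutils.widgets.get('days_multiselect') so that it runs outside a notebook.

```python
from typing import List


def split_selection(raw: str) -> List[str]:
    """Split a comma-delimited multiselect value into a list, dropping empty entries."""
    return [item.strip() for item in raw.split(",") if item.strip()]


raw_days = "Tuesday,Friday"  # stand-in for dbutils.widgets.get('days_multiselect')
print(split_selection(raw_days))
# ['Tuesday', 'Friday']
```

Note that this simple split would also break apart a choice that itself contains a comma.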
remove command (dbutils.widgets.remove)
Removes the widget with the specified programmatic name.
To display help for this command, run dbutils.widgets.help("remove").
Important
If you add a command to remove a widget, you cannot add a subsequent command to create a widget in the same cell. You must create the widget in another cell.
This example removes the widget with the programmatic name fruits_combobox.
Python
dbutils.widgets.remove('fruits_combobox')
R
dbutils.widgets.remove('fruits_combobox')
Scala
dbutils.widgets.remove("fruits_combobox")
SQL
REMOVE WIDGET fruits_combobox
removeAll command (dbutils.widgets.removeAll)
Removes all widgets from the notebook.
To display help for this command, run dbutils.widgets.help("removeAll").
Important
If you add a command to remove all widgets, you cannot add a subsequent command to create any widgets in the same cell. You must create the widgets in another cell.
This example removes all widgets from the notebook.
Python
dbutils.widgets.removeAll()
R
dbutils.widgets.removeAll()
Scala
dbutils.widgets.removeAll()
text command (dbutils.widgets.text)
Creates and displays a text widget with the specified programmatic name, default value, and optional label.
To display help for this command, run dbutils.widgets.help("text").
This example creates and displays a text widget with the programmatic name your_name_text. It is set to the initial value of Enter your name. This text widget has an accompanying label Your name. This example ends by printing the initial value of the text widget, Enter your name.
Python
dbutils.widgets.text(
    name='your_name_text',
    defaultValue='Enter your name',
    label='Your name'
)
print(dbutils.widgets.get("your_name_text"))
# Enter your name
R
dbutils.widgets.text(
  name='your_name_text',
  defaultValue='Enter your name',
  label='Your name'
)
print(dbutils.widgets.get("your_name_text"))
# [1] "Enter your name"
Scala
dbutils.widgets.text(
  "your_name_text",
  "Enter your name",
  "Your name"
)
print(dbutils.widgets.get("your_name_text"))
// Enter your name
SQL
CREATE WIDGET TEXT your_name_text DEFAULT "Enter your name"
SELECT :your_name_text
-- Enter your name
Databricks Utilities API library
Important
The Databricks Utilities API (dbutils-api) library is deprecated. Although this library is still available, Databricks plans no new feature work for the dbutils-api library.
Databricks recommends that you use one of the following libraries instead:
To accelerate application development, it can be helpful to compile, build, and test applications before you deploy them as production jobs. To enable you to compile against Databricks Utilities, Databricks provides the dbutils-api library. You can download the dbutils-api library from the DBUtils API webpage on the Maven Repository website or include the library by adding a dependency to your build file:
SBT
libraryDependencies += "com.databricks" % "dbutils-api_TARGET" % "VERSION"
Maven
<dependency>
  <groupId>com.databricks</groupId>
  <artifactId>dbutils-api_TARGET</artifactId>
  <version>VERSION</version>
</dependency>
Gradle
compile 'com.databricks:dbutils-api_TARGET:VERSION'
Replace TARGET with the desired target (for example, 2.12) and VERSION with the desired version (for example, 0.0.5). For a list of available targets and versions, see the DBUtils API webpage on the Maven Repository website.
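To make the substitution concrete, the following Python sketch assembles the dependency coordinates from a target and a version. The helper name dbutils_api_coordinates is hypothetical, and the values 2.12 and 0.0.5 are only the examples mentioned above, not a recommendation.

```python
def dbutils_api_coordinates(target: str, version: str) -> str:
    """Return group:artifact:version coordinates for the dbutils-api library."""
    return f"com.databricks:dbutils-api_{target}:{version}"


print(dbutils_api_coordinates("2.12", "0.0.5"))
# com.databricks:dbutils-api_2.12:0.0.5
```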
Once you build your application against this library, you can deploy the application.
Important
The dbutils-api library only allows you to locally compile an application that uses dbutils, not to run it. To run the application, you must deploy it in Databricks.
Limitations
Calling dbutils inside executors can produce unexpected results or errors.
If you need to run file system operations on executors using dbutils, refer to the parallel listing and delete methods using Spark in How to list and delete files faster in Databricks.
For information about executors, see Cluster Mode Overview on the Apache Spark website.