Library utility (dbutils.library) (legacy)
Note
The dbutils.library.install and dbutils.library.installPyPI APIs are removed in Databricks Runtime 11.0 and above. Most library utility commands are deprecated. Most library utilities are not available on Databricks Runtime ML. For information on dbutils.library.restartPython, see Restart the Python process on Databricks.
This documentation has been retired and might not be updated. The products, services, or technologies mentioned in this content are no longer supported.
Databricks strongly recommends using %pip magic commands to install notebook-scoped libraries. See Notebook-scoped Python libraries.
For full documentation for Databricks utilities functionality, see Databricks Utilities (dbutils) reference.
Commands: install, installPyPI, list, restartPython, updateCondaEnv
The library utility allows you to install Python libraries and create an environment scoped to a notebook session. The libraries are available both on the driver and on the executors, so you can reference them in user-defined functions. This enables:
Library dependencies of a notebook to be organized within the notebook itself.
Notebook users with different library dependencies to share a cluster without interference.
Detaching a notebook destroys this environment. However, you can recreate it by re-running the library install API commands in the notebook. See the restartPython API for how you can reset your notebook state without losing your environment.
Library utilities are enabled by default. Therefore, by default the Python environment for each notebook is isolated by using a separate Python executable that is created when the notebook is attached to a cluster and that inherits the default Python environment on the cluster. Libraries installed through an init script into the Databricks Python environment are still available. You can disable this feature by setting spark.databricks.libraryIsolation.enabled to false.
This API is compatible with the existing cluster-wide library installation through the UI and Libraries API. Libraries installed through this API have higher priority than cluster-wide libraries.
To list the available commands, run dbutils.library.help().
install(path: String): boolean -> Install the library within the current notebook session
installPyPI(pypiPackage: String, version: String = "", repo: String = "", extras: String = ""): boolean -> Install the PyPI library within the current notebook session
list: List -> List the isolated libraries added for the current notebook session via dbutils
restartPython: void -> Restart python process for the current notebook session
updateCondaEnv(envYmlContent: String): boolean -> Update the current notebook's Conda environment based on the specification (content of environment.yml)
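Because install and installPyPI are removed in Databricks Runtime 11.0 and above, a notebook that has to run on several runtimes could check the runtime version before choosing between the legacy API and %pip. The following is a minimal sketch, not an official API: the helper name legacy_install_available is hypothetical, and it assumes that the DATABRICKS_RUNTIME_VERSION environment variable is set on the cluster and starts with the major version (for example "10.4").

```python
import os
from typing import Optional


def legacy_install_available(runtime_version: Optional[str]) -> bool:
    """Return True if the runtime predates 11.0, where install/installPyPI still exist.

    Hypothetical helper: it only inspects the major version number and does not
    account for Databricks Runtime ML, where most library utilities are unavailable.
    """
    if not runtime_version:
        return False  # Not on Databricks, or the version is unknown.
    major_text = runtime_version.split(".")[0]
    if not major_text.isdigit():
        return False  # Unrecognized format; prefer %pip in that case.
    return int(major_text) < 11


# Assumption: this environment variable holds a value such as "10.4" on a cluster.
current_version = os.environ.get("DATABRICKS_RUNTIME_VERSION")
use_legacy_api = legacy_install_available(current_version)
```

When use_legacy_api is False, the %pip magic commands recommended above are the safer choice.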
install command (dbutils.library.install)
Given a path to a library, installs that library within the current notebook session. Libraries installed by calling this command are available only to the current notebook.
To display help for this command, run dbutils.library.help("install").
This example installs a .egg or .whl library within a notebook.
Important
dbutils.library.install is removed in Databricks Runtime 11.0 and above.
Databricks recommends that you put all your library install commands in the first cell of your notebook and call restartPython at the end of that cell. The Python notebook state is reset after running restartPython; the notebook loses all state including but not limited to local variables, imported libraries, and other ephemeral states. Therefore, we recommend that you install libraries and reset the notebook state in the first notebook cell.
The accepted library sources are dbfs and s3.
dbutils.library.install("dbfs:/path/to/your/library.egg")
dbutils.library.restartPython() # Removes Python state, but some libraries might not work without calling this command.
dbutils.library.install("dbfs:/path/to/your/library.whl")
dbutils.library.restartPython() # Removes Python state, but some libraries might not work without calling this command.
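Since only dbfs and s3 sources and .egg or .whl files are accepted, it can help to validate a path before passing it to the install command. The sketch below is illustrative only: check_library_path is a hypothetical helper, and it assumes that the source is identified by the URI scheme at the start of the path.

```python
from urllib.parse import urlparse

# Sources and file types named in this section.
ACCEPTED_SCHEMES = {"dbfs", "s3"}
ACCEPTED_EXTENSIONS = (".egg", ".whl")


def check_library_path(path: str) -> str:
    """Return the path unchanged if it looks installable, otherwise raise ValueError."""
    scheme = urlparse(path).scheme
    if scheme not in ACCEPTED_SCHEMES:
        raise ValueError(f"Unsupported library source {scheme!r} in {path!r}")
    if not path.lower().endswith(ACCEPTED_EXTENSIONS):
        raise ValueError(f"Expected a .egg or .whl file, got {path!r}")
    return path


checked = check_library_path("dbfs:/path/to/your/library.whl")
# In a notebook on a runtime below 11.0: dbutils.library.install(checked)
```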
Note
You can directly install custom wheel files using %pip. The following example assumes that you have uploaded your library wheel file to DBFS:
%pip install /dbfs/path/to/your/library.whl
Egg files are not supported by pip, and wheel files are considered the standard for build and binary packaging for Python. However, if you want to use an egg file in a way that’s compatible with %pip, you can use the following workaround:
# This step is only needed if no %pip commands have been run yet.
# It will trigger setting up the isolated notebook environment
%pip install <any-lib> # This doesn't need to be a real library; for example "%pip install any-lib" would work
import sys
# Assuming the preceding step was completed, the following command
# adds the egg file to the current notebook environment
sys.path.append("/local/path/to/library.egg")
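The workaround relies on standard Python behavior: an .egg file is typically a zip archive, and Python can import modules from a zip archive that is listed on sys.path. The following self-contained sketch demonstrates this outside Databricks by building a tiny archive in a temporary directory. The archive name and the module name demo_egg_module are made up for illustration.

```python
import importlib
import os
import sys
import tempfile
import zipfile

# Build a throwaway egg-like zip archive containing a single module.
tmp_dir = tempfile.mkdtemp()
egg_path = os.path.join(tmp_dir, "demo_pkg-0.1-py3.egg")
with zipfile.ZipFile(egg_path, "w") as archive:
    archive.writestr("demo_egg_module.py", "GREETING = 'hello from the egg'\n")

# Appending the archive to sys.path makes its top-level modules importable.
sys.path.append(egg_path)
importlib.invalidate_caches()
demo_egg_module = importlib.import_module("demo_egg_module")
print(demo_egg_module.GREETING)
```

Note that this only makes the code importable in the current Python process; it does not install any dependencies that the egg declares.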
installPyPI command (dbutils.library.installPyPI)
Given a Python Package Index (PyPI) package, install that package within the current notebook session. Libraries installed by calling this command are isolated among notebooks.
To display help for this command, run dbutils.library.help("installPyPI").
This example installs a PyPI package in a notebook. version, repo, and extras are optional. Use the extras argument to specify the Extras feature (extra requirements).
dbutils.library.installPyPI("pypipackage", version="version", repo="repo", extras="extras")
dbutils.library.restartPython() # Removes Python state, but some libraries might not work without calling this command.
Important
dbutils.library.installPyPI is removed in Databricks Runtime 11.0 and above.
The version and extras keys cannot be part of the PyPI package string. For example: dbutils.library.installPyPI("azureml-sdk[databricks]==1.19.0") is not valid. Use the version and extras arguments to specify the version and extras information as follows:
dbutils.library.installPyPI("azureml-sdk", version="1.19.0", extras="databricks")
dbutils.library.restartPython() # Removes Python state, but some libraries might not work without calling this command.
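If you start from pip-style requirement strings, a small parser can split them into the separate package, version, and extras values that installPyPI expects. This is a minimal sketch under stated assumptions: split_requirement is a hypothetical helper, it handles only the name[extras]==version form, and it rejects anything else (such as >= constraints or environment markers).

```python
import re
from typing import Dict

# Matches "name", "name[extras]", "name==version", or "name[extras]==version".
_REQUIREMENT = re.compile(
    r"^\s*(?P<name>[A-Za-z0-9][A-Za-z0-9._-]*)"
    r"\s*(?:\[(?P<extras>[^\]]*)\])?"
    r"\s*(?:==\s*(?P<version>[^\s;,]+))?\s*$"
)


def split_requirement(requirement: str) -> Dict[str, str]:
    """Split a simple requirement string into package, version, and extras parts."""
    match = _REQUIREMENT.match(requirement)
    if match is None:
        raise ValueError(f"Unsupported requirement string: {requirement!r}")
    parts = {"package": match.group("name")}
    if match.group("version"):
        parts["version"] = match.group("version")
    if match.group("extras"):
        parts["extras"] = match.group("extras").strip()
    return parts


parts = split_requirement("azureml-sdk[databricks]==1.19.0")
# In a notebook on a runtime below 11.0:
# dbutils.library.installPyPI(parts["package"], version=parts["version"], extras=parts["extras"])
```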
Note
When replacing dbutils.library.installPyPI commands with %pip commands, the Python interpreter is automatically restarted. You can run the install command as follows:
%pip install azureml-sdk[databricks]==1.19.0
This example specifies library requirements in one notebook and installs them by using %run in another. To do this, first define the libraries to install in a notebook. This example uses a notebook named InstallDependencies.
dbutils.library.installPyPI("torch")
dbutils.library.installPyPI("scikit-learn", version="0.19.1")
dbutils.library.installPyPI("azureml-sdk", extras="databricks")
dbutils.library.restartPython() # Removes Python state, but some libraries might not work without calling this command.
Then install them in the notebook that needs those dependencies.
%run /path/to/InstallDependencies # Install the dependencies in the first cell.
import torch
from sklearn.linear_model import LinearRegression
import azureml
...
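One possible variation on the shared InstallDependencies notebook is to keep the dependencies as data and loop over them, using the boolean that installPyPI returns to collect failures. The sketch below is an assumption-laden illustration: DEPENDENCIES and install_all are hypothetical names, and the installer is passed in as a function so that the logic can be tried outside Databricks with a stand-in.

```python
from typing import Callable, Dict, List

# Hypothetical dependency list; each entry holds the package name plus optional arguments.
DEPENDENCIES: List[Dict[str, str]] = [
    {"package": "torch"},
    {"package": "azureml-sdk", "version": "1.19.0", "extras": "databricks"},
]


def install_all(install_fn: Callable[..., bool], dependencies: List[Dict[str, str]]) -> List[str]:
    """Call install_fn once per dependency and return the packages that failed."""
    failed = []
    for dependency in dependencies:
        options = {key: value for key, value in dependency.items() if key != "package"}
        if not install_fn(dependency["package"], **options):
            failed.append(dependency["package"])
    return failed


# In a notebook on a runtime below 11.0:
# failed = install_all(dbutils.library.installPyPI, DEPENDENCIES)
# dbutils.library.restartPython()
```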
This example resets the Python notebook state while maintaining the environment. This technique is available only in Python notebooks. For example, you can use this technique to replace a library that Databricks preinstalled with a different version and then reload it:
dbutils.library.installPyPI("numpy", version="1.15.4")
dbutils.library.restartPython()
# Make sure you start using the library in another cell.
import numpy
You can also use this technique to install libraries such as tensorflow that need to be loaded on process start up:
dbutils.library.installPyPI("tensorflow")
dbutils.library.restartPython()
# Use the library in another cell.
import tensorflow
list command (dbutils.library.list)
Lists the isolated libraries added for the current notebook session through the library utility. This does not include libraries that are attached to the cluster.
To display help for this command, run dbutils.library.help("list").
This example lists the libraries installed in a notebook.
dbutils.library.list()
Note
The equivalent of this command using %pip is:
%pip freeze
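If you need the same kind of listing inside Python code rather than as notebook output, the standard library's importlib.metadata module can enumerate installed distributions. This is a rough sketch, not an exact replacement: frozen_requirements is a hypothetical helper, and it reports every distribution visible to the current interpreter, not only the notebook-scoped libraries that dbutils.library.list returns.

```python
from importlib import metadata
from typing import List


def frozen_requirements() -> List[str]:
    """Return sorted "name==version" lines for the distributions the interpreter can see."""
    lines = set()
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # Skip distributions with missing or broken metadata.
            lines.add(f"{name}=={dist.version}")
    return sorted(lines, key=str.lower)


for line in frozen_requirements():
    print(line)
```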
updateCondaEnv command (dbutils.library.updateCondaEnv)
Updates the current notebook’s Conda environment based on the contents of environment.yml. This method is supported only for Databricks Runtime on Conda.
To display help for this command, run dbutils.library.help("updateCondaEnv").
This example updates the current notebook’s Conda environment based on the contents of the provided specification.
dbutils.library.updateCondaEnv(
"""
channels:
- anaconda
dependencies:
- gensim=3.4
- nltk=3.4
""")
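When the channels and dependencies come from variables rather than a hand-written string, the specification can be assembled programmatically. The following is a minimal sketch: build_conda_spec is a hypothetical helper that produces the same two-section layout as the example above, and it does not validate package names or version pins.

```python
from typing import Sequence


def build_conda_spec(channels: Sequence[str], dependencies: Sequence[str]) -> str:
    """Build environment.yml-style content with channels and dependencies sections."""
    lines = ["channels:"]
    lines += [f"  - {channel}" for channel in channels]
    lines.append("dependencies:")
    lines += [f"  - {dependency}" for dependency in dependencies]
    return "\n".join(lines) + "\n"


spec = build_conda_spec(["anaconda"], ["gensim=3.4", "nltk=3.4"])
print(spec)
# In a notebook on Databricks Runtime on Conda: dbutils.library.updateCondaEnv(spec)
```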