Notebook-scoped Python libraries
Notebook-scoped libraries let you create, modify, save, reuse, and share custom Python environments that are specific to a notebook. When you install a notebook-scoped library, only the current notebook and any jobs associated with that notebook have access to that library. Other notebooks attached to the same cluster are not affected.
Notebook-scoped libraries do not persist across sessions. You must reinstall notebook-scoped libraries at the beginning of each session, or whenever the notebook is detached from a cluster.
Databricks recommends using the %pip magic command to install notebook-scoped Python libraries.
You can use %pip in notebooks scheduled as jobs. If you need to manage the Python environment in a Scala, SQL, or R notebook, use the %python magic command in conjunction with %pip.
You might experience more traffic to the driver node when working with notebook-scoped library installs. For sizing guidance, see How large should the driver node be when working with notebook-scoped libraries? later on this page.
To install libraries for all notebooks attached to a cluster, use cluster libraries. See Compute-scoped libraries.
For a comprehensive overview of options available for installing Python libraries in Databricks, see Python environment management.
On Databricks Runtime 10.4 LTS and below, you can use the (legacy) Databricks library utility. The library utility is supported only on Databricks Runtime, not Databricks Runtime ML. See Library utility (dbutils.library) (legacy).
Manage libraries with %pip commands
The %pip command is equivalent to the pip command and supports the same API. The following sections show examples of how you can use %pip commands to manage your environment. For more information on installing Python packages with pip, see the pip install documentation and related pages.
- Starting with Databricks Runtime 13.0, %pip commands do not automatically restart the Python process. If you install a new package or update an existing package, you may need to use dbutils.library.restartPython() to see the new packages. See Restart the Python process on Databricks.
- On Databricks Runtime 12.2 LTS and below, Databricks recommends placing all %pip commands at the beginning of the notebook. The notebook state is reset after any %pip command that modifies the environment. If you create Python methods or variables in a notebook and then use %pip commands in a later cell, the methods or variables are lost.
- Upgrading, modifying, or uninstalling core Python packages (such as IPython) with %pip may cause some features to stop working as expected. If you experience such problems, reset the environment by restarting the cluster or starting a new session.
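After a %pip install followed by dbutils.library.restartPython(), you may want to confirm which version of a package the restarted Python process sees. The following is a minimal sketch that uses only the standard library's importlib.metadata; the helper name installed_version is hypothetical and is not a Databricks API.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(distribution_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is not installed.

    The version is read from the package metadata on disk. A module that was
    imported before the install keeps running its old code until the Python
    process restarts, so call this after dbutils.library.restartPython().
    """
    try:
        return version(distribution_name)
    except PackageNotFoundError:
        return None
```

For example, after %pip install matplotlib and a restart, installed_version("matplotlib") returns the version string that pip installed.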
Install a library with %pip
%pip install matplotlib
Install a Python wheel package with %pip
%pip install /path/to/my_package.whl
Uninstall a library with %pip
You cannot uninstall a library that is included in Databricks Runtime (see Databricks Runtime release notes versions and compatibility) or a library that has been installed as a cluster library. If you have installed a different version of such a library, you can use %pip uninstall to remove your version and revert to the default version in Databricks Runtime or the version installed on the cluster. You cannot use a %pip command to uninstall the Databricks Runtime or cluster version itself.
%pip uninstall -y matplotlib
The -y option is required.
Install a library from a version control system with %pip
%pip install git+https://github.com/databricks/databricks-cli
You can add parameters to the URL to specify things like the version or Git subdirectory. See VCS support in the pip documentation for more information and for examples using other version control systems.
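pip's VCS URLs take a revision after @ (a branch, tag, or commit hash) and a subdirectory in a #subdirectory= fragment. A minimal sketch that assembles such a string; the helper vcs_requirement, the tag v1.2.3, and the subdirectory pkg are hypothetical values chosen for illustration.

```python
from typing import Optional


def vcs_requirement(repo_url: str, ref: Optional[str] = None,
                    subdirectory: Optional[str] = None) -> str:
    """Build a pip VCS requirement string of the form git+URL[@ref][#subdirectory=dir]."""
    # pip selects the VCS backend from the "git+" scheme prefix.
    url = repo_url if repo_url.startswith("git+") else "git+" + repo_url
    if ref:
        # A branch name, tag, or commit hash follows "@".
        url += "@" + ref
    if subdirectory:
        # Needed when setup.py or pyproject.toml is not at the repository root.
        url += "#subdirectory=" + subdirectory
    return url


# Hypothetical tag and subdirectory, for illustration only.
req = vcs_requirement("https://github.com/databricks/databricks-cli", ref="v1.2.3", subdirectory="pkg")
```

Because magic commands support $variables, you could then pass the string to %pip install as $req in a later cell.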
Install a private package with credentials managed by Databricks secrets with %pip
pip supports installing packages from private sources with basic authentication, including private version control systems and private package repositories, such as Nexus and Artifactory. Secret management is available via the Databricks Secrets API, which allows you to store authentication tokens and passwords. Use the DBUtils API to access secrets from your notebook. Note that you can use $variables in magic commands.
To install a package from a private repository, specify the repository URL with the --index-url option to %pip install or add it to the pip config file at ~/.pip/pip.conf.
token = dbutils.secrets.get(scope="scope", key="key")
%pip install --index-url https://<user>:$token@<your-package-repository>.com/<path/to/repo> <package>==<version> --extra-index-url https://pypi.org/simple/
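As an alternative to passing --index-url on every %pip install, the repository URL can go in the pip config file mentioned above. A minimal sketch that renders the file contents with the standard library's configparser; the helper render_pip_conf is hypothetical, and writing a token into a file on the driver leaves it readable on that machine, so adapt this to your own security requirements.

```python
import configparser
import io


def render_pip_conf(index_url: str, extra_index_url: str = "https://pypi.org/simple/") -> str:
    """Return the text of a pip config file that sets the package indexes."""
    # interpolation=None keeps "%" characters (for example, from percent-encoded
    # credentials) from being treated as configparser interpolation syntax.
    config = configparser.ConfigParser(interpolation=None)
    config["global"] = {
        "index-url": index_url,
        "extra-index-url": extra_index_url,
    }
    buffer = io.StringIO()
    config.write(buffer)
    return buffer.getvalue()


# Hypothetical usage in a notebook, with token from dbutils.secrets.get:
# import pathlib
# conf_path = pathlib.Path.home() / ".pip" / "pip.conf"
# conf_path.parent.mkdir(parents=True, exist_ok=True)
# conf_path.write_text(render_pip_conf(
#     f"https://<user>:{token}@<your-package-repository>.com/<path/to/repo>"))
```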
Similarly, you can use secret management with magic commands to install private packages from version control systems.
token = dbutils.secrets.get(scope="scope", key="key")
%pip install git+https://<user>:$token@<gitprovider>.com/<path/to/repo>
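If a token or password contains reserved characters such as @, :, or /, embedding it directly in the URL produces a URL that pip parses incorrectly; pip accepts percent-encoded credentials instead. A minimal sketch, assuming the token comes from dbutils.secrets.get as in the examples above; the helper authenticated_url is hypothetical.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def authenticated_url(url: str, user: str, token: str) -> str:
    """Insert percent-encoded credentials into the network location of a URL."""
    parts = urlsplit(url)
    # safe="" also encodes "/", so every reserved character in the secret is escaped.
    userinfo = f"{quote(user, safe='')}:{quote(token, safe='')}"
    return urlunsplit(parts._replace(netloc=f"{userinfo}@{parts.netloc}"))
```

The same helper works for index URLs and for git+https URLs; the resulting string can then be referenced in a %pip command as a $variable.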
Install a package from DBFS with %pip
Any workspace user can modify files stored in DBFS. Databricks recommends storing files in workspaces or on Unity Catalog volumes.
You can use %pip to install a private package that has been saved on DBFS.
When you upload a file to DBFS, it automatically renames the file, replacing spaces, periods, and hyphens with underscores. For Python wheel files, pip requires that the name of the file use periods in the version (for example, 0.1.0) and hyphens instead of spaces or underscores, so these filenames are not changed.
%pip install /dbfs/mypackage-0.0.1-py3-none-any.whl
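Because pip rejects wheel files whose names do not follow the wheel naming convention {distribution}-{version}(-{build tag})?-{python tag}-{abi tag}-{platform tag}.whl, it can help to check a file name before installing it. A minimal sketch of such a check; the helper parse_wheel_name is hypothetical and is a simplified regular expression, not pip's own parser.

```python
import re
from typing import Dict, Optional

# distribution-version[-build]-python-abi-platform.whl. The distribution name uses
# underscores rather than hyphens, and an optional build tag starts with a digit.
WHEEL_NAME = re.compile(
    r"^(?P<name>[A-Za-z0-9_.]+)-(?P<version>[A-Za-z0-9_.!+]+)"
    r"(-(?P<build>\d[A-Za-z0-9_]*))?"
    r"-(?P<python>[A-Za-z0-9_.]+)-(?P<abi>[A-Za-z0-9_.]+)-(?P<platform>[A-Za-z0-9_.]+)\.whl$"
)


def parse_wheel_name(filename: str) -> Optional[Dict[str, Optional[str]]]:
    """Return the wheel name components, or None if the name is not a valid wheel name."""
    match = WHEEL_NAME.match(filename)
    return match.groupdict() if match else None
```

A name in which the hyphens and periods have been replaced by underscores, such as mypackage_0_0_1_py3_none_any.whl, returns None.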
Install a package from a volume with %pip
This feature is in Public Preview.
With Databricks Runtime 13.3 LTS and above, you can use %pip to install a private package that has been saved to a volume.
When you upload a file to a volume, it automatically renames the file, replacing spaces, periods, and hyphens with underscores. For Python wheel files, pip requires that the name of the file use periods in the version (for example, 0.1.0) and hyphens instead of spaces or underscores, so these filenames are not changed.
%pip install /Volumes/<catalog>/<schema>/<path-to-library>/mypackage-0.0.1-py3-none-any.whl
Install a package stored as a workspace file with %pip
With Databricks Runtime 11.3 LTS and above, you can use %pip to install a private package that has been saved as a workspace file.
%pip install /Workspace/<path-to-whl-file>/mypackage-0.0.1-py3-none-any.whl
Save libraries in a requirements file
%pip freeze > /Workspace/shared/prod_requirements.txt
Any subdirectories in the file path must already exist. If you run %pip freeze > /Workspace/<new-directory>/requirements.txt, the command fails if the directory /Workspace/<new-directory> does not already exist.
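Because %pip freeze does not create missing directories, one option is to create the target directory from Python in an earlier cell. A minimal sketch, assuming your runtime allows Python code to create directories under /Workspace; the helper ensure_parent_dir is hypothetical.

```python
import os


def ensure_parent_dir(file_path: str) -> str:
    """Create the parent directory of file_path if needed and return file_path unchanged."""
    parent = os.path.dirname(file_path)
    if parent:
        # exist_ok=True makes the call a no-op when the directory already exists,
        # so the cell can be rerun safely in every session.
        os.makedirs(parent, exist_ok=True)
    return file_path


# Placeholder path from the text above; run %pip freeze > <that path> in a later cell:
# ensure_parent_dir("/Workspace/<new-directory>/requirements.txt")
```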
Use a requirements file to install libraries
A requirements file contains a list of packages to be installed using pip. An example of using a requirements file is:
%pip install -r /Workspace/shared/prod_requirements.txt
See Requirements File Format for more information on requirements.txt files.
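To inspect which exact versions a frozen requirements file pins, you can read it line by line. A minimal sketch that handles only simple name==version lines; the helper pinned_versions is hypothetical, and it deliberately skips options, URL and @ references, and looser specifiers rather than implementing the full requirements file format.

```python
import re
from typing import Dict, Iterable


def pinned_versions(lines: Iterable[str]) -> Dict[str, str]:
    """Map package names to versions for name==version lines in a requirements file."""
    pins = {}
    for raw in lines:
        # As in pip, "#" at line start or after whitespace begins a comment.
        line = re.sub(r"(^|\s+)#.*$", "", raw).strip()
        # Skip blanks, options such as -r or --index-url, unpinned lines, and @ references.
        if not line or line.startswith("-") or "==" not in line or "@" in line:
            continue
        name, _, version = line.partition("==")
        name = name.split("[", 1)[0].strip()          # drop extras like pkg[extra]
        version = version.split(";", 1)[0].strip()    # drop environment markers
        pins[name] = version
    return pins


# Hypothetical usage with the file saved earlier:
# with open("/Workspace/shared/prod_requirements.txt") as f:
#     print(pinned_versions(f))
```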
How large should the driver node be when working with notebook-scoped libraries?
Using notebook-scoped libraries might result in more traffic to the driver node as it works to keep the environment consistent across executor nodes.
When you use a cluster with 10 or more nodes, Databricks recommends these specs as a minimum requirement for the driver node:
- For a 100-node CPU cluster, use r6id.8xlarge for x86 and r7gd.8xlarge for ARM.
- For a 10-node GPU cluster, use p2.xlarge.
For larger clusters, use a larger driver node.
Can I use %sh pip, !pip, or pip? What is the difference?
%sh and ! execute a shell command in a notebook; the former is a Databricks auxiliary magic command while the latter is a feature of IPython. pip is a shorthand for %pip when automagic is enabled, which is the default in Databricks Python notebooks.
On Databricks Runtime 11.3 LTS and above, %pip, %sh pip, and !pip all install a library as a notebook-scoped Python library. On Databricks Runtime 10.4 LTS and below, Databricks recommends using only %pip or pip to install notebook-scoped libraries. The behavior of %sh pip and !pip is not consistent in Databricks Runtime 10.4 LTS and below.
Known issues
- On Databricks Runtime 9.1 LTS, notebook-scoped libraries are incompatible with batch streaming jobs. Databricks recommends using cluster libraries or the IPython kernel instead.