Notebook-scoped Python libraries

Preview

This feature is in Public Preview.

Notebook-scoped libraries let you create, save, reuse, and share custom Python environments that are specific to a notebook. When you install a notebook-scoped library, only the current notebook and any jobs associated with that notebook have access to that library. Other notebooks attached to the same cluster are not affected.

Notebook-scoped libraries do not persist across sessions. You must reinstall notebook-scoped libraries at the beginning of each session, or whenever the notebook is detached from a cluster.

There are two methods for installing notebook-scoped libraries:

  • Using the %pip or %conda magic command in the notebook. The %pip command is supported in Databricks Runtime 7.1 and above. Both %pip and %conda are supported in Databricks Runtime 6.4 ML and above. This article describes how to use these magic commands.
  • Using Databricks library utilities. This is supported only on Databricks Runtime, not Databricks Runtime ML. See Library utilities.

To install libraries for all notebooks attached to the cluster, use workspace and cluster-installed libraries.

Requirements

This feature is enabled by default in Databricks Runtime 7.1 and above and in Databricks Runtime 7.1 ML and above.

It is also available via a configuration setting in Databricks Runtime 6.4 ML to 7.0 ML. Set the Spark configuration spark.databricks.conda.condaMagic.enabled to true for your cluster.
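For example, enter the following line in the cluster's Spark config field (one key-value pair per line):

spark.databricks.conda.condaMagic.enabled true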

This feature is not compatible with table access control or credential passthrough. You cannot use notebook-scoped libraries on a Databricks Runtime ML cluster with those features enabled. An alternative is to use Library utilities on a Databricks Runtime cluster.
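For reference, a minimal library utilities sketch for a Databricks Runtime cluster (package name illustrative; see Library utilities for the full API):

dbutils.library.installPyPI("matplotlib")  # install a PyPI package for this notebook session
dbutils.library.restartPython()  # restart Python so the new library can be imported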

Driver node

Using notebook-scoped libraries might result in more traffic to the driver node as it works to keep the environment consistent across executor nodes. When you use a cluster with 10 or more nodes, Databricks recommends these specifications as a minimum for the driver node:

  • For a 100-node CPU cluster, use i3.8xlarge.
  • For a 10-node GPU cluster, use p2.xlarge.

For larger clusters, use a larger driver node.

Using notebook-scoped libraries

On Databricks Runtime, use %pip magic commands to create and manage notebook-scoped libraries. On Databricks Runtime ML, you can also use %conda magic commands. Databricks recommends using pip to install libraries, unless the library you want to install recommends using conda. For more information, see Understanding conda and pip.

Important

  • You should place all %pip and %conda commands at the beginning of the notebook. The notebook state is reset after any %pip or %conda command that modifies the environment. If you create Python methods or variables in a notebook and then use %pip or %conda commands in a later cell, those methods or variables are lost (see the example after this list).
  • If you must use both %pip and %conda commands in a notebook, see Interactions between pip and conda commands.
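For example, a minimal sketch of the reset behavior across three cells (variable name illustrative):

  1. Define a variable in the first cell.

    version = "1.0"
    
  2. Install a library in a later cell; this resets the notebook state.

    %pip install matplotlib
    
  3. Referencing the variable now raises a NameError, because it was lost in the reset.

    print(version)
    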

Manage libraries with %pip commands

The following sections contain some examples of how you can use %pip commands to manage the environment.

Use a requirements file to install libraries

A requirements file contains a list of packages to be installed using pip. The name of the file must end with requirements.txt. An example of using a requirements file is:

%pip install -r /dbfs/requirements.txt

See Requirements File Format for more information on requirements.txt files.
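For example, a requirements.txt file might contain entries such as (package names and version pins illustrative):

matplotlib==3.4.2
numpy>=1.19.0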

Use pip to install a library

%pip install matplotlib

Use pip to install a wheel package

%pip install /dbfs/my_package.whl

Use pip to uninstall a library

Note

In Databricks Runtime, you cannot uninstall a library that is included in Databricks Runtime or one that has been installed as a cluster library. If you have installed a different version of such a library, you can use %pip uninstall to revert the library to the default version in Databricks Runtime or to the cluster-installed version, but you cannot use %pip to remove that default or cluster-installed version itself.

%pip uninstall -y matplotlib

Note

The -y option is required.

Save libraries in a requirements file

%pip freeze > /dbfs/requirements.txt
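You can then recreate the environment in another notebook or a later session with %pip install -r /dbfs/requirements.txt, as shown above.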

Manage libraries with %conda commands

Note

%conda magic commands are not available on Databricks Runtime. They are available on Databricks Runtime for Machine Learning.

The following sections contain some examples of how you can use %conda commands to manage the environment.

Use conda to install a library

%conda install matplotlib

Use conda to uninstall a library

%conda uninstall matplotlib

Copy, reuse, and share an environment

When you detach a notebook from a cluster, the environment is not saved. To save an environment so you can reuse it later or share it with someone else, follow these steps.

Note

Databricks recommends that environments be shared only between clusters running the same version of Databricks Runtime ML.

  1. Save the environment as a conda YAML specification.

    %conda env export -f /dbfs/myenv.yml
    
  2. Import the file to another notebook using conda env update.

    %conda env update -f /dbfs/myenv.yml
    

List the Python environment of a notebook

To show the Python environment associated with a notebook, use %conda list:

%conda list

Interactions between pip and conda commands

To avoid conflicts, follow these guidelines when using pip or conda to install Python packages and libraries.

  • Libraries installed via the cluster UI or the API are installed using pip. If any libraries have been installed this way, use only %pip commands to install notebook-scoped libraries.
  • If you use notebook-scoped libraries on a cluster, init scripts run on that cluster can use either conda or pip commands to install libraries. However, if the init script includes pip commands, use only %pip commands in notebooks (not %conda).
  • It’s best to use either pip commands exclusively or conda commands exclusively. If you must install some packages via conda and some via pip, run the conda commands first, and then run the pip commands. For more information, see Using Pip in a Conda Environment.
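For example, if some packages must be installed with conda and others with pip, run them in this order (package names are placeholders):

%conda install some_conda_package
%pip install some_pip_package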

Frequently asked questions (FAQ)

How do libraries installed from the clusters UI/API interact with notebook-scoped libraries?

Libraries installed from the clusters UI or API are available to all notebooks on the cluster. These libraries are installed using pip; therefore, if libraries are installed via the cluster UI, use only %pip commands in notebooks.

How do libraries installed via an init script interact with notebook-scoped libraries?

Libraries installed via an init script are available to all notebooks on the cluster.

If you use notebook-scoped libraries on a cluster running Databricks Runtime ML, init scripts run on the cluster can use either conda or pip commands to install libraries. However, if the init script includes pip commands, then use only %pip commands in notebooks.

For example, this notebook code snippet generates a script that installs fast.ai packages on all the cluster nodes.

# Write a conda install command to a DBFS file for use as an init script
dbutils.fs.put("dbfs:/home/myScripts/fast.ai", "conda install -c pytorch -c fastai fastai -y", True)
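You would then configure dbfs:/home/myScripts/fast.ai as a cluster-scoped init script in the cluster settings so that it runs on every node at cluster startup.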

Can I use %pip and %conda commands in job notebooks?

Yes.

Can I use %sh pip or %sh conda?

We do not recommend using %sh pip because it is not compatible with %pip usage.

Can I update R packages using %conda commands?

No.

Limitations

  • The following conda commands are not supported:
    • activate
    • create
    • init
    • run
    • env create
    • env remove

Known issues

For Databricks Runtime 7.0 ML and below, a registered UDF that depends on Python packages installed via %pip or %conda does not work in %sql cells. Use spark.sql in a Python cell instead.
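For example, a sketch of the workaround (UDF, column, and table names illustrative):

# Run the query from a Python cell instead of a %sql cell so that the UDF
# sees the notebook-scoped environment:
df = spark.sql("SELECT my_udf(col) AS result FROM my_table")
display(df)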