Cluster libraries

Cluster libraries can be used by all notebooks running on a cluster. You can install a cluster library directly from a public repository such as PyPI or Maven, using a previously installed workspace library, or using an init script.

Install a library on a cluster

There are two primary ways to install a library on a cluster:

  • Install a workspace library that has already been uploaded to the workspace.
  • Install a library for use with a specific cluster only.

In addition, if your library requires custom configuration, you may not be able to install it using the methods listed above. Instead, you can install the library using an init script that runs at cluster creation time.

Note

When you install a library on a cluster, a notebook already attached to that cluster will not immediately see the new library. You must first detach and then reattach the notebook to the cluster.
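For a Python library, one way to check from a notebook cell whether the current session can see a newly installed library is to ask the import system directly. This is a minimal sketch rather than an official procedure: the helper name `library_is_visible` is hypothetical, and `astropy` is used only as an illustrative module name.

```python
import importlib
import importlib.util


def library_is_visible(module_name: str) -> bool:
    """Return True if the current Python session can locate a top-level module."""
    # Refresh the import system's caches so recently installed packages are considered.
    importlib.invalidate_caches()
    return importlib.util.find_spec(module_name) is not None


# "astropy" is only an example; pass the top-level module name of your library.
print(library_is_visible("astropy"))
```

If the check returns False right after an installation completes, detaching and reattaching the notebook as described above is the next step to try.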

Workspace library

Note

Starting with Databricks Runtime 7.2, Databricks processes all workspace libraries in the order that they were installed on the cluster. On Databricks Runtime 7.1 and below, Databricks processes Maven and CRAN libraries in the order they are installed on the cluster.

You might need to pay attention to the order of installation on the cluster if there are dependencies between libraries.
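When several libraries depend on each other, a topological sort is one way to work out an installation order in which every dependency precedes the libraries that need it. The sketch below uses Python's standard `graphlib` module (Python 3.9 and above). The library names and the dependency map are hypothetical, and this is not something Databricks does on your behalf.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: each library maps to the libraries it depends on.
dependencies = {
    "my-app": {"my-utils", "my-io"},
    "my-io": {"my-utils"},
    "my-utils": set(),
}


def install_order(deps):
    """Return library names ordered so that dependencies come before dependents."""
    return list(TopologicalSorter(deps).static_order())


print(install_order(dependencies))
```

Installing the libraries in the printed order means that each one finds its dependencies already present.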

To install a library that already exists in the workspace, you can start from the cluster UI or the library UI:

Cluster

  1. Click the clusters icon in the sidebar.
  2. Click a cluster name.
  3. Click the Libraries tab.
  4. Click Install New.
  5. In the Library Source button list, select Workspace.
  6. Select a workspace library.
  7. Click Install.
  8. To configure the library to be installed on all clusters:
    1. Click the library.
    2. Select the Install automatically on all clusters checkbox.
    3. Click Confirm.

Library

  1. Go to the folder containing the library.

  2. Click the library name.

  3. Do one of the following:

    • To configure the library to be installed on all clusters, select the Install automatically on all clusters checkbox and click Confirm.

      Important

      This option does not install the library on clusters running Databricks Runtime 7.0 and above.

    • Select the checkbox next to the cluster that you want to install the library on and click Install.

The library is installed on the cluster.

Cluster-installed library

You can install a library on a specific cluster without making it available as a workspace library.

To install a library on a cluster:

  1. Click the clusters icon in the sidebar.
  2. Click a cluster name.
  3. Click the Libraries tab.
  4. Click Install New.
  5. Follow one of the methods for creating a workspace library. After you click Create, the library is installed on the cluster.
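The same kind of installation can also be scripted. The sketch below only builds a request body. It assumes the Databricks Libraries REST API (`POST /api/2.0/libraries/install`), which this page does not describe, so check the body shape against the API reference for your workspace. The function name, cluster ID, and package name are placeholders.

```python
import json


def build_install_request(cluster_id, pypi_packages):
    """Build the JSON body for installing PyPI libraries on one cluster."""
    return {
        "cluster_id": cluster_id,
        # One library entry per package; "pypi" marks the library source.
        "libraries": [{"pypi": {"package": name}} for name in pypi_packages],
    }


# The cluster ID and package name are placeholders, not real values.
body = build_install_request("1234-567890-abcde123", ["astropy"])
print(json.dumps(body, indent=2))
```

Sending the body requires an authenticated HTTP client, which is omitted here to keep the example self-contained.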

Init script

If your library requires custom configuration, you may not be able to install it using the workspace or cluster library interface. Instead, you can install the library using an init script.

The following init script uses the Conda package manager to install a Python library on a Databricks Runtime for Machine Learning cluster at cluster initialization. Conda is available only on Databricks Runtime ML, not on the base Databricks Runtime.

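#!/bin/bash
# Print each command as it runs and stop at the first failure.
set -ex
# Log the Python version used by the cluster.
/databricks/python/bin/python -V
# Load Conda's shell functions, then activate the cluster's Python environment.
. /databricks/conda/etc/profile.d/conda.sh
conda activate /databricks/python
# Install the library without prompting for confirmation.
conda install -y astropy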

Uninstall a library from a cluster

Note

When you uninstall a library from a cluster, the library is removed only when you restart the cluster. Until you restart the cluster, the status of the uninstalled library appears as Uninstall pending restart.

To uninstall a library, you can start from a cluster or a library:

Cluster

  1. Click the clusters icon in the sidebar.
  2. Click a cluster name.
  3. Click the Libraries tab.
  4. Select the checkbox next to the library you want to uninstall, click Uninstall, then Confirm. The Status changes to Uninstall pending restart.

Library

  1. Go to the folder containing the library.
  2. Click the library name.
  3. Select the checkbox next to the cluster you want to uninstall the library from, click Uninstall, then Confirm. The Status changes to Uninstall pending restart.
  4. Click the cluster name to go to the cluster detail page.

Click Restart and Confirm to uninstall the library. The library is removed from the cluster’s Libraries tab.
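If you track library state programmatically, the pending-restart condition can be detected before you decide to restart. The sketch below assumes a status payload shaped like the response of the Libraries REST API cluster-status call, in which a library waiting for removal reports the status `UNINSTALL_ON_RESTART`. Both the payload shape and the sample data are assumptions to verify against the API reference.

```python
def pending_uninstall(cluster_status):
    """Return the library specs whose removal is waiting for a cluster restart."""
    return [
        entry["library"]
        for entry in cluster_status.get("library_statuses", [])
        if entry.get("status") == "UNINSTALL_ON_RESTART"
    ]


# Hand-written sample data; the cluster ID and libraries are placeholders.
sample_status = {
    "cluster_id": "1234-567890-abcde123",
    "library_statuses": [
        {"library": {"pypi": {"package": "astropy"}}, "status": "UNINSTALL_ON_RESTART"},
        {"library": {"pypi": {"package": "numpy"}}, "status": "INSTALLED"},
    ],
}

print(pending_uninstall(sample_status))
```

A non-empty result means the cluster still needs a restart before those libraries are actually removed.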

View the libraries installed on a cluster

  1. Click the clusters icon in the sidebar.
  2. Click the cluster name.
  3. Click the Libraries tab. For each library, the tab displays the name and version, type, install status, and, if uploaded, the source file.
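A similar summary can be produced outside the UI. The sketch below flattens a status payload into (name, type, status) rows. It assumes the payload shape returned by the Libraries REST API cluster-status call, where each library spec has a single key naming its type. The function name and the sample data, including the file path, are hypothetical.

```python
def summarize_libraries(cluster_status):
    """Return (name, type, status) tuples for each library in a status payload."""
    rows = []
    for entry in cluster_status.get("library_statuses", []):
        # Each library spec has a single key naming its type, such as "pypi" or "jar".
        lib_type, spec = next(iter(entry["library"].items()))
        if isinstance(spec, dict):
            # PyPI and CRAN specs carry "package"; Maven specs carry "coordinates".
            name = spec.get("package") or spec.get("coordinates")
        else:
            # File-based libraries (jar, egg, whl) are given as a path string.
            name = spec
        rows.append((name, lib_type, entry.get("status")))
    return rows


# Hand-written sample data; the cluster ID, package, and path are placeholders.
sample_status = {
    "cluster_id": "1234-567890-abcde123",
    "library_statuses": [
        {"library": {"pypi": {"package": "astropy"}}, "status": "INSTALLED"},
        {"library": {"jar": "dbfs:/FileStore/jars/example.jar"}, "status": "PENDING"},
    ],
}

for row in summarize_libraries(sample_status):
    print(row)
```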

Update a cluster-installed library

To update a cluster-installed library, uninstall the old version of the library and install a new version.
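For a PyPI library, the uninstall-then-install sequence can be expressed as a pair of request bodies. This is a sketch under assumptions: it targets the Libraries REST API endpoints `/api/2.0/libraries/uninstall` and `/api/2.0/libraries/install`, pins versions with the `package==version` form, and uses placeholder values for the cluster ID, package, and versions.

```python
def build_update_requests(cluster_id, package, old_version, new_version):
    """Build the uninstall and install bodies for replacing a pinned PyPI library."""

    def body(version):
        # Pin the exact version so the two requests refer to different libraries.
        return {
            "cluster_id": cluster_id,
            "libraries": [{"pypi": {"package": f"{package}=={version}"}}],
        }

    return {"uninstall": body(old_version), "install": body(new_version)}


# All values below are placeholders for illustration.
requests_to_send = build_update_requests("1234-567890-abcde123", "astropy", "4.0", "4.2")
print(requests_to_send["uninstall"])
print(requests_to_send["install"])
```

Because an uninstalled library is removed only when the cluster restarts, plan a restart as part of the update.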