Conda for Python Package Management

In Databricks Runtime for Machine Learning (Databricks Runtime ML), Python packages are installed with the Conda package manager. All Python packages are installed in a single environment: /databricks/python2 on clusters using Python 2, or /databricks/python3 on clusters using Python 3. Switching to (or activating) a different Conda environment is not supported.

Install Python packages on the driver node

You can call the conda command from a notebook to install a Python package on the driver (master) node of a cluster running Databricks Runtime ML. For some libraries, you may need to detach and reattach the notebook before you can import the newly installed module.

%sh /databricks/conda/bin/conda install -y -p /databricks/python astropy


Python packages installed with the conda command in a notebook are available only on the driver node, not on the worker nodes. To make a package available on the workers as well, install it using a library or an init script.
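
One way to confirm that an install took effect on the driver, independently of the notebook's own Python session, is to attempt the import in a fresh interpreter. The following is a minimal sketch rather than an official Databricks utility: is_importable and report_import are hypothetical helper names, the interpreter path is the one used elsewhere in this article, and the PYTHON_BIN override variable is an assumption added so the functions can be tried outside a cluster.

```shell
# Interpreter of the default environment (path as used in this article).
# PYTHON_BIN is an illustrative override, not a Databricks setting.
PYTHON_BIN="${PYTHON_BIN:-/databricks/python/bin/python}"

# is_importable MODULE
# Returns 0 if MODULE can be imported by a freshly started interpreter.
is_importable() {
  "$PYTHON_BIN" -c "import $1" >/dev/null 2>&1
}

# report_import MODULE
# Prints a one-line status for MODULE.
report_import() {
  if is_importable "$1"; then
    echo "$1: importable"
  else
    echo "$1: not importable"
  fi
}
```

Run in a %sh cell after the definitions, a call such as report_import astropy reports whether the module can be imported on the driver. Because the check starts a new Python process, it reflects the state of the environment on disk rather than the modules already loaded by the attached notebook.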

Install Python packages on all cluster nodes

The easiest way to use Conda to install a package on all cluster nodes is to call conda inside an init script.

In your init script, activate the default environment and install packages using conda.

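# Exit on the first error and echo each command to the init script log.
set -ex
# Log the Python version of the default environment.
/databricks/python/bin/python -V
# Source Conda's shell setup so that `conda activate` is available.
. /databricks/conda/etc/profile.d/
# Activate the default environment and install the package into it.
conda activate /databricks/python
conda install -y astropy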
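
When an init script needs several packages, the install step can be wrapped in a small function. The sketch below is an assumption-laden variant, not the documented method: it targets the environment by prefix with -p, as in the driver-node command earlier in this article, instead of activating it. install_packages is a hypothetical helper name, and the CONDA_BIN and ENV_PREFIX variables are illustrative overrides whose defaults are the paths shown above.

```shell
#!/bin/bash
# Sketch only: install several packages into the default environment by prefix.
# Defaults are the paths shown in this article; the variables are illustrative
# overrides so the function can be dry-run outside a cluster.
CONDA_BIN="${CONDA_BIN:-/databricks/conda/bin/conda}"
ENV_PREFIX="${ENV_PREFIX:-/databricks/python}"

# install_packages PKG [PKG...]
# Hypothetical helper: runs a single `conda install` for all named packages.
install_packages() {
  if [ "$#" -eq 0 ]; then
    echo "usage: install_packages PKG [PKG...]" >&2
    return 2
  fi
  "$CONDA_BIN" install -y -p "$ENV_PREFIX" "$@"
}
```

In an init script, the definition would be followed by a call such as install_packages astropy. Passing all package names to one conda invocation lets Conda resolve their dependencies together, which is usually preferable to installing them one at a time.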