Databricks Runtime for Machine Learning (Databricks Runtime ML) provides a ready-to-go environment for machine learning and data science. It contains multiple popular libraries, including TensorFlow, PyTorch, Keras, and XGBoost. It also supports distributed training using Horovod.
Databricks Runtime ML lets you start a Databricks cluster with all of the libraries required for distributed training. It ensures compatibility among the libraries included on the cluster (between TensorFlow and CUDA/cuDNN, for example) and substantially speeds up cluster start-up compared with installing those libraries yourself through init scripts.
If you require HIPAA compliance, refer to HIPAA-Compliant Deployment.
Library utilities are not available in Databricks Runtime 5.5 ML and below.
Databricks Runtime ML is built on Databricks Runtime. For example, Databricks Runtime 5.0 ML is built on Databricks Runtime 5.0. The libraries included in the base Databricks Runtime are listed in the Databricks Runtime Release Notes.
Databricks Runtime ML includes a variety of popular ML libraries, which are updated periodically to include new features and fixes.
Databricks has designated a subset of the supported libraries as top-tier libraries. For these libraries, Databricks provides a faster update cadence, updating to the latest upstream package releases with each runtime release (barring dependency conflicts). Databricks also provides advanced support, testing, and embedded optimizations for top-tier libraries.
For a full list of top-tier and other provided libraries, see the release notes for each available Databricks Runtime ML version.
When you create a cluster, select a Databricks Runtime ML version from the Databricks Runtime Version drop-down. Both CPU and GPU-enabled ML runtimes are available.
If you select a GPU-enabled ML runtime, you are prompted to select a compatible Driver Type and Worker Type. Incompatible instance types are grayed out in the drop-downs. GPU-enabled instance types are listed under the GPU-Accelerated label.
Libraries in your workspace that automatically install into all clusters can conflict with the libraries included in Databricks Runtime ML. Before you create a cluster with Databricks Runtime ML, clear the Install automatically on all clusters checkbox for conflicting libraries.
In Databricks Runtime ML, the Conda package manager is used to install Python packages. All Python packages are installed inside a single environment: /databricks/python2 on clusters using Python 2 and /databricks/python3 on clusters using Python 3. Switching (or activating) Conda environments is not supported.
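Because each cluster has exactly one Python environment, a script that needs to reference it can derive the path from the Python major version alone. The following is a minimal sketch using a hypothetical helper named databricks_conda_env; it only encodes the two paths listed above and does not inspect a running cluster.

```shell
#!/bin/bash
# Hypothetical helper: map a cluster's Python major version to the single
# Conda environment that Databricks Runtime ML uses for that version.
databricks_conda_env() {
  case "$1" in
    2) echo "/databricks/python2" ;;
    3) echo "/databricks/python3" ;;
    # Any other value has no matching environment; report it and fail.
    *) echo "unsupported Python major version: $1" >&2; return 1 ;;
  esac
}

# Example: on a Python 3 cluster, packages are installed under this prefix.
databricks_conda_env 3
```

Since switching environments is not supported, there is no third case to handle: the helper either returns the one environment for that version or fails.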
You can call the conda command inside a notebook to install a Python package on the driver node of a cluster running Databricks Runtime ML.
For some libraries, you may need to detach and reattach the notebook before you can import a newly installed Python module.
Python packages installed using the conda command inside notebooks are available only on the driver node, not on the worker nodes. To install a package on all worker nodes, use a library or an init script.
%sh conda install astropy
In Databricks Runtime 6.0 ML, you can use the %sh conda command directly to install libraries on the driver node. The -y option is not required because always_yes: True is set in the conda configuration file.
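On a cluster, `conda config --show always_yes` reports the effective setting. When a script has to decide whether to pass -y itself, it can inspect a configuration file directly. The sketch below assumes the file path is passed in, because its location on the cluster is not stated here; conda_always_yes and install_args are hypothetical helper names.

```shell
#!/bin/bash
# Hypothetical helper: succeed if the given conda configuration file sets
# always_yes to true (case-insensitive, e.g. "always_yes: True").
conda_always_yes() {
  local rc_file="$1"
  [ -f "$rc_file" ] || return 1
  grep -Eiq '^[[:space:]]*always_yes:[[:space:]]*true[[:space:]]*$' "$rc_file"
}

# Hypothetical helper: build the arguments for "conda install", adding -y
# only when the configuration does not already answer yes to every prompt.
install_args() {
  local rc_file="$1"; shift
  if conda_always_yes "$rc_file"; then
    echo "install $*"
  else
    echo "install -y $*"
  fi
}
```

Passing -y when always_yes is already set is harmless, so this check only matters if you want commands that mirror the configuration exactly.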
The easiest way to use Conda to install a package on all cluster nodes is to call conda inside an init script. In the init script, activate the default environment and then install packages with conda install:
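#!/bin/bash
# Print each command and stop at the first error
set -ex
# Log the Python version of the default environment
/databricks/python/bin/python -V
# Load conda's shell functions so that "conda activate" works inside a script
. /databricks/conda/etc/profile.d/conda.sh
conda activate /databricks/python
# -y answers yes to all prompts; init scripts run non-interactively
conda install -y astropy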
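If you maintain init scripts for several package sets, you can generate them from the same template instead of editing each one by hand. A minimal sketch, assuming a hypothetical helper named write_conda_init_script; where you store the generated file and how you attach it to a cluster are left to your workspace setup.

```shell
#!/bin/bash
# Hypothetical helper: write a cluster init script that installs the given
# packages into the default Conda environment, following the template above.
# Usage: write_conda_init_script OUTPUT_PATH PACKAGE [PACKAGE ...]
write_conda_init_script() {
  local out="$1"; shift
  if [ "$#" -eq 0 ]; then
    echo "usage: write_conda_init_script OUTPUT_PATH PACKAGE..." >&2
    return 1
  fi
  {
    echo '#!/bin/bash'
    echo 'set -ex'
    echo '/databricks/python/bin/python -V'
    echo '. /databricks/conda/etc/profile.d/conda.sh'
    echo 'conda activate /databricks/python'
    printf 'conda install -y'
    for pkg in "$@"; do
      # Single-quote each spec so that characters such as ">" in
      # "numpy>=1.16" are not treated as shell redirections.
      printf " '%s'" "$pkg"
    done
    printf '\n'
  } > "$out"
  chmod +x "$out"
}

# Example: the /tmp destination is only an illustration.
write_conda_init_script /tmp/install-astropy.sh astropy
```

Installing all packages in a single conda install call lets Conda resolve their dependencies together rather than one package at a time.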
By using this version of Databricks Runtime, you agree to the terms and conditions outlined in the NVIDIA End User License Agreement (EULA) with respect to the CUDA, cuDNN, and Tesla libraries, and the NVIDIA End User License Agreement (with NCCL Supplement) for the NCCL library.