Databricks Runtime for Machine Learning

Databricks Runtime for Machine Learning (Databricks Runtime ML) automates the creation of a cluster optimized for machine learning. Databricks Runtime ML clusters include the most popular machine learning libraries, such as TensorFlow, PyTorch, Keras, and XGBoost, and also include libraries required for distributed training such as Horovod. Using Databricks Runtime ML speeds up cluster creation and ensures that the installed library versions are compatible.

Databricks Runtime ML is built on Databricks Runtime. For example, Databricks Runtime 6.5 ML is built on Databricks Runtime 6.5. The libraries included in the base Databricks Runtime are listed in the Databricks Runtime release notes.

Warning

If you require HIPAA compliance, refer to HIPAA-compliant deployment.

Libraries included in Databricks Runtime ML

Note

Library utilities are not available in Databricks Runtime ML.

Databricks Runtime ML includes a variety of popular ML libraries. The libraries are updated with each release to include new features and fixes.

Databricks has designated a subset of the supported libraries as top-tier libraries. For these libraries, Databricks provides a faster update cadence, updating to the latest package releases with each runtime release (barring dependency conflicts). Databricks also provides advanced support, testing, and embedded optimizations for top-tier libraries.

For a full list of top-tier and other provided libraries, see the following articles for each available runtime:

How to use Databricks Runtime ML

In addition to the pre-installed libraries, Databricks Runtime ML differs from Databricks Runtime in the cluster configuration and in how you manage Python packages.

Create a cluster using Databricks Runtime ML

When you create a cluster, select a Databricks Runtime ML version from the Databricks Runtime Version drop-down. Both CPU and GPU-enabled ML runtimes are available.

If you select a GPU-enabled ML runtime, you are prompted to select a compatible Driver Type and Worker Type. Incompatible instance types are grayed out in the drop-downs. GPU-enabled instance types are listed under the GPU-Accelerated label.

Warning

Libraries in your workspace that automatically install into all clusters can conflict with the libraries included in Databricks Runtime ML. Before you create a cluster with Databricks Runtime ML, clear the Install automatically on all clusters checkbox for conflicting libraries.

Manage Python packages

Databricks Runtime ML uses the Conda package manager to install Python packages. All Python packages are installed inside a single environment: /databricks/python2 on clusters using Python 2 and /databricks/python3 on clusters using Python 3. Switching (or activating) Conda environments is not supported.
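
To confirm which environment a notebook is using, you can compare the interpreter's prefix with the paths above. The following is a minimal sketch, not an official check: the helper name expected_env_path is hypothetical, and the comparison assumes that sys.prefix reports the environment directory exactly as written above (it may differ if the directory is reached through a symbolic link).

```python
import sys


def expected_env_path(major_version):
    """Return the Conda environment path described above for a Python major version."""
    if major_version not in (2, 3):
        raise ValueError("only Python 2 and Python 3 environments are described")
    return "/databricks/python{}".format(major_version)


expected = expected_env_path(sys.version_info[0])
# sys.prefix is the root directory of the running interpreter's environment.
print("expected environment:", expected)
print("actual sys.prefix:", sys.prefix)
print("match:", sys.prefix == expected)
```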

List and install Python packages on the driver node

You can use conda and pip commands to list and install packages.

%sh
conda env list

%sh
conda install matplotlib -y

Tip

When you run shell commands inside notebooks using %sh, you cannot respond to interactive prompts. To avoid blocking, pass the -y (--yes) flag to commands that ask for confirmation, such as conda install and pip uninstall.

Important

Any modifications to the current environment using this method are restricted to the notebook and the driver. The changes are reset when you detach and reattach the notebook. You can install a package on all workers using a library or an init script.

Databricks Runtime 6.3 ML and above

%sh conda install astropy

In Databricks Runtime 6.3 ML and above, you can use the %sh conda command directly to install libraries on the driver node. The -y option is not required because always_yes: True is set in the Conda configuration file.

Databricks Runtime 5.5 ML

%sh /databricks/conda/bin/conda install -y -p /databricks/python astropy
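
After installing a package on the driver, you may want to confirm from a Python cell that the notebook can find it. The sketch below is one way to do that, assuming a Python 3 cluster (importlib.util.find_spec is not available in Python 2). The helper name is_installed is hypothetical, and it expects a top-level module name such as astropy.

```python
import importlib.util


def is_installed(module_name):
    """Return True if the top-level module can be found by this interpreter."""
    return importlib.util.find_spec(module_name) is not None


# Report availability on the driver only; workers are not checked here.
for name in ("astropy", "matplotlib"):
    print(name, "available on driver:", is_installed(name))
```

This only inspects the driver's environment, which matches the scope of the %sh installation method described above.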

Install Python packages on all cluster nodes

To install a package on all cluster nodes, call conda inside an init script. In your init script, activate the default environment and install packages using conda.

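#!/bin/bash
# Exit on the first error and echo each command for easier debugging.
set -ex
# Print the Python version of the default environment.
/databricks/python/bin/python -V
# Load Conda's shell functions so that "conda activate" works in this script.
. /databricks/conda/etc/profile.d/conda.sh
conda activate /databricks/python
# Install the package without prompting for confirmation.
conda install -y astropy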
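
If you maintain the list of packages in a notebook, you could generate an init script of the same shape from Python. This is a sketch under assumptions: the function name build_init_script and the DBFS path in the final comment are hypothetical, and the upload call is left commented out because dbutils is only defined inside a Databricks notebook.

```python
def build_init_script(packages):
    """Return the text of an init script that installs packages with conda."""
    if not packages:
        raise ValueError("at least one package is required")
    lines = [
        "#!/bin/bash",
        "set -ex",
        # Load Conda's shell functions, then activate the default environment.
        ". /databricks/conda/etc/profile.d/conda.sh",
        "conda activate /databricks/python",
        "conda install -y " + " ".join(packages),
    ]
    return "\n".join(lines) + "\n"


script = build_init_script(["astropy"])
print(script)

# In a Databricks notebook, the script could then be written to DBFS, for example:
# dbutils.fs.put("dbfs:/databricks/scripts/install-packages.sh", script, True)  # hypothetical path
```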

AutoML support

Databricks Runtime ML includes tools to automate the model development process and help you efficiently find the best-performing model.

  • Managed MLflow manages the end-to-end model lifecycle, including tracking experimental runs, deploying and sharing models, and maintaining a centralized model registry.
  • Hyperopt, augmented with the SparkTrials class, automates and distributes ML model parameter tuning.
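
As an illustration of the Hyperopt item above, the following sketch tunes a single parameter and passes a SparkTrials object to fmin so that trials are distributed across the cluster. The objective function is a toy stand-in for a model's validation loss, and the names objective and run_search, the search range, and the parallelism value are assumptions chosen for the example. The Hyperopt import sits inside run_search so that the objective can be exercised without a cluster.

```python
def objective(params):
    """Toy loss with its minimum at x = 2; stands in for a validation loss."""
    return (params["x"] - 2.0) ** 2


def run_search(max_evals=20, parallelism=2):
    """Tune x with Hyperopt, distributing trials with SparkTrials."""
    # Imported here so that objective() can be checked without Hyperopt or Spark.
    from hyperopt import SparkTrials, fmin, hp, tpe

    search_space = {"x": hp.uniform("x", -5.0, 5.0)}
    trials = SparkTrials(parallelism=parallelism)
    # fmin returns a dictionary holding the best parameter values it found.
    return fmin(
        fn=objective,
        space=search_space,
        algo=tpe.suggest,
        max_evals=max_evals,
        trials=trials,
    )


# On a Databricks Runtime ML cluster:
# best = run_search()
# print(best)
```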

License

By using this version of Databricks Runtime, you agree to the terms and conditions outlined in the NVIDIA End User License Agreement (EULA) with respect to the CUDA, cuDNN, and Tesla libraries, and the NVIDIA End User License Agreement (with NCCL Supplement) for the NCCL library.