Databricks Runtime 9.0 for Machine Learning (Unsupported)

Databricks released this image in August 2021.

Databricks Runtime 9.0 for Machine Learning provides a ready-to-go environment for machine learning and data science based on Databricks Runtime 9.0 (Unsupported). Databricks Runtime ML contains many popular machine learning libraries, including TensorFlow, PyTorch, and XGBoost. It also supports distributed deep learning training using Horovod.

For more information, including instructions for creating a Databricks Runtime ML cluster, see Databricks Runtime for Machine Learning.

Correction

A previous version of these release notes stated that support for monitoring cluster GPU metrics with Ganglia was disabled in Databricks Runtime 9.0 ML GPU. That was true for Databricks Runtime 9.0 ML Beta, but the issue was fixed with Databricks Runtime 9.0 ML GA. The statement has been removed.

New features and improvements

Databricks Runtime 9.0 ML is built on top of Databricks Runtime 9.0. For information on what’s new in Databricks Runtime 9.0, including Apache Spark MLlib and SparkR, see the Databricks Runtime 9.0 (Unsupported) release notes.

Databricks Autologging (Public Preview)

Databricks Autologging is now available for Databricks Runtime 9.0 for Machine Learning in select regions. Databricks Autologging is a no-code solution that provides automatic experiment tracking for machine learning training sessions on Databricks. With Databricks Autologging, model parameters, metrics, files, and lineage information are automatically captured when you train models from a variety of popular machine learning libraries. Training sessions are recorded as MLflow Tracking Runs. Model files are also tracked so you can easily log them to the MLflow Model Registry and deploy them for real-time scoring with MLflow Model Serving.

For more information about Databricks Autologging, see Databricks Autologging.

Improvements to Databricks Feature Store

Performance when creating a training set has been improved by minimizing the number of joins across source feature tables.

XGBoost integration with PySpark now supports distributed training and GPU clusters

For details, see Integration with Spark MLlib (Python).
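As a rough sketch of what this can look like in a notebook, the helper below builds a PySpark XGBoost estimator with distributed and GPU training switched on. It assumes the `XgboostClassifier` estimator from the `sparkdl.xgboost` module and its `num_workers` and `use_gpu` parameters, as described on the linked integration page. The helper name `make_xgboost_classifier` is hypothetical. Outside Databricks Runtime ML, where the module is not installed, the helper returns None instead of failing.

```python
def make_xgboost_classifier(num_workers, use_gpu=False, **xgb_params):
    """Build a distributed XGBoost estimator for PySpark, if available.

    num_workers: number of Spark tasks that train in parallel.
    use_gpu: request GPU training (only meaningful on a GPU cluster).
    xgb_params: extra keyword arguments passed through unchanged.
    """
    if not isinstance(num_workers, int) or num_workers < 1:
        raise ValueError("num_workers must be a positive integer")
    try:
        # Assumed to ship with Databricks Runtime ML; absent elsewhere.
        from sparkdl.xgboost import XgboostClassifier
    except ImportError:
        return None
    return XgboostClassifier(num_workers=num_workers, use_gpu=use_gpu, **xgb_params)
```

On a cluster, the returned estimator would be used like any other Spark ML estimator, for example by calling `fit` on a training DataFrame.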

Major changes to Databricks Runtime ML Python environment

Conda environments, along with the %conda command, are removed. Databricks Runtime 9.0 ML is built with pip and virtualenv. Custom images using Conda-based environments with Databricks Container Services will still be supported, but will not have notebook-scoped library capabilities. Databricks recommends using virtualenv-based environments with Databricks Container Services and %pip for all notebook-scoped libraries.
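One way to see which kind of environment a notebook or custom container is running in is to inspect the interpreter itself. The sketch below uses only the standard library. The function name `describe_python_env` is hypothetical, and the checks rely on general virtualenv and Conda conventions rather than on anything specific to Databricks.

```python
import os
import sys


def describe_python_env():
    """Report whether the interpreter runs in a virtualenv or a Conda env."""
    # virtualenv 20+ and venv point sys.base_prefix at the parent interpreter;
    # older virtualenv releases set sys.real_prefix instead.
    base_prefix = getattr(sys, "real_prefix", None) or getattr(
        sys, "base_prefix", sys.prefix
    )
    in_virtualenv = base_prefix != sys.prefix
    # Conda keeps a conda-meta directory in every environment it manages.
    in_conda = os.path.isdir(os.path.join(sys.prefix, "conda-meta"))
    return {"prefix": sys.prefix, "virtualenv": in_virtualenv, "conda": in_conda}


print(describe_python_env())
```

In a virtualenv-based environment, `%pip install` in a notebook cell then adds notebook-scoped libraries on top of what this reports.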

See Databricks Runtime 9.0 (Unsupported) for the major changes to the Databricks Runtime Python environment. For a full list of installed Python packages and their versions, see Python libraries.

Python packages upgraded

  • mlflow 1.18.0 -> 1.19.0

  • nltk 3.5 -> 3.6.1

Python packages added

  • prophet 1.0.1

Python packages removed

  • MKL

  • azure-core

  • azure-storage-blob

  • msrest

  • docker

  • querystring-parser

  • intel-openmp

Deprecations and unsupported features

  • In Databricks Runtime 9.0 ML, HorovodRunner does not support setting np=0, where np is the number of parallel processes to use for the Horovod job.

  • Databricks Runtime 9.0 ML includes r-base 4.1.0 with R graphics engine version 14. This is not supported by RStudio Server version 1.2.x.

  • nvprof is removed in Databricks Runtime 9.0 ML GPU.
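For the HorovodRunner change listed above, code that computes `np` dynamically may want to fail early with a clear message rather than pass 0 through. The following is a minimal sketch of such a guard. The helper name `checked_np` and the `train_fn` mentioned in the comments are hypothetical, and the commented usage assumes the `HorovodRunner(np=...).run(...)` pattern from the `sparkdl` package.

```python
def checked_np(np):
    """Return np unchanged unless it is a value HorovodRunner should reject."""
    if isinstance(np, bool) or not isinstance(np, int):
        raise TypeError("np must be an integer")
    if np == 0:
        raise ValueError(
            "np=0 is not supported in Databricks Runtime 9.0 ML; "
            "pass an explicit number of processes"
        )
    return np


# On a Databricks Runtime ML cluster (train_fn is a hypothetical training function):
# from sparkdl import HorovodRunner
# HorovodRunner(np=checked_np(2)).run(train_fn)
```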

System environment

The system environment in Databricks Runtime 9.0 ML differs from Databricks Runtime 9.0 as follows:

Libraries

The following sections list the libraries included in Databricks Runtime 9.0 ML that differ from those included in Databricks Runtime 9.0.

Python libraries

Databricks Runtime 9.0 ML uses virtualenv for Python package management and includes many popular ML packages.

In addition to the packages specified in the following sections, Databricks Runtime 9.0 ML also includes the following packages:

  • hyperopt 0.2.5.db2

  • sparkdl 2.2.0_db1

  • feature_store 0.3.3

  • automl 1.1.1
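To confirm that a running environment matches the versions listed in these notes, the installed distributions can be compared against a set of expected pins. The sketch below uses `importlib.metadata` from the standard library (Python 3.8 or later). The function name `find_version_mismatches` is hypothetical, and the sample pins are taken from the library tables in these notes. Distribution names as seen by pip may differ from the names shown here for the Databricks-specific packages, so those are left out of the example.

```python
from importlib import metadata


def find_version_mismatches(expected):
    """Compare installed distributions against expected versions.

    Returns {name: (expected_version, installed_version_or_None)} for
    every package that is missing or whose version differs.
    """
    mismatches = {}
    for name, wanted in expected.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != wanted:
            mismatches[name] = (wanted, installed)
    return mismatches


# A few pins taken from these release notes.
expected = {"mlflow-skinny": "1.19.0", "nltk": "3.6.1", "prophet": "1.0.1"}
print(find_version_mismatches(expected))
```

An empty dictionary means every listed package is installed at the expected version.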

Python libraries on CPU clusters

Library                       Version
absl-py                       0.11.0
Antergos Linux                2015.10 (ISO-Rolling)
appdirs                       1.4.4
argon2-cffi                   20.1.0
astor                         0.8.1
astunparse                    1.6.3
async-generator               1.10
attrs                         20.3.0
backcall                      0.2.0
bcrypt                        3.2.0
bleach                        3.3.0
boto3                         1.16.7
botocore                      1.19.7
Bottleneck                    1.3.2
cachetools                    4.2.2
certifi                       2020.12.5
cffi                          1.14.5
chardet                       4.0.0
click                         7.1.2
cloudpickle                   1.6.0
cmdstanpy                     0.9.68
configparser                  5.0.1
convertdate                   2.3.2
cryptography                  3.4.7
cycler                        0.10.0
Cython                        0.29.23
databricks-cli                0.14.3
dbus-python                   1.2.16
decorator                     5.0.6
defusedxml                    0.7.1
dill                          0.3.2
diskcache                     5.2.1
distlib                       0.3.2
distro-info                   0.23ubuntu1
entrypoints                   0.3
ephem                         4.0.0.2
facets-overview               1.0.0
filelock                      3.0.12
Flask                         1.1.2
flatbuffers                   1.12
fsspec                        0.9.0
future                        0.18.2
gast                          0.4.0
gitdb                         4.0.7
GitPython                     3.1.12
google-auth                   1.22.1
google-auth-oauthlib          0.4.2
google-pasta                  0.2.0
grpcio                        1.34.1
gunicorn                      20.0.4
h5py                          3.1.0
hijri-converter               2.1.3
holidays                      0.10.5.2
horovod                       0.22.1
htmlmin                       0.1.12
idna                          2.10
ImageHash                     4.2.1
ipykernel                     5.3.4
ipython                       7.22.0
ipython-genutils              0.2.0
ipywidgets                    7.6.4
isodate                       0.6.0
itsdangerous                  1.1.0
jedi                          0.17.2
Jinja2                        2.11.3
jmespath                      0.10.0
joblib                        1.0.1
joblibspark                   0.3.0
jsonschema                    3.2.0
jupyter-client                6.1.12
jupyter-core                  4.7.1
jupyterlab-pygments           0.1.2
jupyterlab-widgets            1.0.1
keras-nightly                 2.5.0.dev2021032900
Keras-Preprocessing           1.1.2
kiwisolver                    1.3.1
koalas                        1.8.1
korean-lunar-calendar         0.2.1
lightgbm                      3.1.1
llvmlite                      0.36.0
LunarCalendar                 0.0.9
Mako                          1.1.3
Markdown                      3.3.3
MarkupSafe                    1.1.1
matplotlib                    3.4.2
missingno                     0.5.0
mistune                       0.8.4
mleap                         0.17.0
mlflow-skinny                 1.19.0
multimethod                   1.4
nbclient                      0.5.3
nbconvert                     6.0.7
nbformat                      5.1.3
nest-asyncio                  1.5.1
networkx                      2.5
nltk                          3.6.1
notebook                      6.3.0
numba                         0.53.1
numpy                         1.19.2
oauthlib                      3.1.0
opt-einsum                    3.3.0
packaging                     20.9
pandas                        1.2.4
pandas-profiling              3.0.0
pandocfilters                 1.4.3
paramiko                      2.7.2
parso                         0.7.0
patsy                         0.5.1
petastorm                     0.11.1
pexpect                       4.8.0
phik                          0.12.0
pickleshare                   0.7.5
Pillow                        8.2.0
pip                           21.0.1
plotly                        4.14.3
prometheus-client             0.10.1
prompt-toolkit                3.0.17
prophet                       1.0.1
protobuf                      3.17.2
psutil                        5.8.0
psycopg2                      2.8.5
ptyprocess                    0.7.0
pyarrow                       4.0.0
pyasn1                        0.4.8
pyasn1-modules                0.2.8
pycparser                     2.20
pydantic                      1.8.2
Pygments                      2.8.1
PyGObject                     3.36.0
PyMeeus                       0.5.11
PyNaCl                        1.3.0
pyodbc                        4.0.30
pyparsing                     2.4.7
pyrsistent                    0.17.3
pystan                        2.19.1.1
python-apt                    2.0.0+ubuntu0.20.4.6
python-dateutil               2.8.1
python-editor                 1.0.4
pytz                          2020.5
PyWavelets                    1.1.1
PyYAML                        5.4.1
pyzmq                         20.0.0
regex                         2021.4.4
requests                      2.25.1
requests-oauthlib             1.3.0
requests-unixsocket           0.2.0
retrying                      1.3.3
rsa                           4.7.2
s3transfer                    0.3.7
scikit-learn                  0.24.1
scipy                         1.6.2
seaborn                       0.11.1
Send2Trash                    1.5.0
setuptools                    52.0.0
setuptools-git                1.2
shap                          0.39.0
simplejson                    3.17.2
six                           1.15.0
slicer                        0.0.7
smmap                         3.0.5
spark-tensorflow-distributor  0.1.0
sqlparse                      0.4.1
ssh-import-id                 5.10
statsmodels                   0.12.2
tabulate                      0.8.7
tangled-up-in-unicode         0.1.0
tensorboard                   2.5.0
tensorboard-data-server       0.6.1
tensorboard-plugin-wit        1.8.0
tensorflow-cpu                2.5.0
tensorflow-estimator          2.5.0
termcolor                     1.1.0
terminado                     0.9.4
testpath                      0.4.4
threadpoolctl                 2.1.0
torch                         1.9.0+cpu
torchvision                   0.10.0+cpu
tornado                       6.1
tqdm                          4.59.0
traitlets                     5.0.5
typing-extensions             3.7.4.3
ujson                         4.0.2
unattended-upgrades           0.1
urllib3                       1.25.11
virtualenv                    20.4.1
visions                       0.7.1
wcwidth                       0.2.5
webencodings                  0.5.1
websocket-client              0.57.0
Werkzeug                      1.0.1
wheel                         0.36.2
widgetsnbextension            3.5.1
wrapt                         1.12.1
xgboost                       1.4.2

Python libraries on GPU clusters

Library                       Version
absl-py                       0.11.0
Antergos Linux                2015.10 (ISO-Rolling)
appdirs                       1.4.4
argon2-cffi                   20.1.0
astor                         0.8.1
astunparse                    1.6.3
async-generator               1.10
attrs                         20.3.0
backcall                      0.2.0
bcrypt                        3.2.0
bleach                        3.3.0
boto3                         1.16.7
botocore                      1.19.7
Bottleneck                    1.3.2
cachetools                    4.2.2
certifi                       2020.12.5
cffi                          1.14.5
chardet                       4.0.0
click                         7.1.2
cloudpickle                   1.6.0
cmdstanpy                     0.9.68
configparser                  5.0.1
convertdate                   2.3.2
cryptography                  3.4.7
cycler                        0.10.0
Cython                        0.29.23
databricks-cli                0.14.3
dbus-python                   1.2.16
decorator                     5.0.6
defusedxml                    0.7.1
dill                          0.3.2
diskcache                     5.2.1
distlib                       0.3.2
distro-info                   0.23ubuntu1
entrypoints                   0.3
ephem                         4.0.0.2
facets-overview               1.0.0
filelock                      3.0.12
Flask                         1.1.2
flatbuffers                   1.12
fsspec                        0.9.0
future                        0.18.2
gast                          0.4.0
gitdb                         4.0.7
GitPython                     3.1.12
google-auth                   1.22.1
google-auth-oauthlib          0.4.2
google-pasta                  0.2.0
grpcio                        1.34.1
gunicorn                      20.0.4
h5py                          3.1.0
hijri-converter               2.1.3
holidays                      0.10.5.2
horovod                       0.22.1
htmlmin                       0.1.12
idna                          2.10
ImageHash                     4.2.1
ipykernel                     5.3.4
ipython                       7.22.0
ipython-genutils              0.2.0
ipywidgets                    7.6.4
isodate                       0.6.0
itsdangerous                  1.1.0
jedi                          0.17.2
Jinja2                        2.11.3
jmespath                      0.10.0
joblib                        1.0.1
joblibspark                   0.3.0
jsonschema                    3.2.0
jupyter-client                6.1.12
jupyter-core                  4.7.1
jupyterlab-pygments           0.1.2
jupyterlab-widgets            1.0.1
keras-nightly                 2.5.0.dev2021032900
Keras-Preprocessing           1.1.2
kiwisolver                    1.3.1
koalas                        1.8.1
korean-lunar-calendar         0.2.1
lightgbm                      3.1.1
llvmlite                      0.36.0
LunarCalendar                 0.0.9
Mako                          1.1.3
Markdown                      3.3.3
MarkupSafe                    1.1.1
matplotlib                    3.4.2
missingno                     0.5.0
mistune                       0.8.4
mleap                         0.17.0
mlflow-skinny                 1.19.0
multimethod                   1.4
nbclient                      0.5.3
nbconvert                     6.0.7
nbformat                      5.1.3
nest-asyncio                  1.5.1
networkx                      2.5
nltk                          3.6.1
notebook                      6.3.0
numba                         0.53.1
numpy                         1.19.2
oauthlib                      3.1.0
opt-einsum                    3.3.0
packaging                     20.9
pandas                        1.2.4
pandas-profiling              3.0.0
pandocfilters                 1.4.3
paramiko                      2.7.2
parso                         0.7.0
patsy                         0.5.1
petastorm                     0.11.1
pexpect                       4.8.0
phik                          0.12.0
pickleshare                   0.7.5
Pillow                        8.2.0
pip                           21.0.1
plotly                        4.14.3
prometheus-client             0.11.0
prompt-toolkit                3.0.17
prophet                       1.0.1
protobuf                      3.17.2
psutil                        5.8.0
psycopg2                      2.8.5
ptyprocess                    0.7.0
pyarrow                       4.0.0
pyasn1                        0.4.8
pyasn1-modules                0.2.8
pycparser                     2.20
pydantic                      1.8.2
Pygments                      2.8.1
PyGObject                     3.36.0
PyMeeus                       0.5.11
PyNaCl                        1.3.0
pyodbc                        4.0.30
pyparsing                     2.4.7
pyrsistent                    0.17.3
pystan                        2.19.1.1
python-apt                    2.0.0+ubuntu0.20.4.6
python-dateutil               2.8.1
python-editor                 1.0.4
pytz                          2020.5
PyWavelets                    1.1.1
PyYAML                        5.4.1
pyzmq                         20.0.0
regex                         2021.4.4
requests                      2.25.1
requests-oauthlib             1.3.0
requests-unixsocket           0.2.0
retrying                      1.3.3
rsa                           4.7.2
s3transfer                    0.3.7
scikit-learn                  0.24.1
scipy                         1.6.2
seaborn                       0.11.1
Send2Trash                    1.5.0
setuptools                    52.0.0
setuptools-git                1.2
shap                          0.39.0
simplejson                    3.17.2
six                           1.15.0
slicer                        0.0.7
smmap                         3.0.5
spark-tensorflow-distributor  0.1.0
sqlparse                      0.4.1
ssh-import-id                 5.10
statsmodels                   0.12.2
tabulate                      0.8.7
tangled-up-in-unicode         0.1.0
tensorboard                   2.5.0
tensorboard-data-server       0.6.1
tensorboard-plugin-wit        1.8.0
tensorflow                    2.5.0
tensorflow-estimator          2.5.0
termcolor                     1.1.0
terminado                     0.9.4
testpath                      0.4.4
threadpoolctl                 2.1.0
torch                         1.9.0+cu111
torchvision                   0.10.0+cu111
tornado                       6.1
tqdm                          4.59.0
traitlets                     5.0.5
typing-extensions             3.7.4.3
ujson                         4.0.2
unattended-upgrades           0.1
urllib3                       1.25.11
virtualenv                    20.4.1
visions                       0.7.1
wcwidth                       0.2.5
webencodings                  0.5.1
websocket-client              0.57.0
Werkzeug                      1.0.1
wheel                         0.36.2
widgetsnbextension            3.5.1
wrapt                         1.12.1
xgboost                       1.4.2

Spark packages containing Python modules

Spark Package  Python Module  Version
graphframes    graphframes    0.8.1-db3-spark3.1

R libraries

The R libraries are identical to the R libraries in Databricks Runtime 9.0.

Java and Scala libraries (Scala 2.12 cluster)

In addition to Java and Scala libraries in Databricks Runtime 9.0, Databricks Runtime 9.0 ML contains the following JARs:

CPU clusters

Group ID                Artifact ID                      Version
com.typesafe.akka       akka-actor_2.12                  2.5.23
ml.combust.mleap        mleap-databricks-runtime_2.12    0.17.0-4882dc3
ml.dmlc                 xgboost4j-spark_2.12             1.4.1
ml.dmlc                 xgboost4j_2.12                   1.4.1
org.graphframes         graphframes_2.12                 0.8.1-db2-spark3.1
org.mlflow              mlflow-client                    1.19.0
org.mlflow              mlflow-spark                     1.19.0
org.scala-lang.modules  scala-java8-compat_2.12          0.8.0
org.tensorflow          spark-tensorflow-connector_2.12  1.15.0

GPU clusters

Group ID                Artifact ID                      Version
com.typesafe.akka       akka-actor_2.12                  2.5.23
ml.combust.mleap        mleap-databricks-runtime_2.12    0.17.0-4882dc3
ml.dmlc                 xgboost4j-gpu_2.12               1.4.1
ml.dmlc                 xgboost4j-spark-gpu_2.12         1.4.1
org.graphframes         graphframes_2.12                 0.8.1-db2-spark3.1
org.mlflow              mlflow-client                    1.19.0
org.mlflow              mlflow-spark                     1.19.0
org.scala-lang.modules  scala-java8-compat_2.12          0.8.0
org.tensorflow          spark-tensorflow-connector_2.12  1.15.0