Databricks Runtime 12.0 for Machine Learning

Databricks Runtime 12.0 for Machine Learning provides a ready-to-go environment for machine learning and data science based on Databricks Runtime 12.0. Databricks Runtime ML contains many popular machine learning libraries, including TensorFlow, PyTorch, and XGBoost. Databricks Runtime ML includes AutoML, a tool to automatically train machine learning pipelines. Databricks Runtime ML also supports distributed deep learning training using Horovod.

For more information, including instructions for creating a Databricks Runtime ML cluster, see Databricks Runtime for Machine Learning.

New features and improvements

Databricks Runtime 12.0 ML is built on top of Databricks Runtime 12.0. For information on what’s new in Databricks Runtime 12.0, including Apache Spark MLlib and SparkR, see the Databricks Runtime 12.0 release notes.

Enhancements to Databricks AutoML

  • Forecasting models can now optionally include country holidays.

  • Forecasting now supports monthly, quarterly, and annual frequencies.

  • AutoML can now use larger datasets for training. AutoML automatically allocates more CPU cores for large datasets.

For more information about Databricks AutoML, see What is AutoML?.
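
As a rough illustration of the forecasting options listed above, the following sketch assembles arguments for `databricks.automl.forecast`. It assumes the `country_code` and `frequency` parameters behave as described in the AutoML Python API reference. The frequency spellings (`"M"`, `"Q"`, `"Y"`), the helper names, and the column names are assumptions or hypothetical, and the call itself only works on a Databricks Runtime ML cluster.

```python
# Sketch only: argument names follow the AutoML Python API reference as
# understood here; verify them against the documentation for your runtime.

# Frequencies added in this release, keyed by plain-English name.
# The single-letter spellings are an assumption.
NEW_FREQUENCIES = {"monthly": "M", "quarterly": "Q", "annual": "Y"}


def build_forecast_args(target_col, time_col, frequency, horizon, country_code="US"):
    """Hypothetical helper: collect keyword arguments for automl.forecast."""
    if frequency not in NEW_FREQUENCIES:
        raise ValueError(f"unsupported frequency: {frequency!r}")
    return {
        "target_col": target_col,
        "time_col": time_col,
        "frequency": NEW_FREQUENCIES[frequency],
        "horizon": horizon,
        # Two-letter country code whose holidays the model may include.
        "country_code": country_code,
    }


def run_forecast(dataset, **kwargs):
    """Call AutoML forecasting; only works on a Databricks Runtime ML cluster."""
    from databricks import automl  # imported lazily; not available locally

    return automl.forecast(dataset=dataset, **kwargs)


# Hypothetical column names for a monthly sales series.
args = build_forecast_args("sales", "month_start", "monthly", horizon=6)
print(args)
```

On a cluster, `run_forecast(df, **args)` would start the AutoML experiment, where `df` is a DataFrame containing the two named columns.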

MLflow 2.0

Databricks Runtime 12.0 ML includes MLflow 2.0. MLflow 2.0 builds upon MLflow’s strong platform foundation and incorporates extensive user feedback to simplify data science workflows and deliver innovative, first-class tools for MLOps. Features and improvements include extensions to MLflow Recipes (formerly MLflow Pipelines) such as AutoML, hyperparameter tuning, and classification support, as well as modernized integrations with the ML ecosystem, a streamlined MLflow Tracking UI, a refresh of core APIs across MLflow’s platform components, and more. For more information, see the MLflow 2.0 documentation or the blog post.

scikit-learn 1.0

Databricks Runtime 12.0 ML includes scikit-learn version 1.0. See the scikit-learn documentation to learn about the changes in this release.

System environment

The system environment in Databricks Runtime 12.0 ML differs from Databricks Runtime 12.0 as follows:

Databricks Runtime 12.0 ML includes XGBoost 1.6.2, which does not support GPU clusters with compute capability 5.2 and below.
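
To check ahead of time whether a GPU falls under this limitation, a minimal sketch might compare its CUDA compute capability against the 5.2 threshold stated above. The helper names are hypothetical, and the optional lookup assumes PyTorch with CUDA support is installed, as it is on GPU clusters.

```python
UNSUPPORTED_MAX = (5, 2)  # compute capability 5.2 and below is not supported


def xgboost_gpu_supported(capability):
    """Return True if a (major, minor) compute capability is above 5.2."""
    return tuple(capability) > UNSUPPORTED_MAX


def current_gpu_capability():
    """Return the first GPU's (major, minor) capability, or None if unavailable."""
    try:
        import torch  # optional; present on GPU clusters
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_capability(0)


capability = current_gpu_capability()
if capability is None:
    print("No CUDA GPU detected")
else:
    print(capability, "supported:", xgboost_gpu_supported(capability))
```

The comparison relies on Python's tuple ordering, so `(5, 2)` and `(3, 7)` are rejected while `(6, 0)` and above pass.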

Libraries

The following sections list the libraries included in Databricks Runtime 12.0 ML that differ from those included in Databricks Runtime 12.0.

Python libraries

Databricks Runtime 12.0 ML uses Virtualenv for Python package management and includes many popular ML packages.

In addition to the packages specified in the following sections, Databricks Runtime 12.0 ML also includes the following packages:

  • hyperopt 0.2.7.db1

  • sparkdl 2.3.0-db3

  • automl 1.14.1

To reproduce the Databricks Runtime ML Python environment in your local Python virtual environment, download the requirements-12.0.txt file and run pip install -r requirements-12.0.txt. This command installs all of the open source libraries that Databricks Runtime ML uses, but does not install libraries developed by Databricks, such as databricks-automl, databricks-feature-store, or the Databricks fork of hyperopt.
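
After installing from the requirements file, it can be useful to confirm that the local environment matches the pinned versions. The following sketch uses only the standard library to compare `name==version` lines against the installed distributions. The function names are hypothetical, and it assumes the file uses simple `==` pins (lines in any other form are skipped).

```python
from importlib import metadata


def parse_pins(lines):
    """Extract {name: version} from 'name==version' lines, skipping the rest."""
    pins = {}
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if "==" not in line or line.startswith("-"):
            continue  # blank lines, pip options, or unpinned requirements
        name, version = line.split("==", 1)
        pins[name.strip()] = version.strip()
    return pins


def find_mismatches(pins):
    """Return {name: (pinned, installed_or_None)} for every pin not satisfied."""
    mismatches = {}
    for name, pinned in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != pinned:
            mismatches[name] = (pinned, installed)
    return mismatches


# Small in-memory sample; versions taken from the tables in these notes.
sample = ["numpy==1.21.5", "pandas==1.4.2  # pinned", "", "-r other.txt"]
pins = parse_pins(sample)
print(pins)
print(find_mismatches(pins))
```

To check a real file, pass an open file handle instead of the sample list, for example `parse_pins(open("requirements-12.0.txt"))`.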

Python libraries on CPU clusters

| Library | Version | Library | Version | Library | Version |
| --- | --- | --- | --- | --- | --- |
| absl-py | 1.0.0 | argon2-cffi | 21.3.0 | argon2-cffi-bindings | 21.2.0 |
| astor | 0.8.1 | asttokens | 2.0.5 | astunparse | 1.6.3 |
| attrs | 21.4.0 | azure-core | 1.26.1 | azure-cosmos | 4.2.0 |
| backcall | 0.2.0 | backports.entry-points-selectable | 1.2.0 | bcrypt | 3.2.0 |
| beautifulsoup4 | 4.11.1 | black | 22.3.0 | bleach | 4.1.0 |
| blis | 0.7.9 | boto3 | 1.21.32 | botocore | 1.24.32 |
| cachetools | 4.2.2 | catalogue | 2.0.8 | category-encoders | 2.5.1.post0 |
| certifi | 2021.10.8 | cffi | 1.15.0 | chardet | 4.0.0 |
| charset-normalizer | 2.0.4 | click | 8.0.4 | cloudpickle | 2.0.0 |
| cmdstanpy | 1.0.8 | confection | 0.0.3 | configparser | 5.2.0 |
| convertdate | 2.4.0 | cryptography | 3.4.8 | cycler | 0.11.0 |
| cymem | 2.0.7 | Cython | 0.29.28 | databricks-automl-runtime | 0.2.13 |
| databricks-cli | 0.17.3 | databricks-feature-store | 0.8.0 | dbl-tempo | 0.1.12 |
| dbus-python | 1.2.16 | debugpy | 1.5.1 | decorator | 5.1.1 |
| defusedxml | 0.7.1 | dill | 0.3.4 | diskcache | 5.4.0 |
| distlib | 0.3.6 | entrypoints | 0.4 | ephem | 4.1.3 |
| executing | 0.8.3 | facets-overview | 1.0.0 | fastjsonschema | 2.16.2 |
| fasttext | 0.9.2 | filelock | 3.6.0 | Flask | 1.1.2 |
| flatbuffers | 22.10.26 | fonttools | 4.25.0 | fsspec | 2022.2.0 |
| future | 0.18.2 | gast | 0.4.0 | gitdb | 4.0.9 |
| GitPython | 3.1.27 | google-auth | 1.33.0 | google-auth-oauthlib | 0.4.6 |
| google-pasta | 0.2.0 | grpcio | 1.42.0 | gunicorn | 20.1.0 |
| gviz-api | 1.10.0 | h5py | 3.6.0 | hijri-converter | 2.2.4 |
| holidays | 0.16 | horovod | 0.25.0 | htmlmin | 0.1.12 |
| huggingface-hub | 0.11.0 | idna | 3.3 | ImageHash | 4.3.1 |
| imbalanced-learn | 0.8.1 | importlib-metadata | 4.11.3 | ipykernel | 6.15.3 |
| ipython | 8.5.0 | ipython-genutils | 0.2.0 | ipywidgets | 7.7.2 |
| isodate | 0.6.1 | itsdangerous | 2.0.1 | jedi | 0.18.1 |
| Jinja2 | 2.11.3 | jmespath | 0.10.0 | joblib | 1.1.0 |
| joblibspark | 0.5.0 | jsonschema | 4.4.0 | jupyter-client | 6.1.12 |
| jupyter_core | 4.11.2 | jupyterlab-pygments | 0.1.2 | jupyterlab-widgets | 1.0.0 |
| keras | 2.10.0 | Keras-Preprocessing | 1.1.2 | kiwisolver | 1.3.2 |
| korean-lunar-calendar | 0.3.1 | langcodes | 3.3.0 | libclang | 14.0.6 |
| lightgbm | 3.3.3 | llvmlite | 0.38.0 | LunarCalendar | 0.0.9 |
| Mako | 1.2.0 | Markdown | 3.3.4 | MarkupSafe | 2.0.1 |
| matplotlib | 3.5.1 | matplotlib-inline | 0.1.2 | missingno | 0.5.1 |
| mistune | 0.8.4 | mleap | 0.20.0 | mlflow-skinny | 2.0.1 |
| multimethod | 1.8 | murmurhash | 1.0.9 | mypy-extensions | 0.4.3 |
| nbclient | 0.5.13 | nbconvert | 6.4.4 | nbformat | 5.3.0 |
| nest-asyncio | 1.5.5 | networkx | 2.7.1 | nltk | 3.7 |
| notebook | 6.4.8 | numba | 0.55.1 | numpy | 1.21.5 |
| oauthlib | 3.2.0 | opt-einsum | 3.3.0 | packaging | 21.3 |
| pandas | 1.4.2 | pandas-profiling | 3.3.0 | pandocfilters | 1.5.0 |
| paramiko | 2.9.2 | parso | 0.8.3 | pathspec | 0.9.0 |
| pathy | 0.6.1 | patsy | 0.5.2 | petastorm | 0.11.4 |
| pexpect | 4.8.0 | phik | 0.12.2 | pickleshare | 0.7.5 |
| Pillow | 9.0.1 | pip | 21.2.4 | platformdirs | 2.5.4 |
| plotly | 5.6.0 | pmdarima | 2.0.1 | preshed | 3.0.8 |
| prometheus-client | 0.13.1 | prompt-toolkit | 3.0.20 | prophet | 1.1.1 |
| protobuf | 3.19.4 | psutil | 5.8.0 | psycopg2 | 2.9.3 |
| ptyprocess | 0.7.0 | pure-eval | 0.2.2 | pyarrow | 7.0.0 |
| pyasn1 | 0.4.8 | pyasn1-modules | 0.2.8 | pybind11 | 2.10.1 |
| pycparser | 2.21 | pydantic | 1.9.2 | Pygments | 2.11.2 |
| PyGObject | 3.36.0 | PyJWT | 2.6.0 | PyMeeus | 0.5.11 |
| PyNaCl | 1.5.0 | pyodbc | 4.0.32 | pyparsing | 3.0.4 |
| pyrsistent | 0.18.0 | python-dateutil | 2.8.2 | python-editor | 1.0.4 |
| pytz | 2021.3 | PyWavelets | 1.3.0 | PyYAML | 6.0 |
| pyzmq | 22.3.0 | regex | 2022.3.15 | requests | 2.27.1 |
| requests-oauthlib | 1.3.1 | requests-unixsocket | 0.2.0 | rsa | 4.7.2 |
| s3transfer | 0.5.0 | scikit-learn | 1.0.2 | scipy | 1.7.3 |
| seaborn | 0.11.2 | Send2Trash | 1.8.0 | setuptools | 61.2.0 |
| setuptools-git | 1.2 | shap | 0.41.0 | simplejson | 3.17.6 |
| six | 1.16.0 | slicer | 0.0.7 | smart-open | 5.1.0 |
| smmap | 5.0.0 | soupsieve | 2.3.1 | spacy | 3.4.1 |
| spacy-legacy | 3.0.10 | spacy-loggers | 1.0.3 | spark-tensorflow-distributor | 1.0.0 |
| sqlparse | 0.4.2 | srsly | 2.4.5 | ssh-import-id | 5.10 |
| stack-data | 0.2.0 | statsmodels | 0.13.2 | tabulate | 0.8.9 |
| tangled-up-in-unicode | 0.2.0 | tenacity | 8.0.1 | tensorboard | 2.10.0 |
| tensorboard-data-server | 0.6.1 | tensorboard-plugin-profile | 2.8.0 | tensorboard-plugin-wit | 1.8.1 |
| tensorflow-cpu | 2.10.0 | tensorflow-estimator | 2.10.0 | tensorflow-io-gcs-filesystem | 0.28.0 |
| termcolor | 2.1.1 | terminado | 0.13.1 | testpath | 0.5.0 |
| thinc | 8.1.5 | threadpoolctl | 2.2.0 | tokenize-rt | 4.2.1 |
| tokenizers | 0.13.2 | tomli | 1.2.2 | torch | 1.12.1+cpu |
| torchvision | 0.13.1+cpu | tornado | 6.1 | tqdm | 4.64.0 |
| traitlets | 5.1.1 | transformers | 4.23.1 | typer | 0.4.2 |
| typing_extensions | 4.1.1 | unattended-upgrades | 0.1 | urllib3 | 1.26.9 |
| virtualenv | 20.8.0 | visions | 0.7.5 | wasabi | 0.10.1 |
| wcwidth | 0.2.5 | webencodings | 0.5.1 | websocket-client | 0.58.0 |
| Werkzeug | 2.0.3 | wheel | 0.37.1 | widgetsnbextension | 3.6.1 |
| wrapt | 1.12.1 | zipp | 3.7.0 | | |

Python libraries on GPU clusters

| Library | Version | Library | Version | Library | Version |
| --- | --- | --- | --- | --- | --- |
| absl-py | 1.0.0 | argon2-cffi | 21.3.0 | argon2-cffi-bindings | 21.2.0 |
| astor | 0.8.1 | asttokens | 2.0.5 | astunparse | 1.6.3 |
| attrs | 21.4.0 | azure-core | 1.26.1 | azure-cosmos | 4.2.0 |
| backcall | 0.2.0 | backports.entry-points-selectable | 1.2.0 | bcrypt | 3.2.0 |
| beautifulsoup4 | 4.11.1 | black | 22.3.0 | bleach | 4.1.0 |
| blis | 0.7.9 | boto3 | 1.21.32 | botocore | 1.24.32 |
| cachetools | 4.2.2 | catalogue | 2.0.8 | category-encoders | 2.5.1.post0 |
| certifi | 2021.10.8 | cffi | 1.15.0 | chardet | 4.0.0 |
| charset-normalizer | 2.0.4 | click | 8.0.4 | cloudpickle | 2.0.0 |
| cmdstanpy | 1.0.8 | confection | 0.0.3 | configparser | 5.2.0 |
| convertdate | 2.4.0 | cryptography | 3.4.8 | cycler | 0.11.0 |
| cymem | 2.0.7 | Cython | 0.29.28 | databricks-automl-runtime | 0.2.13 |
| databricks-cli | 0.17.3 | databricks-feature-store | 0.8.0 | dbl-tempo | 0.1.12 |
| dbus-python | 1.2.16 | debugpy | 1.5.1 | decorator | 5.1.1 |
| defusedxml | 0.7.1 | dill | 0.3.4 | diskcache | 5.4.0 |
| distlib | 0.3.6 | entrypoints | 0.4 | ephem | 4.1.3 |
| executing | 0.8.3 | facets-overview | 1.0.0 | fastjsonschema | 2.16.2 |
| fasttext | 0.9.2 | filelock | 3.6.0 | Flask | 1.1.2 |
| flatbuffers | 22.10.26 | fonttools | 4.25.0 | fsspec | 2022.2.0 |
| future | 0.18.2 | gast | 0.4.0 | gitdb | 4.0.9 |
| GitPython | 3.1.27 | google-auth | 1.33.0 | google-auth-oauthlib | 0.4.6 |
| google-pasta | 0.2.0 | grpcio | 1.42.0 | gunicorn | 20.1.0 |
| gviz-api | 1.10.0 | h5py | 3.6.0 | hijri-converter | 2.2.4 |
| holidays | 0.16 | horovod | 0.25.0 | htmlmin | 0.1.12 |
| huggingface-hub | 0.11.0 | idna | 3.3 | ImageHash | 4.3.1 |
| imbalanced-learn | 0.8.1 | importlib-metadata | 4.11.3 | ipykernel | 6.15.3 |
| ipython | 8.5.0 | ipython-genutils | 0.2.0 | ipywidgets | 7.7.2 |
| isodate | 0.6.1 | itsdangerous | 2.0.1 | jedi | 0.18.1 |
| Jinja2 | 2.11.3 | jmespath | 0.10.0 | joblib | 1.1.0 |
| joblibspark | 0.5.0 | jsonschema | 4.4.0 | jupyter-client | 6.1.12 |
| jupyter_core | 4.11.2 | jupyterlab-pygments | 0.1.2 | jupyterlab-widgets | 1.0.0 |
| keras | 2.10.0 | Keras-Preprocessing | 1.1.2 | kiwisolver | 1.3.2 |
| korean-lunar-calendar | 0.3.1 | langcodes | 3.3.0 | libclang | 14.0.6 |
| lightgbm | 3.3.3 | llvmlite | 0.38.0 | LunarCalendar | 0.0.9 |
| Mako | 1.2.0 | Markdown | 3.3.4 | MarkupSafe | 2.0.1 |
| matplotlib | 3.5.1 | matplotlib-inline | 0.1.2 | missingno | 0.5.1 |
| mistune | 0.8.4 | mleap | 0.20.0 | mlflow-skinny | 2.0.1 |
| multimethod | 1.8 | murmurhash | 1.0.9 | mypy-extensions | 0.4.3 |
| nbclient | 0.5.13 | nbconvert | 6.4.4 | nbformat | 5.3.0 |
| nest-asyncio | 1.5.5 | networkx | 2.7.1 | nltk | 3.7 |
| notebook | 6.4.8 | numba | 0.55.1 | numpy | 1.21.5 |
| oauthlib | 3.2.0 | opt-einsum | 3.3.0 | packaging | 21.3 |
| pandas | 1.4.2 | pandas-profiling | 3.3.0 | pandocfilters | 1.5.0 |
| paramiko | 2.9.2 | parso | 0.8.3 | pathspec | 0.9.0 |
| pathy | 0.6.1 | patsy | 0.5.2 | petastorm | 0.11.4 |
| pexpect | 4.8.0 | phik | 0.12.2 | pickleshare | 0.7.5 |
| Pillow | 9.0.1 | pip | 21.2.4 | platformdirs | 2.5.4 |
| plotly | 5.6.0 | pmdarima | 2.0.1 | preshed | 3.0.8 |
| prompt-toolkit | 3.0.20 | prophet | 1.1.1 | protobuf | 3.19.4 |
| psutil | 5.8.0 | psycopg2 | 2.9.3 | ptyprocess | 0.7.0 |
| pure-eval | 0.2.2 | pyarrow | 7.0.0 | pyasn1 | 0.4.8 |
| pyasn1-modules | 0.2.8 | pybind11 | 2.10.1 | pycparser | 2.21 |
| pydantic | 1.9.2 | Pygments | 2.11.2 | PyGObject | 3.36.0 |
| PyJWT | 2.6.0 | PyMeeus | 0.5.11 | PyNaCl | 1.5.0 |
| pyodbc | 4.0.32 | pyparsing | 3.0.4 | pyrsistent | 0.18.0 |
| python-dateutil | 2.8.2 | python-editor | 1.0.4 | pytz | 2021.3 |
| PyWavelets | 1.3.0 | PyYAML | 6.0 | pyzmq | 22.3.0 |
| regex | 2022.3.15 | requests | 2.27.1 | requests-oauthlib | 1.3.1 |
| requests-unixsocket | 0.2.0 | rsa | 4.7.2 | s3transfer | 0.5.0 |
| scikit-learn | 1.0.2 | scipy | 1.7.3 | seaborn | 0.11.2 |
| Send2Trash | 1.8.0 | setuptools | 61.2.0 | setuptools-git | 1.2 |
| shap | 0.41.0 | simplejson | 3.17.6 | six | 1.16.0 |
| slicer | 0.0.7 | smart-open | 5.1.0 | smmap | 5.0.0 |
| soupsieve | 2.3.1 | spacy | 3.4.1 | spacy-legacy | 3.0.10 |
| spacy-loggers | 1.0.3 | spark-tensorflow-distributor | 1.0.0 | sqlparse | 0.4.2 |
| srsly | 2.4.5 | ssh-import-id | 5.10 | stack-data | 0.2.0 |
| statsmodels | 0.13.2 | tabulate | 0.8.9 | tangled-up-in-unicode | 0.2.0 |
| tenacity | 8.0.1 | tensorboard | 2.10.0 | tensorboard-data-server | 0.6.1 |
| tensorboard-plugin-profile | 2.8.0 | tensorboard-plugin-wit | 1.8.1 | tensorflow | 2.10.0 |
| tensorflow-estimator | 2.10.0 | tensorflow-io-gcs-filesystem | 0.28.0 | termcolor | 2.1.1 |
| terminado | 0.13.1 | testpath | 0.5.0 | thinc | 8.1.5 |
| threadpoolctl | 2.2.0 | tokenize-rt | 4.2.1 | tokenizers | 0.13.2 |
| tomli | 1.2.2 | torch | 1.12.1+cu113 | torchvision | 0.13.1+cu113 |
| tornado | 6.1 | tqdm | 4.64.0 | traitlets | 5.1.1 |
| transformers | 4.23.1 | typer | 0.4.2 | typing_extensions | 4.1.1 |
| unattended-upgrades | 0.1 | urllib3 | 1.26.9 | virtualenv | 20.8.0 |
| visions | 0.7.5 | wasabi | 0.10.1 | wcwidth | 0.2.5 |
| webencodings | 0.5.1 | websocket-client | 0.58.0 | Werkzeug | 2.0.3 |
| wheel | 0.37.1 | widgetsnbextension | 3.6.1 | wrapt | 1.12.1 |
| zipp | 3.7.0 | | | | |

R libraries

The R libraries are identical to the R libraries in Databricks Runtime 12.0.

Java and Scala libraries (Scala 2.12 cluster)

In addition to the Java and Scala libraries in Databricks Runtime 12.0, Databricks Runtime 12.0 ML contains the following JARs:

CPU clusters

| Group ID | Artifact ID | Version |
| --- | --- | --- |
| com.typesafe.akka | akka-actor_2.12 | 2.5.23 |
| ml.combust.mleap | mleap-databricks-runtime_2.12 | v0.20.0-db1 |
| ml.dmlc | xgboost4j-spark_2.12 | 1.6.2 |
| ml.dmlc | xgboost4j_2.12 | 1.6.2 |
| org.graphframes | graphframes_2.12 | 0.8.2-db1-spark3.2 |
| org.mlflow | mlflow-client | 2.0.1 |
| org.scala-lang.modules | scala-java8-compat_2.12 | 0.8.0 |
| org.tensorflow | spark-tensorflow-connector_2.12 | 1.15.0 |

GPU clusters

| Group ID | Artifact ID | Version |
| --- | --- | --- |
| com.typesafe.akka | akka-actor_2.12 | 2.5.23 |
| ml.combust.mleap | mleap-databricks-runtime_2.12 | v0.20.0-db1 |
| ml.dmlc | xgboost4j-gpu_2.12 | 1.6.2 |
| ml.dmlc | xgboost4j-spark-gpu_2.12 | 1.6.2 |
| org.graphframes | graphframes_2.12 | 0.8.2-db1-spark3.2 |
| org.mlflow | mlflow-client | 2.0.1 |
| org.scala-lang.modules | scala-java8-compat_2.12 | 0.8.0 |
| org.tensorflow | spark-tensorflow-connector_2.12 | 1.15.0 |