Databricks Runtime 10.2 for Machine Learning (Unsupported)

Databricks released this image in December 2021.

Databricks Runtime 10.2 for Machine Learning provides a ready-to-go environment for machine learning and data science based on Databricks Runtime 10.2 (Unsupported). Databricks Runtime ML contains many popular machine learning libraries, including TensorFlow, PyTorch, and XGBoost. Databricks Runtime ML includes AutoML, a tool to automatically train machine learning pipelines. Databricks Runtime ML also supports distributed deep learning training using Horovod.

For more information, including instructions for creating a Databricks Runtime ML cluster, see Databricks Runtime for Machine Learning.

New features and improvements

Databricks Runtime 10.2 ML is built on top of Databricks Runtime 10.2. For information on what’s new in Databricks Runtime 10.2, including Apache Spark MLlib and SparkR, see the Databricks Runtime 10.2 (Unsupported) release notes.

Databricks Autologging (Public Preview)

Databricks Autologging is now in Public Preview in all regions. Databricks Autologging is a no-code solution that provides automatic experiment tracking for machine learning training sessions on Databricks. With Databricks Autologging, model parameters, metrics, files, and lineage information are automatically captured when you train models from a variety of popular machine learning libraries. Training sessions are recorded as MLflow Tracking Runs. Model files are also tracked so you can easily log them to the MLflow Model Registry and deploy them for real-time scoring with MLflow Model Serving.

For more information about Databricks Autologging, see Databricks Autologging.

Enhancements to Databricks AutoML

The following enhancements have been made to Databricks AutoML.

  • AutoML ignores columns that have only a single value.

  • For classification and regression problems, the time column used to split the dataset chronologically into training, validation, and test sets can now be of string type. Previously, only timestamp and integer types were supported. See Split data into train/validation/test sets for details.
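
The two behaviors in the list above can be illustrated with a small, library-free sketch. It is not AutoML's implementation: the helper names, the 60/20/20 split, and the use of ISO 8601 date strings (which sort chronologically when compared as text) are assumptions made for illustration only.

```python
from typing import Dict, List, Tuple

Row = Dict[str, object]


def drop_single_value_columns(rows: List[Row]) -> List[Row]:
    """Return copies of the rows without columns that hold one distinct value."""
    if not rows:
        return []
    constant = {
        col for col in rows[0]
        if len({repr(row.get(col)) for row in rows}) == 1
    }
    return [{k: v for k, v in row.items() if k not in constant} for row in rows]


def chronological_split(
    rows: List[Row], time_col: str, train_pct: int = 60, val_pct: int = 20
) -> Tuple[List[Row], List[Row], List[Row]]:
    """Sort by a string time column, then cut into train/validation/test."""
    # Assumption: the strings are ISO 8601 dates, so text order is time order.
    ordered = sorted(rows, key=lambda row: str(row[time_col]))
    train_end = len(ordered) * train_pct // 100
    val_end = train_end + len(ordered) * val_pct // 100
    return ordered[:train_end], ordered[train_end:val_end], ordered[val_end:]


# Toy data: "region" never varies, and "ts" is a string time column.
rows = [
    {"ts": f"2021-12-{day:02d}", "region": "us", "y": day % 3}
    for day in (5, 1, 9, 3, 7, 2, 10, 4, 8, 6)
]
cleaned = drop_single_value_columns(rows)
train, val, test = chronological_split(cleaned, "ts")
```

Because the split is positional after sorting, every training row is older than every validation row, which in turn is older than every test row.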

Enhancements to Databricks Feature Store

The following enhancements have been made to Databricks Feature Store.

Simplified FeatureStoreClient interface

The FeatureStoreClient interface has been simplified.

  • FeatureStoreClient.create_feature_table() has been deprecated. Instead, use FeatureStoreClient.create_table().

  • FeatureStoreClient.get_feature_table() has been deprecated. Instead, use FeatureStoreClient.get_table().

  • All arguments to FeatureStoreClient.publish_table() other than name and online_store must be passed as keyword arguments.

For more information, see Work with feature tables and Python API.
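
As a rough illustration of what these interface changes mean for calling code, the sketch below uses a hypothetical stand-in class, not the real databricks.feature_store client. The class name, the return values, and the mode and features parameters are assumptions; only the method names and the keyword-only rule come from the notes above.

```python
import warnings
from typing import Dict, List, Optional


class SketchFeatureStoreClient:
    """Hypothetical stand-in used only to illustrate the calling conventions."""

    def create_table(self, name: str) -> str:
        return f"created {name}"

    def create_feature_table(self, name: str) -> str:
        # Deprecated alias: warn the caller, then delegate to the new method.
        warnings.warn(
            "create_feature_table() is deprecated; use create_table()",
            DeprecationWarning,
            stacklevel=2,
        )
        return self.create_table(name)

    def publish_table(
        self,
        name: str,
        online_store: str,
        *,  # everything after this marker must be passed by keyword
        mode: str = "merge",
        features: Optional[List[str]] = None,
    ) -> Dict[str, object]:
        return {
            "name": name,
            "online_store": online_store,
            "mode": mode,
            "features": features,
        }


client = SketchFeatureStoreClient()

# Accepted: name and online_store are positional, the rest are keywords.
result = client.publish_table("db.features", "my_store", mode="overwrite")

# Rejected: a third positional argument raises TypeError.
try:
    client.publish_table("db.features", "my_store", "overwrite")
    positional_allowed = True
except TypeError:
    positional_allowed = False
```

Code that passed extra publish_table() arguments positionally needs to name them explicitly after upgrading.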

Publish only selected columns to online stores

Databricks Feature Store now supports publishing only selected columns to an online store. For more information, see Publish selected features to an online store.

Major changes to Databricks Runtime ML Python environment

The Automated MLflow Tracking integration for Apache Spark MLlib, which was deprecated in Databricks Runtime 10.1 ML, is now disabled by default in Databricks Runtime 10.2 ML. It has been replaced by MLflow’s PySpark ML Autologging integration, which is enabled by default with Databricks Autologging. Autologging records additional information beyond what Automated MLflow Tracking for MLlib captured, including the parameters, metrics, and artifacts associated with the best model.

Python packages upgraded

  • databricks-cli 0.14.3 => 0.16.2

  • keras 2.6.0 => 2.7.0

  • lightgbm 3.3.0 => 3.3.1

  • mlflow 1.21.0 => 1.22.0

  • plotly 5.3.0 => 5.3.1

  • shap 0.39.0 => 0.40.0

  • spacy 3.1.3 => 3.2.0

  • tensorboard 2.6.0 => 2.7.0

  • tensorflow 2.6.0 => 2.7.0

  • torch 1.9.1 => 1.10.0

  • torchvision 0.10.1 => 0.11.1

  • transformers 4.11.3 => 4.12.3

  • xgboost 1.4.2 => 1.5.0
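
To confirm which of these versions a given cluster or local environment actually has, a check along the following lines can be run in a notebook cell. It is a minimal sketch using only the Python standard library (importlib.metadata, available from Python 3.8). The helper names and the choice of packages are illustrative, and stripping the local suffix is an assumption made to handle builds such as torch 1.10.0+cpu.

```python
from importlib import metadata
from typing import Dict, Optional

# A few target versions copied from the upgrade list above.
EXPECTED = {
    "keras": "2.7.0",
    "lightgbm": "3.3.1",
    "shap": "0.40.0",
    "torch": "1.10.0",
    "xgboost": "1.5.0",
}


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a package, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def compare_versions(expected: Dict[str, str]) -> Dict[str, str]:
    """Classify each package as 'match', 'differs', or 'missing'."""
    report = {}
    for package, wanted in expected.items():
        found = installed_version(package)
        if found is None:
            report[package] = "missing"
        elif found.split("+")[0] == wanted:
            # Ignore local build suffixes such as "+cpu" or "+cu111".
            report[package] = "match"
        else:
            report[package] = "differs"
    return report


report = compare_versions(EXPECTED)
for package, status in sorted(report.items()):
    print(f"{package}: {status}")
```

Outside Databricks Runtime 10.2 ML, most entries will typically report missing or differs, which is expected.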

System environment

The system environment in Databricks Runtime 10.2 ML differs from Databricks Runtime 10.2 as follows:

Libraries

The following sections list the libraries included in Databricks Runtime 10.2 ML that differ from those included in Databricks Runtime 10.2.

Python libraries

Databricks Runtime 10.2 ML uses Virtualenv for Python package management and includes many popular ML packages.

In addition to the packages specified in the following sections, Databricks Runtime 10.2 ML also includes the following packages:

  • hyperopt 0.2.7.db1

  • sparkdl 2.2.0-db5

  • feature_store 0.3.6

  • automl 1.5.0

Python libraries on CPU clusters

| Library | Version |
| --- | --- |
| absl-py | 0.11.0 |
| Antergos Linux | 2015.10 (ISO-Rolling) |
| appdirs | 1.4.4 |
| argon2-cffi | 20.1.0 |
| astor | 0.8.1 |
| astunparse | 1.6.3 |
| async-generator | 1.10 |
| attrs | 20.3.0 |
| backcall | 0.2.0 |
| bcrypt | 3.2.0 |
| bidict | 0.21.4 |
| bleach | 3.3.0 |
| blis | 0.7.4 |
| boto3 | 1.16.7 |
| botocore | 1.19.7 |
| cachetools | 4.2.4 |
| catalogue | 2.0.6 |
| certifi | 2020.12.5 |
| cffi | 1.14.5 |
| chardet | 4.0.0 |
| click | 7.1.2 |
| cloudpickle | 1.6.0 |
| cmdstanpy | 0.9.68 |
| configparser | 5.0.1 |
| convertdate | 2.3.2 |
| cryptography | 3.4.7 |
| cycler | 0.10.0 |
| cymem | 2.0.5 |
| Cython | 0.29.23 |
| databricks-automl-runtime | 0.2.4 |
| databricks-cli | 0.16.2 |
| dbus-python | 1.2.16 |
| decorator | 5.0.6 |
| defusedxml | 0.7.1 |
| dill | 0.3.2 |
| diskcache | 5.2.1 |
| distlib | 0.3.3 |
| distro-info | 0.23ubuntu1 |
| entrypoints | 0.3 |
| ephem | 4.1.1 |
| facets-overview | 1.0.0 |
| fasttext | 0.9.2 |
| filelock | 3.0.12 |
| Flask | 1.1.2 |
| flatbuffers | 2.0 |
| fsspec | 0.9.0 |
| future | 0.18.2 |
| gast | 0.4.0 |
| gitdb | 4.0.7 |
| GitPython | 3.1.12 |
| google-auth | 1.22.1 |
| google-auth-oauthlib | 0.4.2 |
| google-pasta | 0.2.0 |
| grpcio | 1.39.0 |
| gunicorn | 20.0.4 |
| gviz-api | 1.10.0 |
| h5py | 3.1.0 |
| hijri-converter | 2.2.2 |
| holidays | 0.11.3.1 |
| horovod | 0.23.0 |
| htmlmin | 0.1.12 |
| huggingface-hub | 0.1.2 |
| idna | 2.10 |
| ImageHash | 4.2.1 |
| imbalanced-learn | 0.8.1 |
| importlib-metadata | 3.10.0 |
| ipykernel | 5.3.4 |
| ipython | 7.22.0 |
| ipython-genutils | 0.2.0 |
| ipywidgets | 7.6.3 |
| isodate | 0.6.0 |
| itsdangerous | 1.1.0 |
| jedi | 0.17.2 |
| Jinja2 | 2.11.3 |
| jmespath | 0.10.0 |
| joblib | 1.0.1 |
| joblibspark | 0.3.0 |
| jsonschema | 3.2.0 |
| jupyter-client | 6.1.12 |
| jupyter-core | 4.7.1 |
| jupyterlab-pygments | 0.1.2 |
| jupyterlab-widgets | 1.0.0 |
| keras | 2.7.0 |
| Keras-Preprocessing | 1.1.2 |
| kiwisolver | 1.3.1 |
| koalas | 1.8.2 |
| korean-lunar-calendar | 0.2.1 |
| langcodes | 3.3.0 |
| libclang | 12.0.0 |
| lightgbm | 3.3.1 |
| llvmlite | 0.37.0 |
| LunarCalendar | 0.0.9 |
| Mako | 1.1.3 |
| Markdown | 3.3.3 |
| MarkupSafe | 2.0.1 |
| matplotlib | 3.4.2 |
| missingno | 0.5.0 |
| mistune | 0.8.4 |
| mleap | 0.18.1 |
| mlflow-skinny | 1.22.0 |
| multimethod | 1.6 |
| murmurhash | 1.0.5 |
| nbclient | 0.5.3 |
| nbconvert | 6.0.7 |
| nbformat | 5.1.3 |
| nest-asyncio | 1.5.1 |
| networkx | 2.5 |
| nltk | 3.6.1 |
| notebook | 6.3.0 |
| numba | 0.54.1 |
| numpy | 1.19.2 |
| oauthlib | 3.1.0 |
| opt-einsum | 3.3.0 |
| packaging | 21.3 |
| pandas | 1.2.4 |
| pandas-profiling | 3.1.0 |
| pandocfilters | 1.4.3 |
| paramiko | 2.7.2 |
| parso | 0.7.0 |
| pathy | 0.6.0 |
| patsy | 0.5.1 |
| petastorm | 0.11.3 |
| pexpect | 4.8.0 |
| phik | 0.12.0 |
| pickleshare | 0.7.5 |
| Pillow | 8.2.0 |
| pip | 21.0.1 |
| plotly | 5.3.1 |
| preshed | 3.0.5 |
| prometheus-client | 0.10.1 |
| prompt-toolkit | 3.0.17 |
| prophet | 1.0.1 |
| protobuf | 3.17.2 |
| psutil | 5.8.0 |
| psycopg2 | 2.8.5 |
| ptyprocess | 0.7.0 |
| pyarrow | 4.0.0 |
| pyasn1 | 0.4.8 |
| pyasn1-modules | 0.2.8 |
| pybind11 | 2.8.1 |
| pycparser | 2.20 |
| pydantic | 1.8.2 |
| Pygments | 2.8.1 |
| PyGObject | 3.36.0 |
| PyMeeus | 0.5.11 |
| PyNaCl | 1.4.0 |
| pyodbc | 4.0.30 |
| pyparsing | 2.4.7 |
| pyrsistent | 0.17.3 |
| pystan | 2.19.1.1 |
| python-apt | 2.0.0+ubuntu0.20.4.6 |
| python-dateutil | 2.8.1 |
| python-editor | 1.0.4 |
| python-engineio | 4.3.0 |
| python-socketio | 5.4.1 |
| pytz | 2020.5 |
| PyWavelets | 1.1.1 |
| PyYAML | 5.4.1 |
| pyzmq | 20.0.0 |
| regex | 2021.4.4 |
| requests | 2.25.1 |
| requests-oauthlib | 1.3.0 |
| requests-unixsocket | 0.2.0 |
| rsa | 4.7.2 |
| s3transfer | 0.3.7 |
| sacremoses | 0.0.46 |
| scikit-learn | 0.24.1 |
| scipy | 1.6.2 |
| seaborn | 0.11.1 |
| Send2Trash | 1.5.0 |
| setuptools | 52.0.0 |
| setuptools-git | 1.2 |
| shap | 0.40.0 |
| simplejson | 3.17.2 |
| six | 1.15.0 |
| slicer | 0.0.7 |
| smart-open | 5.2.0 |
| smmap | 3.0.5 |
| spacy | 3.2.0 |
| spacy-legacy | 3.0.8 |
| spacy-loggers | 1.0.1 |
| spark-tensorflow-distributor | 1.0.0 |
| sqlparse | 0.4.1 |
| srsly | 2.4.1 |
| ssh-import-id | 5.10 |
| statsmodels | 0.12.2 |
| tabulate | 0.8.7 |
| tangled-up-in-unicode | 0.1.0 |
| tenacity | 6.2.0 |
| tensorboard | 2.7.0 |
| tensorboard-data-server | 0.6.1 |
| tensorboard-plugin-profile | 2.5.0 |
| tensorboard-plugin-wit | 1.8.0 |
| tensorflow-cpu | 2.7.0 |
| tensorflow-estimator | 2.7.0 |
| tensorflow-io-gcs-filesystem | 0.22.0 |
| termcolor | 1.1.0 |
| terminado | 0.9.4 |
| testpath | 0.4.4 |
| thinc | 8.0.12 |
| threadpoolctl | 2.1.0 |
| tokenizers | 0.10.3 |
| torch | 1.10.0+cpu |
| torchvision | 0.11.1+cpu |
| tornado | 6.1 |
| tqdm | 4.59.0 |
| traitlets | 5.0.5 |
| transformers | 4.12.3 |
| typer | 0.3.2 |
| typing-extensions | 3.7.4.3 |
| ujson | 4.0.2 |
| unattended-upgrades | 0.1 |
| urllib3 | 1.25.11 |
| virtualenv | 20.4.1 |
| visions | 0.7.4 |
| wasabi | 0.8.2 |
| wcwidth | 0.2.5 |
| webencodings | 0.5.1 |
| websocket-client | 0.57.0 |
| Werkzeug | 1.0.1 |
| wheel | 0.36.2 |
| widgetsnbextension | 3.5.1 |
| wrapt | 1.12.1 |
| xgboost | 1.5.0 |
| zipp | 3.4.1 |

Python libraries on GPU clusters

| Library | Version |
| --- | --- |
| absl-py | 0.11.0 |
| Antergos Linux | 2015.10 (ISO-Rolling) |
| appdirs | 1.4.4 |
| argon2-cffi | 20.1.0 |
| astor | 0.8.1 |
| astunparse | 1.6.3 |
| async-generator | 1.10 |
| attrs | 20.3.0 |
| backcall | 0.2.0 |
| bcrypt | 3.2.0 |
| bidict | 0.21.4 |
| bleach | 3.3.0 |
| blis | 0.7.4 |
| boto3 | 1.16.7 |
| botocore | 1.19.7 |
| cachetools | 4.2.4 |
| catalogue | 2.0.6 |
| certifi | 2020.12.5 |
| cffi | 1.14.5 |
| chardet | 4.0.0 |
| click | 7.1.2 |
| cloudpickle | 1.6.0 |
| cmdstanpy | 0.9.68 |
| configparser | 5.0.1 |
| convertdate | 2.3.2 |
| cryptography | 3.4.7 |
| cycler | 0.10.0 |
| cymem | 2.0.5 |
| Cython | 0.29.23 |
| databricks-automl-runtime | 0.2.4 |
| databricks-cli | 0.16.2 |
| dbus-python | 1.2.16 |
| decorator | 5.0.6 |
| defusedxml | 0.7.1 |
| dill | 0.3.2 |
| diskcache | 5.2.1 |
| distlib | 0.3.3 |
| distro-info | 0.23ubuntu1 |
| entrypoints | 0.3 |
| ephem | 4.1.1 |
| facets-overview | 1.0.0 |
| fasttext | 0.9.2 |
| filelock | 3.0.12 |
| Flask | 1.1.2 |
| flatbuffers | 2.0 |
| fsspec | 0.9.0 |
| future | 0.18.2 |
| gast | 0.4.0 |
| gitdb | 4.0.7 |
| GitPython | 3.1.12 |
| google-auth | 1.22.1 |
| google-auth-oauthlib | 0.4.2 |
| google-pasta | 0.2.0 |
| grpcio | 1.39.0 |
| gunicorn | 20.0.4 |
| gviz-api | 1.10.0 |
| h5py | 3.1.0 |
| hijri-converter | 2.2.2 |
| holidays | 0.11.3.1 |
| horovod | 0.23.0 |
| htmlmin | 0.1.12 |
| huggingface-hub | 0.1.2 |
| idna | 2.10 |
| ImageHash | 4.2.1 |
| imbalanced-learn | 0.8.1 |
| importlib-metadata | 3.10.0 |
| ipykernel | 5.3.4 |
| ipython | 7.22.0 |
| ipython-genutils | 0.2.0 |
| ipywidgets | 7.6.3 |
| isodate | 0.6.0 |
| itsdangerous | 1.1.0 |
| jedi | 0.17.2 |
| Jinja2 | 2.11.3 |
| jmespath | 0.10.0 |
| joblib | 1.0.1 |
| joblibspark | 0.3.0 |
| jsonschema | 3.2.0 |
| jupyter-client | 6.1.12 |
| jupyter-core | 4.7.1 |
| jupyterlab-pygments | 0.1.2 |
| jupyterlab-widgets | 1.0.0 |
| keras | 2.7.0 |
| Keras-Preprocessing | 1.1.2 |
| kiwisolver | 1.3.1 |
| koalas | 1.8.2 |
| korean-lunar-calendar | 0.2.1 |
| langcodes | 3.3.0 |
| libclang | 12.0.0 |
| lightgbm | 3.3.1 |
| llvmlite | 0.37.0 |
| LunarCalendar | 0.0.9 |
| Mako | 1.1.3 |
| Markdown | 3.3.3 |
| MarkupSafe | 2.0.1 |
| matplotlib | 3.4.2 |
| missingno | 0.5.0 |
| mistune | 0.8.4 |
| mleap | 0.18.1 |
| mlflow-skinny | 1.22.0 |
| multimethod | 1.6 |
| murmurhash | 1.0.5 |
| nbclient | 0.5.3 |
| nbconvert | 6.0.7 |
| nbformat | 5.1.3 |
| nest-asyncio | 1.5.1 |
| networkx | 2.5 |
| nltk | 3.6.1 |
| notebook | 6.3.0 |
| numba | 0.54.1 |
| numpy | 1.19.2 |
| oauthlib | 3.1.0 |
| opt-einsum | 3.3.0 |
| packaging | 21.3 |
| pandas | 1.2.4 |
| pandas-profiling | 3.1.0 |
| pandocfilters | 1.4.3 |
| paramiko | 2.7.2 |
| parso | 0.7.0 |
| pathy | 0.6.0 |
| patsy | 0.5.1 |
| petastorm | 0.11.3 |
| pexpect | 4.8.0 |
| phik | 0.12.0 |
| pickleshare | 0.7.5 |
| Pillow | 8.2.0 |
| pip | 21.0.1 |
| plotly | 5.3.1 |
| preshed | 3.0.5 |
| prompt-toolkit | 3.0.17 |
| prophet | 1.0.1 |
| protobuf | 3.17.2 |
| psutil | 5.8.0 |
| psycopg2 | 2.8.5 |
| ptyprocess | 0.7.0 |
| pyarrow | 4.0.0 |
| pyasn1 | 0.4.8 |
| pyasn1-modules | 0.2.8 |
| pybind11 | 2.8.1 |
| pycparser | 2.20 |
| pydantic | 1.8.2 |
| Pygments | 2.8.1 |
| PyGObject | 3.36.0 |
| PyMeeus | 0.5.11 |
| PyNaCl | 1.4.0 |
| pyodbc | 4.0.30 |
| pyparsing | 2.4.7 |
| pyrsistent | 0.17.3 |
| pystan | 2.19.1.1 |
| python-apt | 2.0.0+ubuntu0.20.4.6 |
| python-dateutil | 2.8.1 |
| python-editor | 1.0.4 |
| python-engineio | 4.3.0 |
| python-socketio | 5.4.1 |
| pytz | 2020.5 |
| PyWavelets | 1.1.1 |
| PyYAML | 5.4.1 |
| pyzmq | 20.0.0 |
| regex | 2021.4.4 |
| requests | 2.25.1 |
| requests-oauthlib | 1.3.0 |
| requests-unixsocket | 0.2.0 |
| rsa | 4.7.2 |
| s3transfer | 0.3.7 |
| sacremoses | 0.0.46 |
| scikit-learn | 0.24.1 |
| scipy | 1.6.2 |
| seaborn | 0.11.1 |
| Send2Trash | 1.5.0 |
| setuptools | 52.0.0 |
| setuptools-git | 1.2 |
| shap | 0.40.0 |
| simplejson | 3.17.2 |
| six | 1.15.0 |
| slicer | 0.0.7 |
| smart-open | 5.2.0 |
| smmap | 3.0.5 |
| spacy | 3.2.0 |
| spacy-legacy | 3.0.8 |
| spacy-loggers | 1.0.1 |
| spark-tensorflow-distributor | 1.0.0 |
| sqlparse | 0.4.1 |
| srsly | 2.4.1 |
| ssh-import-id | 5.10 |
| statsmodels | 0.12.2 |
| tabulate | 0.8.7 |
| tangled-up-in-unicode | 0.1.0 |
| tenacity | 6.2.0 |
| tensorboard | 2.7.0 |
| tensorboard-data-server | 0.6.1 |
| tensorboard-plugin-profile | 2.5.0 |
| tensorboard-plugin-wit | 1.8.0 |
| tensorflow | 2.7.0 |
| tensorflow-estimator | 2.7.0 |
| tensorflow-io-gcs-filesystem | 0.22.0 |
| termcolor | 1.1.0 |
| terminado | 0.9.4 |
| testpath | 0.4.4 |
| thinc | 8.0.12 |
| threadpoolctl | 2.1.0 |
| tokenizers | 0.10.3 |
| torch | 1.10.0+cu111 |
| torchvision | 0.11.1+cu111 |
| tornado | 6.1 |
| tqdm | 4.59.0 |
| traitlets | 5.0.5 |
| transformers | 4.12.3 |
| typer | 0.3.2 |
| typing-extensions | 3.7.4.3 |
| ujson | 4.0.2 |
| unattended-upgrades | 0.1 |
| urllib3 | 1.25.11 |
| virtualenv | 20.4.1 |
| visions | 0.7.4 |
| wasabi | 0.8.2 |
| wcwidth | 0.2.5 |
| webencodings | 0.5.1 |
| websocket-client | 0.57.0 |
| Werkzeug | 1.0.1 |
| wheel | 0.36.2 |
| widgetsnbextension | 3.5.1 |
| wrapt | 1.12.1 |
| xgboost | 1.5.0 |
| zipp | 3.4.1 |

Spark packages containing Python modules

| Spark Package | Python Module | Version |
| --- | --- | --- |
| graphframes | graphframes | 0.8.2-db1-spark3.2 |

R libraries

The R libraries are identical to the R libraries in Databricks Runtime 10.2.

Java and Scala libraries (Scala 2.12 cluster)

In addition to the Java and Scala libraries in Databricks Runtime 10.2, Databricks Runtime 10.2 ML contains the following JARs:

CPU clusters

| Group ID | Artifact ID | Version |
| --- | --- | --- |
| com.typesafe.akka | akka-actor_2.12 | 2.5.23 |
| ml.combust.mleap | mleap-databricks-runtime_2.12 | 0.18.1-23eb1ef |
| ml.dmlc | xgboost4j-spark_2.12 | 1.5.1 |
| ml.dmlc | xgboost4j_2.12 | 1.5.1 |
| org.graphframes | graphframes_2.12 | 0.8.2-db1-spark3.2 |
| org.mlflow | mlflow-client | 1.22.0 |
| org.mlflow | mlflow-spark | 1.22.0 |
| org.scala-lang.modules | scala-java8-compat_2.12 | 0.8.0 |
| org.tensorflow | spark-tensorflow-connector_2.12 | 1.15.0 |

GPU clusters

| Group ID | Artifact ID | Version |
| --- | --- | --- |
| com.typesafe.akka | akka-actor_2.12 | 2.5.23 |
| ml.combust.mleap | mleap-databricks-runtime_2.12 | 0.18.1-23eb1ef |
| ml.dmlc | xgboost4j-spark_2.12 | 1.5.1 |
| ml.dmlc | xgboost4j_2.12 | 1.5.1 |
| org.graphframes | graphframes_2.12 | 0.8.2-db1-spark3.2 |
| org.mlflow | mlflow-client | 1.22.0 |
| org.mlflow | mlflow-spark | 1.22.0 |
| org.scala-lang.modules | scala-java8-compat_2.12 | 0.8.0 |
| org.tensorflow | spark-tensorflow-connector_2.12 | 1.15.0 |