Databricks Runtime 10.4 LTS for Machine Learning

Databricks Runtime 10.4 LTS for Machine Learning provides a ready-to-go environment for machine learning and data science based on Databricks Runtime 10.4 LTS. Databricks Runtime ML contains many popular machine learning libraries, including TensorFlow, PyTorch, and XGBoost. Databricks Runtime ML includes AutoML, a tool to automatically train machine learning pipelines. Databricks Runtime ML also supports distributed deep learning training using Horovod.

Note

LTS means this version is under long-term support. See Databricks Runtime LTS version lifecycle.

For more information, including instructions for creating a Databricks Runtime ML cluster, see AI and Machine Learning on Databricks.

New features and improvements

Databricks Runtime 10.4 LTS ML is built on top of Databricks Runtime 10.4 LTS. For information on what’s new in Databricks Runtime 10.4 LTS, including Apache Spark MLlib and SparkR, see the Databricks Runtime 10.4 LTS release notes.

Enhancements to Databricks AutoML

The following enhancements have been made to Databricks AutoML.

Databricks AutoML is generally available

Starting with Databricks Runtime 10.4 LTS ML, Databricks AutoML is generally available.

Imputation of missing values

You can now specify how null values are imputed. By default, AutoML selects an imputation method based on the column type and content. See Imputation of missing values.
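
As a minimal sketch of what a per-column imputation mapping might look like, the following builds and checks a dictionary for the AutoML `imputers` parameter. The column names and the `validate_imputers` helper are hypothetical; the strategy names (`mean`, `median`, `most_frequent`, and `constant` with a `fill_value`) are assumed from the AutoML API reference, so verify them against the linked section before relying on them.

```python
from typing import Any, Dict, Union

# Strategy names assumed from the AutoML `imputers` parameter documentation.
STRING_STRATEGIES = {"mean", "median", "most_frequent"}

ImputerSpec = Union[str, Dict[str, Any]]


def validate_imputers(imputers: Dict[str, ImputerSpec]) -> Dict[str, ImputerSpec]:
    """Check an imputers mapping before handing it to AutoML (hypothetical helper)."""
    for column, spec in imputers.items():
        if isinstance(spec, str):
            strategy, has_fill = spec, False
        else:
            strategy, has_fill = spec.get("strategy"), "fill_value" in spec
        if strategy == "constant":
            # A constant imputer is only meaningful with an explicit value.
            if not has_fill:
                raise ValueError(f"{column}: 'constant' needs a fill_value")
        elif strategy not in STRING_STRATEGIES:
            raise ValueError(f"{column}: unknown strategy {strategy!r}")
    return imputers


# Hypothetical column names; columns left out keep AutoML's default choice.
imputers = validate_imputers({
    "age": "median",
    "income": {"strategy": "mean"},
    "segment": {"strategy": "constant", "fill_value": "unknown"},
})

# On a Databricks Runtime 10.4 LTS ML cluster the mapping would be passed as:
#   from databricks import automl
#   summary = automl.classify(dataset=df, target_col="label", imputers=imputers)
```

Validating the mapping locally is a design convenience only: it surfaces a misspelled strategy before a cluster run is started.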

Column selection from UI

For classification and regression problems, you can now use the UI in addition to the API to specify columns that AutoML should ignore during its calculations. See Column selection for details.

New data type

AutoML now supports numerical array types.

Custom location of generated notebooks and experiments

You can now specify a location in the workspace where AutoML should save generated notebooks and experiments. Use the experiment_dir parameter. See Classification and regression parameters.
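
The sketch below shows one way the `experiment_dir` parameter might be combined with other AutoML arguments. The workspace path, column names, and both helper functions are hypothetical; `exclude_cols` and `timeout_minutes` are assumed to be the API names for column exclusion and the time budget. The actual call only works on a Databricks Runtime ML cluster, so it is kept inside a function that is not executed here.

```python
from typing import Any, Dict, List, Optional


def automl_classify_kwargs(
    target_col: str,
    experiment_dir: str,
    exclude_cols: Optional[List[str]] = None,
    timeout_minutes: int = 30,
) -> Dict[str, Any]:
    """Assemble keyword arguments for an AutoML classification run (hypothetical helper)."""
    kwargs: Dict[str, Any] = {
        "target_col": target_col,
        # Workspace directory for the generated notebooks and experiments.
        "experiment_dir": experiment_dir,
        "timeout_minutes": timeout_minutes,
    }
    if exclude_cols:
        # Columns AutoML should ignore during its calculations.
        kwargs["exclude_cols"] = list(exclude_cols)
    return kwargs


def run_automl(df: Any, **kwargs: Any) -> Any:
    """Start the run; importable only on Databricks Runtime ML."""
    from databricks import automl

    return automl.classify(dataset=df, **kwargs)


# Hypothetical workspace path and column names.
kwargs = automl_classify_kwargs(
    target_col="churned",
    experiment_dir="/Users/someone@example.com/automl/churn",
    exclude_cols=["customer_id"],
)
```

On a cluster, `run_automl(df, **kwargs)` would then start the experiment with the chosen save location.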

Enhancements to Databricks Feature Store

The following enhancements have been made to Databricks Feature Store.

  • You can now register an existing Delta table as a feature table.
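
A rough sketch of registering an existing Delta table follows. It assumes the Feature Store client exposes a `register_table` method that accepts `delta_table`, `primary_keys`, and `description` keyword arguments; check the Feature Store API reference for the exact signature. The table name, key column, and the `_RecordingClient` stand-in are hypothetical and exist only so the helper can be exercised off-cluster.

```python
from typing import Any, Dict, List


def register_existing_delta_table(
    fs_client: Any,
    table_name: str,
    primary_keys: List[str],
    description: str = "",
) -> Any:
    """Register an existing Delta table as a feature table.

    `fs_client` is expected to behave like
    databricks.feature_store.FeatureStoreClient (assumed API).
    """
    return fs_client.register_table(
        delta_table=table_name,
        primary_keys=primary_keys,
        description=description,
    )


class _RecordingClient:
    """Stand-in client that returns the keyword arguments it receives."""

    def register_table(self, **kwargs: Any) -> Dict[str, Any]:
        return kwargs


# Hypothetical table and key names.
call = register_existing_delta_table(
    _RecordingClient(),
    table_name="recommender.customer_features",
    primary_keys=["customer_id"],
    description="Customer features computed by an existing pipeline",
)

# On a Databricks Runtime 10.4 LTS ML cluster, the real client would be:
#   from databricks.feature_store import FeatureStoreClient
#   register_existing_delta_table(FeatureStoreClient(), ...)
```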

System environment

The system environment in Databricks Runtime 10.4 LTS ML differs from Databricks Runtime 10.4 LTS as follows:

Libraries

The following sections list the libraries included in Databricks Runtime 10.4 LTS ML that differ from those included in Databricks Runtime 10.4 LTS.

Python libraries

Databricks Runtime 10.4 LTS ML uses Virtualenv for Python package management and includes many popular ML packages.

In addition to the packages specified in the following sections, Databricks Runtime 10.4 LTS ML also includes the following packages:

  • hyperopt 0.2.7.db1

  • sparkdl 2.2.0-db5

  • feature_store 0.3.8

  • automl 1.7.2

Python libraries on CPU clusters

To reproduce the Databricks Runtime ML Python environment in your local Python virtual environment, download the requirements-10.4.txt file and run pip install -r requirements-10.4.txt. This command installs all of the open source libraries that Databricks Runtime ML uses, but does not install Databricks-developed libraries, such as databricks-automl, databricks-feature-store, or the Databricks fork of hyperopt.
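
After installing, it can be useful to confirm that the local environment matches the pinned versions. The following is a minimal sketch using only the Python standard library; it assumes the requirements file uses `name==version` lines, and the function names and sample excerpt are illustrative rather than part of any Databricks tooling.

```python
import re
from importlib import metadata
from typing import Dict, List, Optional, Tuple


def normalize(name: str) -> str:
    """Normalize a distribution name the way PEP 503 describes."""
    return re.sub(r"[-_.]+", "-", name.strip()).lower()


def parse_pins(text: str) -> Dict[str, str]:
    """Collect `name==version` pins, skipping comments and other lines."""
    pins: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if "==" not in line:
            continue
        name, version = line.split("==", 1)
        pins[normalize(name)] = version.strip()
    return pins


def find_mismatches(pins: Dict[str, str]) -> List[Tuple[str, str, Optional[str]]]:
    """Return (name, wanted, installed) for every pin not met locally."""
    mismatches = []
    for name, wanted in sorted(pins.items()):
        try:
            installed: Optional[str] = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != wanted:
            mismatches.append((name, wanted, installed))
    return mismatches


# Illustrative excerpt; in practice, read requirements-10.4.txt from disk.
sample = """
# excerpt in requirements format (layout of the real file is assumed)
numpy==1.20.1
scikit-learn==0.24.1
PyYAML==5.4.1
"""
pins = parse_pins(sample)
```

Calling `find_mismatches(pins)` inside the virtual environment lists each package whose installed version differs from the pin, with `None` for packages that are missing.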

| Library | Version | Library | Version | Library | Version |
|---|---|---|---|---|---|
| absl-py | 0.11.0 | Antergos Linux | 2015.10 (ISO-Rolling) | appdirs | 1.4.4 |
| argon2-cffi | 20.1.0 | astor | 0.8.1 | astunparse | 1.6.3 |
| async-generator | 1.10 | attrs | 20.3.0 | backcall | 0.2.0 |
| bcrypt | 3.2.0 | bidict | 0.21.4 | bleach | 3.3.0 |
| blis | 0.7.4 | boto3 | 1.16.7 | botocore | 1.19.7 |
| cachetools | 4.2.4 | catalogue | 2.0.6 | certifi | 2020.12.5 |
| cffi | 1.14.5 | chardet | 4.0.0 | click | 7.1.2 |
| cloudpickle | 1.6.0 | cmdstanpy | 0.9.68 | configparser | 5.0.1 |
| convertdate | 2.3.2 | cryptography | 3.4.7 | cycler | 0.10.0 |
| cymem | 2.0.5 | Cython | 0.29.23 | databricks-automl-runtime | 0.2.6 |
| databricks-cli | 0.16.3 | dbl-tempo | 0.1.2 | dbus-python | 1.2.16 |
| decorator | 5.0.6 | defusedxml | 0.7.1 | dill | 0.3.2 |
| diskcache | 5.2.1 | distlib | 0.3.4 | distro-info | 0.23ubuntu1 |
| entrypoints | 0.3 | ephem | 4.1.3 | facets-overview | 1.0.0 |
| fasttext | 0.9.2 | filelock | 3.0.12 | Flask | 1.1.2 |
| flatbuffers | 2.0 | fsspec | 0.9.0 | future | 0.18.2 |
| gast | 0.4.0 | gitdb | 4.0.7 | GitPython | 3.1.12 |
| google-auth | 1.22.1 | google-auth-oauthlib | 0.4.2 | google-pasta | 0.2.0 |
| grpcio | 1.39.0 | gunicorn | 20.0.4 | gviz-api | 1.10.0 |
| h5py | 3.1.0 | hijri-converter | 2.2.3 | holidays | 0.12 |
| horovod | 0.23.0 | htmlmin | 0.1.12 | huggingface-hub | 0.1.2 |
| idna | 2.10 | ImageHash | 4.2.1 | imbalanced-learn | 0.8.1 |
| importlib-metadata | 3.10.0 | ipykernel | 5.3.4 | ipython | 7.22.0 |
| ipython-genutils | 0.2.0 | ipywidgets | 7.6.3 | isodate | 0.6.0 |
| itsdangerous | 1.1.0 | jedi | 0.17.2 | Jinja2 | 2.11.3 |
| jmespath | 0.10.0 | joblib | 1.0.1 | joblibspark | 0.3.0 |
| jsonschema | 3.2.0 | jupyter-client | 6.1.12 | jupyter-core | 4.7.1 |
| jupyterlab-pygments | 0.1.2 | jupyterlab-widgets | 1.0.0 | keras | 2.8.0 |
| Keras-Preprocessing | 1.1.2 | kiwisolver | 1.3.1 | koalas | 1.8.2 |
| korean-lunar-calendar | 0.2.1 | langcodes | 3.3.0 | libclang | 13.0.0 |
| lightgbm | 3.3.2 | llvmlite | 0.38.0 | LunarCalendar | 0.0.9 |
| Mako | 1.1.3 | Markdown | 3.3.3 | MarkupSafe | 2.0.1 |
| matplotlib | 3.4.2 | missingno | 0.5.1 | mistune | 0.8.4 |
| mleap | 0.18.1 | mlflow-skinny | 1.24.0 | multimethod | 1.7 |
| murmurhash | 1.0.5 | nbclient | 0.5.3 | nbconvert | 6.0.7 |
| nbformat | 5.1.3 | nest-asyncio | 1.5.1 | networkx | 2.5 |
| nltk | 3.6.1 | notebook | 6.3.0 | numba | 0.55.1 |
| numpy | 1.20.1 | oauthlib | 3.1.0 | opt-einsum | 3.3.0 |
| packaging | 21.3 | pandas | 1.2.4 | pandas-profiling | 3.1.0 |
| pandocfilters | 1.4.3 | paramiko | 2.7.2 | parso | 0.7.0 |
| pathy | 0.6.0 | patsy | 0.5.1 | petastorm | 0.11.4 |
| pexpect | 4.8.0 | phik | 0.12.0 | pickleshare | 0.7.5 |
| Pillow | 8.2.0 | pip | 21.0.1 | plotly | 5.5.0 |
| pmdarima | 1.8.4 | preshed | 3.0.5 | prometheus-client | 0.10.1 |
| prompt-toolkit | 3.0.17 | prophet | 1.0.1 | protobuf | 3.17.2 |
| psutil | 5.8.0 | psycopg2 | 2.8.5 | ptyprocess | 0.7.0 |
| pyarrow | 4.0.0 | pyasn1 | 0.4.8 | pyasn1-modules | 0.2.8 |
| pybind11 | 2.9.1 | pycparser | 2.20 | pydantic | 1.8.2 |
| Pygments | 2.8.1 | PyGObject | 3.36.0 | PyMeeus | 0.5.11 |
| PyNaCl | 1.4.0 | pyodbc | 4.0.30 | pyparsing | 2.4.7 |
| pyrsistent | 0.17.3 | pystan | 2.19.1.1 | python-apt | 2.0.0+ubuntu0.20.4.7 |
| python-dateutil | 2.8.1 | python-editor | 1.0.4 | python-engineio | 4.3.0 |
| python-socketio | 5.4.1 | pytz | 2020.5 | PyWavelets | 1.1.1 |
| PyYAML | 5.4.1 | pyzmq | 20.0.0 | regex | 2021.4.4 |
| requests | 2.25.1 | requests-oauthlib | 1.3.0 | requests-unixsocket | 0.2.0 |
| rsa | 4.7.2 | s3transfer | 0.3.7 | sacremoses | 0.0.46 |
| scikit-learn | 0.24.1 | scipy | 1.6.2 | seaborn | 0.11.1 |
| Send2Trash | 1.5.0 | setuptools | 52.0.0 | setuptools-git | 1.2 |
| shap | 0.40.0 | simplejson | 3.17.2 | six | 1.15.0 |
| slicer | 0.0.7 | smart-open | 5.2.0 | smmap | 3.0.5 |
| spacy | 3.2.1 | spacy-legacy | 3.0.8 | spacy-loggers | 1.0.1 |
| spark-tensorflow-distributor | 1.0.0 | sqlparse | 0.4.1 | srsly | 2.4.1 |
| ssh-import-id | 5.10 | statsmodels | 0.12.2 | tabulate | 0.8.7 |
| tangled-up-in-unicode | 0.1.0 | tenacity | 6.2.0 | tensorboard | 2.8.0 |
| tensorboard-data-server | 0.6.1 | tensorboard-plugin-profile | 2.5.0 | tensorboard-plugin-wit | 1.8.1 |
| tensorflow-cpu | 2.8.0 | tensorflow-estimator | 2.8.0 | tensorflow-io-gcs-filesystem | 0.24.0 |
| termcolor | 1.1.0 | terminado | 0.9.4 | testpath | 0.4.4 |
| tf-estimator-nightly | 2.8.0.dev2021122109 | thinc | 8.0.12 | threadpoolctl | 2.1.0 |
| tokenizers | 0.10.3 | torch | 1.10.2+cpu | torchvision | 0.11.3+cpu |
| tornado | 6.1 | tqdm | 4.59.0 | traitlets | 5.0.5 |
| transformers | 4.16.2 | typer | 0.3.2 | typing-extensions | 3.7.4.3 |
| ujson | 4.0.2 | unattended-upgrades | 0.1 | urllib3 | 1.25.11 |
| virtualenv | 20.4.1 | visions | 0.7.4 | wasabi | 0.8.2 |
| wcwidth | 0.2.5 | webencodings | 0.5.1 | websocket-client | 0.57.0 |
| Werkzeug | 1.0.1 | wheel | 0.36.2 | widgetsnbextension | 3.5.1 |
| wrapt | 1.12.1 | xgboost | 1.5.2 | zipp | 3.4.1 |

Python libraries on GPU clusters

| Library | Version | Library | Version | Library | Version |
|---|---|---|---|---|---|
| absl-py | 0.11.0 | Antergos Linux | 2015.10 (ISO-Rolling) | appdirs | 1.4.4 |
| argon2-cffi | 20.1.0 | astor | 0.8.1 | astunparse | 1.6.3 |
| async-generator | 1.10 | attrs | 20.3.0 | backcall | 0.2.0 |
| bcrypt | 3.2.0 | bidict | 0.21.4 | bleach | 3.3.0 |
| blis | 0.7.4 | boto3 | 1.16.7 | botocore | 1.19.7 |
| cachetools | 4.2.4 | catalogue | 2.0.6 | certifi | 2020.12.5 |
| cffi | 1.14.5 | chardet | 4.0.0 | click | 7.1.2 |
| cloudpickle | 1.6.0 | cmdstanpy | 0.9.68 | configparser | 5.0.1 |
| convertdate | 2.3.2 | cryptography | 3.4.7 | cycler | 0.10.0 |
| cymem | 2.0.5 | Cython | 0.29.23 | databricks-automl-runtime | 0.2.6 |
| databricks-cli | 0.16.3 | dbl-tempo | 0.1.2 | dbus-python | 1.2.16 |
| decorator | 5.0.6 | defusedxml | 0.7.1 | dill | 0.3.2 |
| diskcache | 5.2.1 | distlib | 0.3.4 | distro-info | 0.23ubuntu1 |
| entrypoints | 0.3 | ephem | 4.1.3 | facets-overview | 1.0.0 |
| fasttext | 0.9.2 | filelock | 3.0.12 | Flask | 1.1.2 |
| flatbuffers | 2.0 | fsspec | 0.9.0 | future | 0.18.2 |
| gast | 0.4.0 | gitdb | 4.0.7 | GitPython | 3.1.12 |
| google-auth | 1.22.1 | google-auth-oauthlib | 0.4.2 | google-pasta | 0.2.0 |
| grpcio | 1.39.0 | gunicorn | 20.0.4 | gviz-api | 1.10.0 |
| h5py | 3.1.0 | hijri-converter | 2.2.3 | holidays | 0.12 |
| horovod | 0.23.0 | htmlmin | 0.1.12 | huggingface-hub | 0.1.2 |
| idna | 2.10 | ImageHash | 4.2.1 | imbalanced-learn | 0.8.1 |
| importlib-metadata | 3.10.0 | ipykernel | 5.3.4 | ipython | 7.22.0 |
| ipython-genutils | 0.2.0 | ipywidgets | 7.6.3 | isodate | 0.6.0 |
| itsdangerous | 1.1.0 | jedi | 0.17.2 | Jinja2 | 2.11.3 |
| jmespath | 0.10.0 | joblib | 1.0.1 | joblibspark | 0.3.0 |
| jsonschema | 3.2.0 | jupyter-client | 6.1.12 | jupyter-core | 4.7.1 |
| jupyterlab-pygments | 0.1.2 | jupyterlab-widgets | 1.0.0 | keras | 2.8.0 |
| Keras-Preprocessing | 1.1.2 | kiwisolver | 1.3.1 | koalas | 1.8.2 |
| korean-lunar-calendar | 0.2.1 | langcodes | 3.3.0 | libclang | 13.0.0 |
| lightgbm | 3.3.2 | llvmlite | 0.38.0 | LunarCalendar | 0.0.9 |
| Mako | 1.1.3 | Markdown | 3.3.3 | MarkupSafe | 2.0.1 |
| matplotlib | 3.4.2 | missingno | 0.5.1 | mistune | 0.8.4 |
| mleap | 0.18.1 | mlflow-skinny | 1.24.0 | multimethod | 1.7 |
| murmurhash | 1.0.5 | nbclient | 0.5.3 | nbconvert | 6.0.7 |
| nbformat | 5.1.3 | nest-asyncio | 1.5.1 | networkx | 2.5 |
| nltk | 3.6.1 | notebook | 6.3.0 | numba | 0.55.1 |
| numpy | 1.20.1 | oauthlib | 3.1.0 | opt-einsum | 3.3.0 |
| packaging | 21.3 | pandas | 1.2.4 | pandas-profiling | 3.1.0 |
| pandocfilters | 1.4.3 | paramiko | 2.7.2 | parso | 0.7.0 |
| pathy | 0.6.0 | patsy | 0.5.1 | petastorm | 0.11.4 |
| pexpect | 4.8.0 | phik | 0.12.0 | pickleshare | 0.7.5 |
| Pillow | 8.2.0 | pip | 21.0.1 | plotly | 5.5.0 |
| pmdarima | 1.8.4 | preshed | 3.0.5 | prompt-toolkit | 3.0.17 |
| prophet | 1.0.1 | protobuf | 3.17.2 | psutil | 5.8.0 |
| psycopg2 | 2.8.5 | ptyprocess | 0.7.0 | pyarrow | 4.0.0 |
| pyasn1 | 0.4.8 | pyasn1-modules | 0.2.8 | pybind11 | 2.9.1 |
| pycparser | 2.20 | pydantic | 1.8.2 | Pygments | 2.8.1 |
| PyGObject | 3.36.0 | PyMeeus | 0.5.11 | PyNaCl | 1.4.0 |
| pyodbc | 4.0.30 | pyparsing | 2.4.7 | pyrsistent | 0.17.3 |
| pystan | 2.19.1.1 | python-apt | 2.0.0+ubuntu0.20.4.7 | python-dateutil | 2.8.1 |
| python-editor | 1.0.4 | python-engineio | 4.3.0 | python-socketio | 5.4.1 |
| pytz | 2020.5 | PyWavelets | 1.1.1 | PyYAML | 5.4.1 |
| pyzmq | 20.0.0 | regex | 2021.4.4 | requests | 2.25.1 |
| requests-oauthlib | 1.3.0 | requests-unixsocket | 0.2.0 | rsa | 4.7.2 |
| s3transfer | 0.3.7 | sacremoses | 0.0.46 | scikit-learn | 0.24.1 |
| scipy | 1.6.2 | seaborn | 0.11.1 | Send2Trash | 1.5.0 |
| setuptools | 52.0.0 | setuptools-git | 1.2 | shap | 0.40.0 |
| simplejson | 3.17.2 | six | 1.15.0 | slicer | 0.0.7 |
| smart-open | 5.2.0 | smmap | 3.0.5 | spacy | 3.2.1 |
| spacy-legacy | 3.0.8 | spacy-loggers | 1.0.1 | spark-tensorflow-distributor | 1.0.0 |
| sqlparse | 0.4.1 | srsly | 2.4.1 | ssh-import-id | 5.10 |
| statsmodels | 0.12.2 | tabulate | 0.8.7 | tangled-up-in-unicode | 0.1.0 |
| tenacity | 6.2.0 | tensorboard | 2.8.0 | tensorboard-data-server | 0.6.1 |
| tensorboard-plugin-profile | 2.5.0 | tensorboard-plugin-wit | 1.8.1 | tensorflow | 2.8.0 |
| tensorflow-estimator | 2.8.0 | tensorflow-io-gcs-filesystem | 0.24.0 | termcolor | 1.1.0 |
| terminado | 0.9.4 | testpath | 0.4.4 | tf-estimator-nightly | 2.8.0.dev2021122109 |
| thinc | 8.0.12 | threadpoolctl | 2.1.0 | tokenizers | 0.10.3 |
| torch | 1.10.2+cu111 | torchvision | 0.11.3+cu111 | tornado | 6.1 |
| tqdm | 4.59.0 | traitlets | 5.0.5 | transformers | 4.16.2 |
| typer | 0.3.2 | typing-extensions | 3.7.4.3 | ujson | 4.0.2 |
| unattended-upgrades | 0.1 | urllib3 | 1.25.11 | virtualenv | 20.4.1 |
| visions | 0.7.4 | wasabi | 0.8.2 | wcwidth | 0.2.5 |
| webencodings | 0.5.1 | websocket-client | 0.57.0 | Werkzeug | 1.0.1 |
| wheel | 0.36.2 | widgetsnbextension | 3.5.1 | wrapt | 1.12.1 |
| xgboost | 1.5.2 | zipp | 3.4.1 | | |
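
The CPU and GPU lists differ in only a handful of entries. As an illustrative sketch, the snippet below diffs two name-to-version mappings and extracts the build tag from a version string. The dictionaries hold a small excerpt of the two lists above, not the full tables, and the helper names are hypothetical. Reading `cu111` as a CUDA 11.1 build follows the usual PyTorch wheel naming convention.

```python
from typing import Dict, Optional, Tuple

# Excerpt of the CPU and GPU library lists above (not the full tables).
cpu = {
    "tensorflow-cpu": "2.8.0",
    "torch": "1.10.2+cpu",
    "torchvision": "0.11.3+cpu",
    "prometheus-client": "0.10.1",
    "xgboost": "1.5.2",
}
gpu = {
    "tensorflow": "2.8.0",
    "torch": "1.10.2+cu111",
    "torchvision": "0.11.3+cu111",
    "xgboost": "1.5.2",
}


def diff_pins(
    left: Dict[str, str], right: Dict[str, str]
) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Map each differing package to (left version, right version)."""
    diff = {}
    for name in sorted(set(left) | set(right)):
        l_ver, r_ver = left.get(name), right.get(name)
        if l_ver != r_ver:
            diff[name] = (l_ver, r_ver)
    return diff


def build_tag(version: str) -> Optional[str]:
    """Return the local version label after '+', or None if there is none."""
    _, sep, local = version.partition("+")
    return local if sep else None


differences = diff_pins(cpu, gpu)
for name, (cpu_ver, gpu_ver) in differences.items():
    print(f"{name}: CPU={cpu_ver} GPU={gpu_ver}")
```

A `None` on one side means the package appears in only one of the two lists, as with `tensorflow-cpu` and `tensorflow`.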

Spark packages containing Python modules

| Spark Package | Python Module | Version |
|---|---|---|
| graphframes | graphframes | 0.8.2-db1-spark3.2 |

R libraries

The R libraries are identical to the R libraries in Databricks Runtime 10.4 LTS.

Java and Scala libraries (Scala 2.12 cluster)

In addition to Java and Scala libraries in Databricks Runtime 10.4 LTS, Databricks Runtime 10.4 LTS ML contains the following JARs:

CPU clusters

| Group ID | Artifact ID | Version |
|---|---|---|
| com.typesafe.akka | akka-actor_2.12 | 2.5.23 |
| ml.combust.mleap | mleap-databricks-runtime_2.12 | 0.18.1-23eb1ef |
| ml.dmlc | xgboost4j-spark_2.12 | 1.5.2 |
| ml.dmlc | xgboost4j_2.12 | 1.5.2 |
| org.graphframes | graphframes_2.12 | 0.8.2-db1-spark3.2 |
| org.mlflow | mlflow-client | 1.24.0 |
| org.mlflow | mlflow-spark | 1.24.0 |
| org.scala-lang.modules | scala-java8-compat_2.12 | 0.8.0 |
| org.tensorflow | spark-tensorflow-connector_2.12 | 1.15.0 |

GPU clusters

| Group ID | Artifact ID | Version |
|---|---|---|
| com.typesafe.akka | akka-actor_2.12 | 2.5.23 |
| ml.combust.mleap | mleap-databricks-runtime_2.12 | 0.18.1-23eb1ef |
| ml.dmlc | xgboost4j-spark_2.12 | 1.5.2 |
| ml.dmlc | xgboost4j_2.12 | 1.5.2 |
| org.graphframes | graphframes_2.12 | 0.8.2-db1-spark3.2 |
| org.mlflow | mlflow-client | 1.24.0 |
| org.mlflow | mlflow-spark | 1.24.0 |
| org.scala-lang.modules | scala-java8-compat_2.12 | 0.8.0 |
| org.tensorflow | spark-tensorflow-connector_2.12 | 1.15.0 |