Databricks Runtime 14.1 for Machine Learning

Databricks Runtime 14.1 for Machine Learning provides a ready-to-go environment for machine learning and data science based on Databricks Runtime 14.1. Databricks Runtime ML contains many popular machine learning libraries, including TensorFlow, PyTorch, and XGBoost. Databricks Runtime ML includes AutoML, a tool to automatically train machine learning pipelines. Databricks Runtime ML also supports distributed deep learning training using Horovod.

Tip

For release notes on Databricks Runtime versions that have reached end-of-support (EoS), see End-of-support Databricks Runtime release notes. EoS Databricks Runtime versions have been retired and might not be updated.

New features and improvements

Databricks Runtime 14.1 ML is built on top of Databricks Runtime 14.1. For information on what’s new in Databricks Runtime 14.1, including Apache Spark MLlib and SparkR, see the Databricks Runtime 14.1 release notes.

Enhancements to AutoML

AutoML-generated notebooks are now saved as MLflow artifacts.

Enhancements to Databricks Feature Store

You can now automatically infer and log an input example when you log a model. To do this, set infer_model_example to True when you call log_model. The example is based on the training data specified in the training_set parameter.

For more information about Databricks Feature Store, see Feature engineering and serving.

System environment

The system environment in Databricks Runtime 14.1 ML differs from Databricks Runtime 14.1 as follows:

Databricks Runtime 14.1 ML includes XGBoost 1.7.6, which does not support GPU clusters with compute capability 5.2 and below.
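The compute capability limit above can be checked before a GPU job is scheduled. The following is a minimal sketch, not part of Databricks Runtime: the function name xgboost_gpu_supported and the constant are hypothetical, and the threshold is read directly from the statement that capability 5.2 and below is unsupported.

```python
# Hypothetical helper: the names below are illustrative, not a Databricks or XGBoost API.
# Threshold taken from the release note: compute capability 5.2 and below is unsupported.
MAX_UNSUPPORTED_CAPABILITY = (5, 2)


def xgboost_gpu_supported(major: int, minor: int) -> bool:
    """Return True when the (major, minor) capability is above the unsupported threshold."""
    # Tuples compare element by element, so (5, 3) > (5, 2) and (6, 0) > (5, 2).
    return (major, minor) > MAX_UNSUPPORTED_CAPABILITY


# On a GPU cluster the pair could be read with torch.cuda.get_device_capability(),
# which returns a (major, minor) tuple. Fixed values are used here so the sketch runs anywhere.
for capability in [(5, 0), (5, 2), (6, 1), (8, 0)]:
    status = "supported" if xgboost_gpu_supported(*capability) else "not supported"
    print(f"compute capability {capability[0]}.{capability[1]}: {status}")
```

A plain tuple comparison keeps the check independent of any GPU library, so it can run on the driver before any GPU work starts.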

Libraries

The following sections list the libraries included in Databricks Runtime 14.1 ML that differ from those included in Databricks Runtime 14.1.

Python libraries

Databricks Runtime 14.1 ML uses Virtualenv for Python package management and includes many popular ML packages.

In addition to the packages specified in the following sections, Databricks Runtime 14.1 ML also includes the following packages:

  • hyperopt 0.2.7+db4

  • sparkdl 3.0.0_db1

  • automl 1.22.0

To reproduce the Databricks Runtime ML Python environment in your local Python virtual environment, download the requirements-14.1.txt file and run pip install -r requirements-14.1.txt. This command installs all of the open source libraries that Databricks Runtime ML uses, but does not install libraries developed by Databricks, such as databricks-automl, databricks-feature-store, or the Databricks fork of hyperopt.
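After installing from the requirements file, it can be useful to confirm that the local environment matches the pinned versions. The sketch below assumes the file uses simple name==version lines, which is an assumption about its layout. The helper names parse_pins and find_mismatches are hypothetical, and only the standard library is used.

```python
from importlib import metadata


def parse_pins(text):
    """Map package name -> pinned version for simple 'name==version' lines."""
    pins = {}
    for raw in text.splitlines():
        # Drop trailing comments and environment markers, then surrounding whitespace.
        line = raw.split("#", 1)[0].split(";", 1)[0].strip()
        if "==" not in line:
            continue  # skip blank lines, comments, and unpinned entries
        name, version = line.split("==", 1)
        pins[name.strip().lower()] = version.strip()
    return pins


def find_mismatches(pins, installed_version=metadata.version):
    """Return {name: (pinned, installed_or_None)} for packages that differ from the pin."""
    mismatches = {}
    for name, wanted in pins.items():
        try:
            found = installed_version(name)
        except metadata.PackageNotFoundError:
            found = None  # package is not installed locally
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


# Illustrative input only: two pins copied from the CPU library list in these notes.
sample = "numpy==1.23.5\npandas==1.5.3\n"
for name, (wanted, found) in find_mismatches(parse_pins(sample)).items():
    print(f"{name}: pinned {wanted}, installed {found}")
```

To check a real environment, pass the contents of the downloaded requirements file to parse_pins in place of the sample string. Libraries developed by Databricks are not in that file, so they are not covered by this check.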

Python libraries on CPU clusters

Library | Version
absl-py | 1.0.0
accelerate | 0.21.0
aiohttp | 3.8.5
aiosignal | 1.3.1
anyio | 3.5.0
appdirs | 1.4.4
argon2-cffi | 21.3.0
argon2-cffi-bindings | 21.2.0
astor | 0.8.1
asttokens | 2.0.5
astunparse | 1.6.3
async-timeout | 4.0.3
attrs | 22.1.0
audioread | 3.0.0
azure-core | 1.29.1
azure-cosmos | 4.3.1
azure-storage-blob | 12.18.1
azure-storage-file-datalake | 12.13.1
backcall | 0.2.0
bcrypt | 3.2.0
beautifulsoup4 | 4.11.1
black | 22.6.0
bleach | 4.1.0
blinker | 1.4
blis | 0.7.10
boto3 | 1.24.28
botocore | 1.27.96
cachetools | 5.3.1
catalogue | 2.0.9
category-encoders | 2.6.2
certifi | 2022.12.7
cffi | 1.15.1
chardet | 4.0.0
charset-normalizer | 2.0.4
click | 8.0.4
cloudpickle | 2.0.0
cmdstanpy | 1.1.0
comm | 0.1.2
confection | 0.1.3
configparser | 5.2.0
contourpy | 1.0.5
convertdate | 2.4.0
cryptography | 39.0.1
cycler | 0.11.0
cymem | 2.0.8
Cython | 0.29.32
dacite | 1.8.1
databricks-automl-runtime | 0.2.19
databricks-cli | 0.17.7
databricks-feature-store | 0.15.1
databricks-sdk | 0.1.6
dataclasses-json | 0.5.14
datasets | 2.14.4
dbl-tempo | 0.1.23
dbus-python | 1.2.18
debugpy | 1.6.7
decorator | 5.1.1
deepspeed | 0.10.0
defusedxml | 0.7.1
dill | 0.3.6
diskcache | 5.6.3
distlib | 0.3.7
docstring-to-markdown | 0.11
entrypoints | 0.4
ephem | 4.1.4
evaluate | 0.4.0
executing | 0.8.3
facets-overview | 1.1.1
fastapi | 0.98.0
fastjsonschema | 2.18.0
fasttext | 0.9.2
filelock | 3.9.0
Flask | 2.2.5
flatbuffers | 23.5.26
fonttools | 4.25.0
frozenlist | 1.4.0
fsspec | 2022.11.0
future | 0.18.3
gast | 0.4.0
GCC runtime library | 1.10.0
gitdb | 4.0.10
GitPython | 3.1.27
google-api-core | 2.11.1
google-auth | 2.21.0
google-auth-oauthlib | 1.0.0
google-cloud-core | 2.3.3
google-cloud-storage | 2.10.0
google-crc32c | 1.5.0
google-pasta | 0.2.0
google-resumable-media | 2.6.0
googleapis-common-protos | 1.60.0
greenlet | 2.0.1
grpcio | 1.48.2
grpcio-status | 1.48.1
gunicorn | 20.1.0
gviz-api | 1.10.0
h11 | 0.14.0
h5py | 3.7.0
hjson | 3.1.0
holidays | 0.30
horovod | 0.28.1
htmlmin | 0.1.12
httplib2 | 0.20.2
httptools | 0.6.0
huggingface-hub | 0.14.1
idna | 3.4
ImageHash | 4.3.1
imbalanced-learn | 0.10.1
importlib-metadata | 4.11.3
importlib-resources | 6.0.1
ipykernel | 6.25.0
ipython | 8.14.0
ipython-genutils | 0.2.0
ipywidgets | 7.7.2
isodate | 0.6.1
itsdangerous | 2.0.1
jedi | 0.18.1
jeepney | 0.7.1
Jinja2 | 3.1.2
jmespath | 0.10.0
joblib | 1.2.0
joblibspark | 0.5.1
jsonschema | 4.17.3
jupyter-client | 7.3.4
jupyter-server | 1.23.4
jupyter_core | 5.2.0
jupyterlab-pygments | 0.1.2
jupyterlab-widgets | 1.0.0
keras | 2.13.1
keyring | 23.5.0
kiwisolver | 1.4.4
langchain | 0.0.267
langcodes | 3.3.0
langsmith | 0.0.38
launchpadlib | 1.10.16
lazr.restfulclient | 0.14.4
lazr.uri | 1.0.6
lazy_loader | 0.3
libclang | 15.0.6.1
librosa | 0.10.1
lightgbm | 4.0.0
llvmlite | 0.39.1
LunarCalendar | 0.0.9
lxml | 4.9.1
Mako | 1.2.0
Markdown | 3.4.1
MarkupSafe | 2.1.1
marshmallow | 3.20.1
matplotlib | 3.7.0
matplotlib-inline | 0.1.6
mccabe | 0.7.0
mistune | 0.8.4
mlflow-skinny | 2.7.1
more-itertools | 8.10.0
mpmath | 1.2.1
msgpack | 1.0.5
multidict | 6.0.4
multimethod | 1.9.1
multiprocess | 0.70.14
murmurhash | 1.0.10
mypy-extensions | 0.4.3
nbclassic | 0.5.2
nbclient | 0.5.13
nbconvert | 6.5.4
nbformat | 5.7.0
nest-asyncio | 1.5.6
networkx | 2.8.4
ninja | 1.11.1
nltk | 3.7
nodeenv | 1.8.0
notebook | 6.5.2
notebook_shim | 0.2.2
numba | 0.56.4
numexpr | 2.8.4
numpy | 1.23.5
oauthlib | 3.2.0
openai | 0.27.8
openapi-schema-pydantic | 1.2.4
opt-einsum | 3.3.0
packaging | 22.0
pandas | 1.5.3
pandocfilters | 1.5.0
paramiko | 2.9.2
parso | 0.8.3
pathspec | 0.10.3
pathy | 0.10.2
patsy | 0.5.3
petastorm | 0.12.1
pexpect | 4.8.0
phik | 0.12.3
pickleshare | 0.7.5
Pillow | 9.4.0
pip | 22.3.1
platformdirs | 2.5.2
plotly | 5.9.0
pluggy | 1.0.0
pmdarima | 2.0.3
pooch | 1.4.0
preshed | 3.0.9
prometheus-client | 0.14.1
prompt-toolkit | 3.0.36
prophet | 1.1.4
protobuf | 4.24.0
psutil | 5.9.0
psycopg2 | 2.9.3
ptyprocess | 0.7.0
pure-eval | 0.2.2
py-cpuinfo | 9.0.0
pyarrow | 8.0.0
pyasn1 | 0.4.8
pyasn1-modules | 0.2.8
pybind11 | 2.11.1
pycparser | 2.21
pydantic | 1.10.6
pyflakes | 3.0.1
Pygments | 2.11.2
PyGObject | 3.42.1
PyJWT | 2.3.0
PyMeeus | 0.5.12
PyNaCl | 1.5.0
pyodbc | 4.0.32
pyparsing | 3.0.9
pyright | 1.1.294
pyrsistent | 0.18.0
pytesseract | 0.3.10
python-dateutil | 2.8.2
python-dotenv | 1.0.0
python-editor | 1.0.4
python-lsp-jsonrpc | 1.0.0
python-lsp-server | 1.7.1
pytoolconfig | 1.2.5
pytz | 2022.7
PyWavelets | 1.4.1
PyYAML | 6.0
pyzmq | 23.2.0
regex | 2022.7.9
requests | 2.28.1
requests-oauthlib | 1.3.1
responses | 0.18.0
rope | 1.7.0
rsa | 4.9
s3transfer | 0.6.2
safetensors | 0.3.3
scikit-learn | 1.1.1
seaborn | 0.12.2
SecretStorage | 3.3.1
Send2Trash | 1.8.0
sentence-transformers | 2.2.2
sentencepiece | 0.1.99
setuptools | 65.6.3
shap | 0.42.1
simplejson | 3.17.6
six | 1.16.0
slicer | 0.0.7
smart-open | 5.2.1
smmap | 5.0.0
sniffio | 1.2.0
soundfile | 0.12.1
soupsieve | 2.3.2.post1
soxr | 0.3.6
spacy | 3.6.1
spacy-legacy | 3.0.12
spacy-loggers | 1.0.5
spark-tensorflow-distributor | 1.0.0
SQLAlchemy | 1.4.39
sqlparse | 0.4.2
srsly | 2.4.7
ssh-import-id | 5.11
stack-data | 0.2.0
starlette | 0.27.0
statsmodels | 0.13.5
sympy | 1.11.1
tabulate | 0.8.10
tangled-up-in-unicode | 0.2.0
tenacity | 8.1.0
tensorboard | 2.13.0
tensorboard-data-server | 0.7.1
tensorboard-plugin-profile | 2.13.1
tensorflow-cpu | 2.13.0
tensorflow-estimator | 2.13.0
tensorflow-io-gcs-filesystem | 0.34.0
termcolor | 2.3.0
terminado | 0.17.1
thinc | 8.1.12
threadpoolctl | 2.2.0
tiktoken | 0.4.0
tinycss2 | 1.2.1
tokenize-rt | 4.2.1
tokenizers | 0.13.3
tomli | 2.0.1
torch | 2.0.1+cpu
torchvision | 0.15.2+cpu
tornado | 6.1
tqdm | 4.64.1
traitlets | 5.7.1
transformers | 4.31.0
typeguard | 2.13.3
typer | 0.9.0
typing-inspect | 0.9.0
typing_extensions | 4.4.0
ujson | 5.4.0
unattended-upgrades | 0.1
urllib3 | 1.26.14
uvicorn | 0.23.2
uvloop | 0.17.0
virtualenv | 20.16.7
visions | 0.7.5
wadllib | 1.3.6
wasabi | 1.1.2
watchfiles | 0.20.0
wcwidth | 0.2.5
webencodings | 0.5.1
websocket-client | 0.58.0
websockets | 11.0.3
Werkzeug | 2.2.2
whatthepatch | 1.0.2
wheel | 0.38.4
widgetsnbextension | 3.6.1
wordcloud | 1.9.2
wrapt | 1.14.1
xgboost | 1.7.6
xxhash | 3.3.0
yapf | 0.31.0
yarl | 1.9.2
ydata-profiling | 4.2.0
zipp | 3.11.0

Python libraries on GPU clusters

Library | Version
absl-py | 1.0.0
accelerate | 0.21.0
aiohttp | 3.8.5
aiosignal | 1.3.1
anyio | 3.5.0
appdirs | 1.4.4
argon2-cffi | 21.3.0
argon2-cffi-bindings | 21.2.0
astor | 0.8.1
asttokens | 2.0.5
astunparse | 1.6.3
async-timeout | 4.0.3
attrs | 22.1.0
audioread | 3.0.0
azure-core | 1.29.1
azure-cosmos | 4.3.1
azure-storage-blob | 12.18.1
azure-storage-file-datalake | 12.13.1
backcall | 0.2.0
bcrypt | 3.2.0
beautifulsoup4 | 4.11.1
black | 22.6.0
bleach | 4.1.0
blinker | 1.4
blis | 0.7.10
boto3 | 1.24.28
botocore | 1.27.96
cachetools | 5.3.1
catalogue | 2.0.9
category-encoders | 2.6.2
certifi | 2022.12.7
cffi | 1.15.1
chardet | 4.0.0
charset-normalizer | 2.0.4
click | 8.0.4
cloudpickle | 2.0.0
cmake | 3.27.5
cmdstanpy | 1.1.0
comm | 0.1.2
confection | 0.1.3
configparser | 5.2.0
contourpy | 1.0.5
convertdate | 2.4.0
cryptography | 39.0.1
cycler | 0.11.0
cymem | 2.0.8
Cython | 0.29.32
dacite | 1.8.1
databricks-automl-runtime | 0.2.19
databricks-cli | 0.17.7
databricks-feature-store | 0.15.1
databricks-sdk | 0.1.6
dataclasses-json | 0.5.14
datasets | 2.14.4
dbl-tempo | 0.1.23
dbus-python | 1.2.18
debugpy | 1.6.7
decorator | 5.1.1
deepspeed | 0.10.0
defusedxml | 0.7.1
dill | 0.3.6
diskcache | 5.6.3
distlib | 0.3.7
docstring-to-markdown | 0.11
einops | 0.6.1
entrypoints | 0.4
ephem | 4.1.4
evaluate | 0.4.0
executing | 0.8.3
facets-overview | 1.1.1
fastapi | 0.98.0
fastjsonschema | 2.18.0
fasttext | 0.9.2
filelock | 3.9.0
flash-attn | 2.0.8
Flask | 2.2.5
flatbuffers | 23.5.26
fonttools | 4.25.0
frozenlist | 1.4.0
fsspec | 2022.11.0
future | 0.18.3
gast | 0.4.0
GCC runtime library | 1.10.0
gitdb | 4.0.10
GitPython | 3.1.27
google-api-core | 2.11.1
google-auth | 2.21.0
google-auth-oauthlib | 1.0.0
google-cloud-core | 2.3.3
google-cloud-storage | 2.10.0
google-crc32c | 1.5.0
google-pasta | 0.2.0
google-resumable-media | 2.6.0
googleapis-common-protos | 1.60.0
greenlet | 2.0.1
grpcio | 1.48.2
grpcio-status | 1.48.1
gunicorn | 20.1.0
gviz-api | 1.10.0
h11 | 0.14.0
h5py | 3.7.0
hjson | 3.1.0
holidays | 0.30
horovod | 0.28.1
htmlmin | 0.1.12
httplib2 | 0.20.2
httptools | 0.6.0
huggingface-hub | 0.14.1
idna | 3.4
ImageHash | 4.3.1
imbalanced-learn | 0.10.1
importlib-metadata | 4.11.3
importlib-resources | 6.0.1
ipykernel | 6.25.0
ipython | 8.14.0
ipython-genutils | 0.2.0
ipywidgets | 7.7.2
isodate | 0.6.1
itsdangerous | 2.0.1
jedi | 0.18.1
jeepney | 0.7.1
Jinja2 | 3.1.2
jmespath | 0.10.0
joblib | 1.2.0
joblibspark | 0.5.1
jsonschema | 4.17.3
jupyter-client | 7.3.4
jupyter-server | 1.23.4
jupyter_core | 5.2.0
jupyterlab-pygments | 0.1.2
jupyterlab-widgets | 1.0.0
keras | 2.13.1
keyring | 23.5.0
kiwisolver | 1.4.4
langchain | 0.0.267
langcodes | 3.3.0
langsmith | 0.0.38
launchpadlib | 1.10.16
lazr.restfulclient | 0.14.4
lazr.uri | 1.0.6
lazy_loader | 0.3
libclang | 15.0.6.1
librosa | 0.10.1
lightgbm | 4.0.0
lit | 16.0.6
llvmlite | 0.39.1
LunarCalendar | 0.0.9
lxml | 4.9.1
Mako | 1.2.0
Markdown | 3.4.1
MarkupSafe | 2.1.1
marshmallow | 3.20.1
matplotlib | 3.7.0
matplotlib-inline | 0.1.6
mccabe | 0.7.0
mistune | 0.8.4
mlflow-skinny | 2.7.1
more-itertools | 8.10.0
mpmath | 1.2.1
msgpack | 1.0.5
multidict | 6.0.4
multimethod | 1.9.1
multiprocess | 0.70.14
murmurhash | 1.0.10
mypy-extensions | 0.4.3
nbclassic | 0.5.2
nbclient | 0.5.13
nbconvert | 6.5.4
nbformat | 5.7.0
nest-asyncio | 1.5.6
networkx | 2.8.4
ninja | 1.11.1
nltk | 3.7
nodeenv | 1.8.0
notebook | 6.5.2
notebook_shim | 0.2.2
numba | 0.56.4
numexpr | 2.8.4
numpy | 1.23.5
oauthlib | 3.2.0
openai | 0.27.8
openapi-schema-pydantic | 1.2.4
opt-einsum | 3.3.0
packaging | 22.0
pandas | 1.5.3
pandocfilters | 1.5.0
paramiko | 2.9.2
parso | 0.8.3
pathspec | 0.10.3
pathy | 0.10.2
patsy | 0.5.3
petastorm | 0.12.1
pexpect | 4.8.0
phik | 0.12.3
pickleshare | 0.7.5
Pillow | 9.4.0
pip | 22.3.1
platformdirs | 2.5.2
plotly | 5.9.0
pluggy | 1.0.0
pmdarima | 2.0.3
pooch | 1.4.0
preshed | 3.0.9
prompt-toolkit | 3.0.36
prophet | 1.1.4
protobuf | 4.24.0
psutil | 5.9.0
psycopg2 | 2.9.3
ptyprocess | 0.7.0
pure-eval | 0.2.2
py-cpuinfo | 9.0.0
pyarrow | 8.0.0
pyasn1 | 0.4.8
pyasn1-modules | 0.2.8
pybind11 | 2.11.1
pycparser | 2.21
pydantic | 1.10.6
pyflakes | 3.0.1
Pygments | 2.11.2
PyGObject | 3.42.1
PyJWT | 2.3.0
PyMeeus | 0.5.12
PyNaCl | 1.5.0
pyodbc | 4.0.32
pyparsing | 3.0.9
pyright | 1.1.294
pyrsistent | 0.18.0
pytesseract | 0.3.10
python-dateutil | 2.8.2
python-dotenv | 1.0.0
python-editor | 1.0.4
python-lsp-jsonrpc | 1.0.0
python-lsp-server | 1.7.1
pytoolconfig | 1.2.5
pytz | 2022.7
PyWavelets | 1.4.1
PyYAML | 6.0
pyzmq | 23.2.0
regex | 2022.7.9
requests | 2.28.1
requests-oauthlib | 1.3.1
responses | 0.18.0
rope | 1.7.0
rsa | 4.9
s3transfer | 0.6.2
safetensors | 0.3.3
scikit-learn | 1.1.1
seaborn | 0.12.2
SecretStorage | 3.3.1
Send2Trash | 1.8.0
sentence-transformers | 2.2.2
sentencepiece | 0.1.99
setuptools | 65.6.3
shap | 0.42.1
simplejson | 3.17.6
six | 1.16.0
slicer | 0.0.7
smart-open | 5.2.1
smmap | 5.0.0
sniffio | 1.2.0
soundfile | 0.12.1
soupsieve | 2.3.2.post1
soxr | 0.3.6
spacy | 3.6.1
spacy-legacy | 3.0.12
spacy-loggers | 1.0.5
spark-tensorflow-distributor | 1.0.0
SQLAlchemy | 1.4.39
sqlparse | 0.4.2
srsly | 2.4.7
ssh-import-id | 5.11
stack-data | 0.2.0
starlette | 0.27.0
statsmodels | 0.13.5
sympy | 1.11.1
tabulate | 0.8.10
tangled-up-in-unicode | 0.2.0
tenacity | 8.1.0
tensorboard | 2.13.0
tensorboard-data-server | 0.7.1
tensorboard-plugin-profile | 2.13.1
tensorflow | 2.13.0
tensorflow-estimator | 2.13.0
tensorflow-io-gcs-filesystem | 0.34.0
termcolor | 2.3.0
terminado | 0.17.1
thinc | 8.1.12
threadpoolctl | 2.2.0
tiktoken | 0.4.0
tinycss2 | 1.2.1
tokenize-rt | 4.2.1
tokenizers | 0.13.3
tomli | 2.0.1
torch | 2.0.1+cu118
torchvision | 0.15.2+cu118
tornado | 6.1
tqdm | 4.64.1
traitlets | 5.7.1
transformers | 4.31.0
triton | 2.0.0
typeguard | 2.13.3
typer | 0.9.0
typing-inspect | 0.9.0
typing_extensions | 4.4.0
ujson | 5.4.0
unattended-upgrades | 0.1
urllib3 | 1.26.14
uvicorn | 0.23.2
uvloop | 0.17.0
virtualenv | 20.16.7
visions | 0.7.5
wadllib | 1.3.6
wasabi | 1.1.2
watchfiles | 0.20.0
wcwidth | 0.2.5
webencodings | 0.5.1
websocket-client | 0.58.0
websockets | 11.0.3
Werkzeug | 2.2.2
whatthepatch | 1.0.2
wheel | 0.38.4
widgetsnbextension | 3.6.1
wordcloud | 1.9.2
wrapt | 1.14.1
xgboost | 1.7.6
xxhash | 3.3.0
yapf | 0.31.0
yarl | 1.9.2
ydata-profiling | 4.2.0
zipp | 3.11.0

R libraries

The R libraries are identical to the R libraries in Databricks Runtime 14.1.

Java and Scala libraries (Scala 2.12 cluster)

In addition to Java and Scala libraries in Databricks Runtime 14.1, Databricks Runtime 14.1 ML contains the following JARs:

CPU clusters

Group ID | Artifact ID | Version
com.typesafe.akka | akka-actor_2.12 | 2.5.23
ml.dmlc | xgboost4j-spark_2.12 | 1.7.3
ml.dmlc | xgboost4j_2.12 | 1.7.3
org.graphframes | graphframes_2.12 | 0.8.2-db2-spark3.4
org.mlflow | mlflow-client | 2.7.1
org.scala-lang.modules | scala-java8-compat_2.12 | 0.8.0
org.tensorflow | spark-tensorflow-connector_2.12 | 1.15.0

GPU clusters

Group ID | Artifact ID | Version
com.typesafe.akka | akka-actor_2.12 | 2.5.23
ml.dmlc | xgboost4j-gpu_2.12 | 1.7.3
ml.dmlc | xgboost4j-spark-gpu_2.12 | 1.7.3
org.graphframes | graphframes_2.12 | 0.8.2-db2-spark3.4
org.mlflow | mlflow-client | 2.7.1
org.scala-lang.modules | scala-java8-compat_2.12 | 0.8.0
org.tensorflow | spark-tensorflow-connector_2.12 | 1.15.0