Databricks Runtime 15.0 for Machine Learning (EoS)

Note

Support for this Databricks Runtime version has ended. For the end-of-support date, see End-of-support history. For all supported Databricks Runtime versions, see Databricks Runtime release notes versions and compatibility.

Databricks Runtime 15.0 for Machine Learning provides a ready-to-go environment for machine learning and data science based on Databricks Runtime 15.0 (EoS). Databricks Runtime ML contains many popular machine learning libraries, including TensorFlow, PyTorch, and XGBoost. Databricks Runtime ML includes AutoML, a tool to automatically train machine learning pipelines. Databricks Runtime ML also supports distributed deep learning training using Horovod.

New features and improvements

Databricks Runtime 15.0 ML is built on top of Databricks Runtime 15.0. For information on what’s new in Databricks Runtime 15.0, including Apache Spark MLlib and SparkR, see the Databricks Runtime 15.0 (EoS) release notes.

Breaking changes

Legacy Databricks CLI is no longer installed by default

In Databricks Runtime 14.3 LTS ML and below, the preinstalled version of MLflow required the legacy Databricks CLI (databricks/databricks-cli), so the legacy CLI was automatically installed in $PATH. Databricks Runtime 15.0 ML includes MLflow version 2.10.2, which does not require the legacy CLI.

Starting with Databricks Runtime 15.0 ML, the legacy Databricks CLI is no longer automatically installed in $PATH. This is a breaking change for users who depend on the legacy CLI being installed in the runtime. Commands like %sh databricks ... no longer work in Databricks Runtime 15.0 ML and above.

To continue using the legacy Databricks CLI from a notebook, install it as a cluster or notebook library. The new Databricks CLI (databricks/cli) is available from the web terminal. For more information, see Use web terminal and Databricks CLI.
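As a rough sketch of the check-and-install step described above, the following Python snippet looks for a databricks executable on $PATH and, if none is found, prints a pip command that would install the legacy CLI into the current Python environment. The PyPI package name databricks-cli and both helper names are assumptions made for illustration, not something these release notes specify. A databricks executable found on $PATH could also be the new CLI, so the check alone does not tell you which one is present.

```python
import shutil
import sys
from typing import List


def legacy_cli_install_command(package: str = "databricks-cli") -> List[str]:
    """Build a pip command that installs the given package into this interpreter's environment.

    The default package name is an assumption (the PyPI name of the legacy CLI).
    """
    return [sys.executable, "-m", "pip", "install", package]


def cli_on_path(executable: str = "databricks") -> bool:
    """Return True if an executable with this name is found on $PATH."""
    return shutil.which(executable) is not None


if __name__ == "__main__":
    if cli_on_path():
        print("A 'databricks' executable is already on $PATH.")
    else:
        print("No 'databricks' executable found; one way to install the legacy CLI:")
        print(" ".join(legacy_cli_install_command()))
```

In a notebook, the equivalent of the printed command is typically a notebook-scoped library install; the snippet only prints the command so that nothing is installed without review.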

MLeap is no longer available starting in Databricks Runtime 15.0 ML

MLeap is no longer available in Databricks Runtime 15.0 ML and above. To package models for deployment onto JVM-based frameworks, Databricks recommends using the ONNX format.

Deprecation of Horovod and HorovodRunner

Horovod and HorovodRunner are now deprecated. For distributed deep learning, Databricks recommends using TorchDistributor for distributed training with PyTorch or the tf.distribute.Strategy API for distributed training with TensorFlow. Horovod and HorovodRunner are preinstalled in Databricks Runtime 15.0 ML, but will be removed in the next major Databricks Runtime ML version.

Note

horovod.spark does not support pyarrow versions 11.0 and above (see the relevant GitHub issue). Databricks Runtime 15.0 ML includes pyarrow version 14.0.1. To use horovod.spark with Databricks Runtime 15.0 ML or above, you must manually install pyarrow, specifying a version below 11.0.
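The version constraint in this note can be checked before running horovod.spark. The following is a minimal sketch that uses only the Python standard library; the helper names are hypothetical, and the only rule it encodes is the "below 11.0" limit stated above.

```python
import re
from importlib import metadata
from typing import Optional, Tuple


def parse_major_minor(version: str) -> Tuple[int, int]:
    """Extract (major, minor) from a version string such as '14.0.1'."""
    match = re.match(r"(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"Unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2) or 0)


def pyarrow_ok_for_horovod_spark(version: str) -> bool:
    """True when the version is below 11.0, the limit stated in the note."""
    return parse_major_minor(version) < (11, 0)


def installed_pyarrow_version() -> Optional[str]:
    """Return the installed pyarrow version string, or None if pyarrow is absent."""
    try:
        return metadata.version("pyarrow")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_pyarrow_version()
    if found is None:
        print("pyarrow is not installed in this environment.")
    elif pyarrow_ok_for_horovod_spark(found):
        print(f"pyarrow {found} is below 11.0.")
    else:
        print(f"pyarrow {found} is 11.0 or above; horovod.spark needs a version below 11.0.")
```

With the preinstalled pyarrow 14.0.1, the check reports that a downgrade is needed.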

System environment

The system environment in Databricks Runtime 15.0 ML differs from Databricks Runtime 15.0 as follows:

  • For GPU clusters, Databricks Runtime ML includes the following NVIDIA GPU libraries:

    • CUDA 12.1

    • cuDNN 8.9.0.131-1

    • NCCL 2.17.1

    • TensorRT 8.6.1.6-1

Libraries

The following sections list the libraries included in Databricks Runtime 15.0 ML that differ from those included in Databricks Runtime 15.0.

Python libraries

Databricks Runtime 15.0 ML uses virtualenv for Python package management and includes many popular ML packages.

In addition to the packages specified in the following sections, Databricks Runtime 15.0 ML also includes the following packages:

  • hyperopt 0.2.7+db4

  • sparkdl 3.0.0_db1

  • automl 1.25.0

To reproduce the Databricks Runtime ML Python environment in your local Python virtual environment, download the requirements-15.0.txt file and run pip install -r requirements-15.0.txt. This command installs all of the open source libraries that Databricks Runtime ML uses, but does not install libraries developed by Databricks, such as databricks-automl, databricks-feature-store, or the Databricks fork of hyperopt.
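After the install, it can help to confirm that the local environment actually matches the pinned versions. The sketch below is one possible way to do that with the Python standard library. It assumes the requirements file consists mainly of name==version lines, which is an assumption about the file's format, and the helper names parse_pins and find_mismatches are hypothetical.

```python
from importlib import metadata
from pathlib import Path
from typing import Dict, Optional, Tuple


def parse_pins(requirements_text: str) -> Dict[str, str]:
    """Collect 'name==version' pins; skip blanks, comments, and any other line types."""
    pins: Dict[str, str] = {}
    for raw in requirements_text.splitlines():
        # Drop inline comments and environment markers before looking for a pin.
        line = raw.split("#", 1)[0].split(";", 1)[0].strip()
        if "==" not in line:
            continue
        name, version = line.split("==", 1)
        pins[name.strip().lower()] = version.strip()
    return pins


def find_mismatches(pins: Dict[str, str]) -> Dict[str, Tuple[str, Optional[str]]]:
    """Map each package whose installed version differs to (pinned, installed or None)."""
    mismatches: Dict[str, Tuple[str, Optional[str]]] = {}
    for name, pinned in pins.items():
        try:
            installed: Optional[str] = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != pinned:
            mismatches[name] = (pinned, installed)
    return mismatches


if __name__ == "__main__":
    path = Path("requirements-15.0.txt")
    if path.exists():
        for name, (pinned, installed) in sorted(find_mismatches(parse_pins(path.read_text())).items()):
            print(f"{name}: pinned {pinned}, installed {installed}")
    else:
        print(f"{path} not found in the current directory.")
```

Libraries developed by Databricks are not in the requirements file, so they do not appear in this comparison.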

Python libraries on CPU clusters

Library | Version
absl-py | 1.0.0
accelerate | 0.25.0
aiohttp | 3.8.5
aiohttp-cors | 0.7.0
aiosignal | 1.2.0
anyio | 3.5.0
argon2-cffi | 21.3.0
argon2-cffi-bindings | 21.2.0
astor | 0.8.1
asttokens | 2.0.5
astunparse | 1.6.3
async-timeout | 4.0.2
attrs | 22.1.0
audioread | 3.0.1
azure-core | 1.30.1
azure-cosmos | 4.3.1
azure-storage-blob | 12.19.0
azure-storage-file-datalake | 12.14.0
backcall | 0.2.0
bcrypt | 3.2.0
beautifulsoup4 | 4.12.2
black | 23.3.0
bleach | 4.1.0
blessed | 1.20.0
blinker | 1.4
blis | 0.7.11
boto3 | 1.34.39
botocore | 1.34.39
cachetools | 5.3.3
catalogue | 2.0.10
category-encoders | 2.6.3
certifi | 2023.7.22
cffi | 1.15.1
chardet | 4.0.0
charset-normalizer | 2.0.4
click | 8.0.4
cloudpathlib | 0.16.0
cloudpickle | 2.2.1
cmdstanpy | 1.2.1
colorful | 0.5.6
comm | 0.1.2
confection | 0.1.4
configparser | 5.2.0
contourpy | 1.0.5
cryptography | 41.0.3
cycler | 0.11.0
cymem | 2.0.8
Cython | 0.29.32
dacite | 1.8.1
databricks-automl-runtime | 0.2.21
databricks-feature-engineering | 0.3.0
databricks-sdk | 0.20.0
dataclasses-json | 0.6.4
datasets | 2.16.1
dbl-tempo | 0.1.26
dbus-python | 1.2.18
debugpy | 1.6.7
decorator | 5.1.1
deepspeed | 0.13.1
defusedxml | 0.7.1
dill | 0.3.6
diskcache | 5.6.3
distlib | 0.3.8
dm-tree | 0.1.8
entrypoints | 0.4
evaluate | 0.4.1
executing | 0.8.3
facets-overview | 1.1.1
Farama-Notifications | 0.0.4
fastjsonschema | 2.19.1
fasttext | 0.9.2
filelock | 3.9.0
Flask | 2.2.5
flatbuffers | 23.5.26
fonttools | 4.25.0
frozenlist | 1.3.3
fsspec | 2023.5.0
future | 0.18.3
gast | 0.4.0
gitdb | 4.0.11
GitPython | 3.1.27
google-api-core | 2.17.1
google-auth | 2.21.0
google-auth-oauthlib | 1.0.0
google-cloud-core | 2.4.1
google-cloud-storage | 2.11.0
google-crc32c | 1.5.0
google-pasta | 0.2.0
google-resumable-media | 2.7.0
googleapis-common-protos | 1.62.0
gpustat | 1.1.1
greenlet | 2.0.1
grpcio | 1.60.0
grpcio-status | 1.60.0
gunicorn | 20.1.0
gviz-api | 1.10.0
gymnasium | 0.28.1
h11 | 0.14.0
h5py | 3.9.0
hjson | 3.1.0
holidays | 0.38
horovod | 0.28.1+db1
htmlmin | 0.1.12
httpcore | 1.0.4
httplib2 | 0.20.2
httpx | 0.27.0
huggingface-hub | 0.20.2
idna | 3.4
ImageHash | 4.3.1
imageio | 2.31.1
imbalanced-learn | 0.11.0
importlib-metadata | 6.0.0
importlib_resources | 6.1.2
ipyflow-core | 0.0.198
ipykernel | 6.25.1
ipython | 8.15.0
ipython-genutils | 0.2.0
ipywidgets | 8.0.4
isodate | 0.6.1
itsdangerous | 2.0.1
jax-jumpy | 1.0.0
jedi | 0.18.1
jeepney | 0.7.1
Jinja2 | 3.1.2
jmespath | 0.10.0
joblib | 1.2.0
joblibspark | 0.5.1
jsonpatch | 1.33
jsonpointer | 2.4
jsonschema | 4.17.3
jupyter-server | 1.23.4
jupyter_client | 7.4.9
jupyter_core | 5.3.0
jupyterlab-pygments | 0.1.2
jupyterlab-widgets | 3.0.5
keras | 2.15.0
keyring | 23.5.0
kiwisolver | 1.4.4
langchain | 0.1.3
langchain-community | 0.0.20
langchain-core | 0.1.23
langcodes | 3.3.0
langsmith | 0.0.87
launchpadlib | 1.10.16
lazr.restfulclient | 0.14.4
lazr.uri | 1.0.6
lazy_loader | 0.2
libclang | 16.0.6
librosa | 0.10.1
lightgbm | 4.2.0
llvmlite | 0.40.0
lxml | 4.9.2
lz4 | 4.3.2
Mako | 1.2.0
Markdown | 3.4.1
markdown-it-py | 2.2.0
MarkupSafe | 2.1.1
marshmallow | 3.21.1
matplotlib | 3.7.2
matplotlib-inline | 0.1.6
mdurl | 0.1.0
mistune | 0.8.4
ml-dtypes | 0.2.0
mlflow-skinny | 2.10.2
more-itertools | 8.10.0
mpmath | 1.3.0
msgpack | 1.0.8
multidict | 6.0.2
multimethod | 1.11.2
multiprocess | 0.70.14
murmurhash | 1.0.10
mypy-extensions | 0.4.3
nbclassic | 0.5.5
nbclient | 0.5.13
nbconvert | 6.5.4
nbformat | 5.7.0
nest-asyncio | 1.5.6
networkx | 3.1
ninja | 1.11.1.1
nltk | 3.8.1
notebook | 6.5.4
notebook_shim | 0.2.2
numba | 0.57.1
numpy | 1.23.5
nvidia-ml-py | 12.535.133
oauthlib | 3.2.0
openai | 1.9.0
opencensus | 0.11.4
opencensus-context | 0.1.3
opt-einsum | 3.3.0
packaging | 23.2
pandas | 2.0.3
pandocfilters | 1.5.0
paramiko | 2.9.2
parso | 0.8.3
pathspec | 0.10.3
patsy | 0.5.3
petastorm | 0.12.1
pexpect | 4.8.0
phik | 0.12.4
pickleshare | 0.7.5
Pillow | 9.4.0
pip | 23.2.1
platformdirs | 3.10.0
plotly | 5.9.0
pmdarima | 2.0.4
pooch | 1.8.1
preshed | 3.0.9
prometheus-client | 0.14.1
prompt-toolkit | 3.0.36
prophet | 1.1.5
protobuf | 4.24.1
psutil | 5.9.0
psycopg2 | 2.9.3
ptyprocess | 0.7.0
pure-eval | 0.2.2
py-cpuinfo | 8.0.0
py-spy | 0.3.14
pyarrow | 14.0.1
pyarrow-hotfix | 0.6
pyasn1 | 0.4.8
pyasn1-modules | 0.2.8
pybind11 | 2.11.1
pyccolo | 0.0.52
pycparser | 2.21
pydantic | 1.10.6
Pygments | 2.15.1
PyGObject | 3.42.1
PyJWT | 2.3.0
PyNaCl | 1.5.0
pynvml | 11.5.0
pyodbc | 4.0.38
pyparsing | 3.0.9
pyrsistent | 0.18.0
pytesseract | 0.3.10
python-dateutil | 2.8.2
python-editor | 1.0.4
python-lsp-jsonrpc | 1.1.1
pytz | 2022.7
PyWavelets | 1.4.1
PyYAML | 6.0
pyzmq | 23.2.0
ray | 2.9.3
regex | 2022.7.9
requests | 2.31.0
requests-oauthlib | 1.3.1
responses | 0.13.3
rich | 13.7.1
rsa | 4.9
s3transfer | 0.10.0
safetensors | 0.3.2
scikit-image | 0.20.0
scikit-learn | 1.3.0
scipy | 1.11.1
seaborn | 0.12.2
SecretStorage | 3.3.1
Send2Trash | 1.8.0
sentence-transformers | 2.2.2
sentencepiece | 0.1.99
setuptools | 68.0.0
shap | 0.44.0
simplejson | 3.17.6
six | 1.16.0
slicer | 0.0.7
smart-open | 5.2.1
smmap | 5.0.0
sniffio | 1.2.0
soundfile | 0.12.1
soupsieve | 2.4
soxr | 0.3.7
spacy | 3.7.2
spacy-legacy | 3.0.12
spacy-loggers | 1.0.5
spark-tensorflow-distributor | 1.0.0
SQLAlchemy | 1.4.39
sqlparse | 0.4.2
srsly | 2.4.8
ssh-import-id | 5.11
stack-data | 0.2.0
stanio | 0.3.0
statsmodels | 0.14.0
sympy | 1.11.1
tangled-up-in-unicode | 0.2.0
tenacity | 8.2.2
tensorboard | 2.15.1
tensorboard-data-server | 0.7.2
tensorboard-plugin-profile | 2.15.0
tensorboardX | 2.6.2.2
tensorflow-cpu | 2.15.0
tensorflow-estimator | 2.15.0
tensorflow-io-gcs-filesystem | 0.36.0
termcolor | 2.4.0
terminado | 0.17.1
thinc | 8.2.3
threadpoolctl | 2.2.0
tifffile | 2021.7.2
tiktoken | 0.5.2
tinycss2 | 1.2.1
tokenize-rt | 4.2.1
tokenizers | 0.15.0
torch | 2.1.2+cpu
torcheval | 0.0.7
torchvision | 0.16.2+cpu
tornado | 6.3.2
tqdm | 4.65.0
traitlets | 5.7.1
transformers | 4.36.2
typeguard | 2.13.3
typer | 0.9.0
typing-inspect | 0.9.0
typing_extensions | 4.7.1
tzdata | 2022.1
ujson | 5.4.0
unattended-upgrades | 0.1
urllib3 | 1.26.16
virtualenv | 20.21.0
visions | 0.7.5
wadllib | 1.3.6
wasabi | 1.1.2
wcwidth | 0.2.5
weasel | 0.3.4
webencodings | 0.5.1
websocket-client | 0.58.0
Werkzeug | 2.2.3
wheel | 0.38.4
widgetsnbextension | 4.0.5
wordcloud | 1.9.3
wrapt | 1.14.1
xgboost | 2.0.3
xxhash | 3.4.1
yarl | 1.8.1
ydata-profiling | 4.5.1
zipp | 3.11.0

Python libraries on GPU clusters

Library | Version
absl-py | 1.0.0
accelerate | 0.25.0
aiohttp | 3.8.5
aiohttp-cors | 0.7.0
aiosignal | 1.2.0
anyio | 3.5.0
argon2-cffi | 21.3.0
argon2-cffi-bindings | 21.2.0
astor | 0.8.1
asttokens | 2.0.5
astunparse | 1.6.3
async-timeout | 4.0.2
attrs | 22.1.0
audioread | 3.0.1
azure-core | 1.30.1
azure-cosmos | 4.3.1
azure-storage-blob | 12.19.0
azure-storage-file-datalake | 12.14.0
backcall | 0.2.0
bcrypt | 3.2.0
beautifulsoup4 | 4.12.2
black | 23.3.0
bleach | 4.1.0
blessed | 1.20.0
blinker | 1.4
blis | 0.7.11
boto3 | 1.34.39
botocore | 1.34.39
cachetools | 5.3.3
catalogue | 2.0.10
category-encoders | 2.6.3
certifi | 2023.7.22
cffi | 1.15.1
chardet | 4.0.0
charset-normalizer | 2.0.4
click | 8.0.4
cloudpathlib | 0.16.0
cloudpickle | 2.2.1
cmdstanpy | 1.2.1
colorful | 0.5.6
comm | 0.1.2
confection | 0.1.4
configparser | 5.2.0
contourpy | 1.0.5
cryptography | 41.0.3
cycler | 0.11.0
cymem | 2.0.8
Cython | 0.29.32
dacite | 1.8.1
databricks-automl-runtime | 0.2.21
databricks-feature-engineering | 0.3.0
databricks-sdk | 0.20.0
dataclasses-json | 0.6.4
datasets | 2.16.1
dbl-tempo | 0.1.26
dbus-python | 1.2.18
debugpy | 1.6.7
decorator | 5.1.1
deepspeed | 0.13.1
defusedxml | 0.7.1
dill | 0.3.6
diskcache | 5.6.3
distlib | 0.3.8
dm-tree | 0.1.8
einops | 0.7.0
entrypoints | 0.4
evaluate | 0.4.1
executing | 0.8.3
facets-overview | 1.1.1
Farama-Notifications | 0.0.4
fastjsonschema | 2.19.1
fasttext | 0.9.2
filelock | 3.9.0
flash-attn | 2.5.0
Flask | 2.2.5
flatbuffers | 23.5.26
fonttools | 4.25.0
frozenlist | 1.3.3
fsspec | 2023.5.0
future | 0.18.3
gast | 0.4.0
gitdb | 4.0.11
GitPython | 3.1.27
google-api-core | 2.17.1
google-auth | 2.21.0
google-auth-oauthlib | 1.0.0
google-cloud-core | 2.4.1
google-cloud-storage | 2.11.0
google-crc32c | 1.5.0
google-pasta | 0.2.0
google-resumable-media | 2.7.0
googleapis-common-protos | 1.62.0
gpustat | 1.1.1
greenlet | 2.0.1
grpcio | 1.60.0
grpcio-status | 1.60.0
gunicorn | 20.1.0
gviz-api | 1.10.0
gymnasium | 0.28.1
h11 | 0.14.0
h5py | 3.9.0
hjson | 3.1.0
holidays | 0.38
horovod | 0.28.1+db1
htmlmin | 0.1.12
httpcore | 1.0.4
httplib2 | 0.20.2
httpx | 0.27.0
huggingface-hub | 0.20.2
idna | 3.4
ImageHash | 4.3.1
imageio | 2.31.1
imbalanced-learn | 0.11.0
importlib-metadata | 6.0.0
importlib_resources | 6.1.2
ipyflow-core | 0.0.198
ipykernel | 6.25.1
ipython | 8.15.0
ipython-genutils | 0.2.0
ipywidgets | 8.0.4
isodate | 0.6.1
itsdangerous | 2.0.1
jax-jumpy | 1.0.0
jedi | 0.18.1
jeepney | 0.7.1
Jinja2 | 3.1.2
jmespath | 0.10.0
joblib | 1.2.0
joblibspark | 0.5.1
jsonpatch | 1.33
jsonpointer | 2.4
jsonschema | 4.17.3
jupyter-server | 1.23.4
jupyter_client | 7.4.9
jupyter_core | 5.3.0
jupyterlab-pygments | 0.1.2
jupyterlab-widgets | 3.0.5
keras | 2.15.0
keyring | 23.5.0
kiwisolver | 1.4.4
langchain | 0.1.3
langchain-community | 0.0.20
langchain-core | 0.1.23
langcodes | 3.3.0
langsmith | 0.0.87
launchpadlib | 1.10.16
lazr.restfulclient | 0.14.4
lazr.uri | 1.0.6
lazy_loader | 0.2
libclang | 16.0.6
librosa | 0.10.1
lightgbm | 4.2.0
llvmlite | 0.40.0
lxml | 4.9.2
lz4 | 4.3.2
Mako | 1.2.0
Markdown | 3.4.1
markdown-it-py | 2.2.0
MarkupSafe | 2.1.1
marshmallow | 3.21.1
matplotlib | 3.7.2
matplotlib-inline | 0.1.6
mdurl | 0.1.0
mistune | 0.8.4
ml-dtypes | 0.2.0
mlflow-skinny | 2.10.2
more-itertools | 8.10.0
mpmath | 1.3.0
msgpack | 1.0.8
multidict | 6.0.2
multimethod | 1.11.2
multiprocess | 0.70.14
murmurhash | 1.0.10
mypy-extensions | 0.4.3
nbclassic | 0.5.5
nbclient | 0.5.13
nbconvert | 6.5.4
nbformat | 5.7.0
nest-asyncio | 1.5.6
networkx | 3.1
ninja | 1.11.1.1
nltk | 3.8.1
notebook | 6.5.4
notebook_shim | 0.2.2
numba | 0.57.1
numpy | 1.23.5
nvidia-ml-py | 12.535.133
oauthlib | 3.2.0
openai | 1.9.0
opencensus | 0.11.4
opencensus-context | 0.1.3
opt-einsum | 3.3.0
packaging | 23.2
pandas | 2.0.3
pandocfilters | 1.5.0
paramiko | 2.9.2
parso | 0.8.3
pathspec | 0.10.3
patsy | 0.5.3
petastorm | 0.12.1
pexpect | 4.8.0
phik | 0.12.4
pickleshare | 0.7.5
Pillow | 9.4.0
pip | 23.2.1
platformdirs | 3.10.0
plotly | 5.9.0
pmdarima | 2.0.4
pooch | 1.8.1
preshed | 3.0.9
prompt-toolkit | 3.0.36
prophet | 1.1.5
protobuf | 4.24.1
psutil | 5.9.0
psycopg2 | 2.9.3
ptyprocess | 0.7.0
pure-eval | 0.2.2
py-cpuinfo | 8.0.0
py-spy | 0.3.14
pyarrow | 14.0.1
pyarrow-hotfix | 0.6
pyasn1 | 0.4.8
pyasn1-modules | 0.2.8
pybind11 | 2.11.1
pyccolo | 0.0.52
pycparser | 2.21
pydantic | 1.10.6
Pygments | 2.15.1
PyGObject | 3.42.1
PyJWT | 2.3.0
PyNaCl | 1.5.0
pynvml | 11.5.0
pyodbc | 4.0.38
pyparsing | 3.0.9
pyrsistent | 0.18.0
pytesseract | 0.3.10
python-dateutil | 2.8.2
python-editor | 1.0.4
python-lsp-jsonrpc | 1.1.1
pytz | 2022.7
PyWavelets | 1.4.1
PyYAML | 6.0
pyzmq | 23.2.0
ray | 2.9.3
regex | 2022.7.9
requests | 2.31.0
requests-oauthlib | 1.3.1
responses | 0.13.3
rich | 13.7.1
rsa | 4.9
s3transfer | 0.10.0
safetensors | 0.3.2
scikit-image | 0.20.0
scikit-learn | 1.3.0
scipy | 1.11.1
seaborn | 0.12.2
SecretStorage | 3.3.1
Send2Trash | 1.8.0
sentence-transformers | 2.2.2
sentencepiece | 0.1.99
setuptools | 68.0.0
shap | 0.44.0
simplejson | 3.17.6
six | 1.16.0
slicer | 0.0.7
smart-open | 5.2.1
smmap | 5.0.0
sniffio | 1.2.0
soundfile | 0.12.1
soupsieve | 2.4
soxr | 0.3.7
spacy | 3.7.2
spacy-legacy | 3.0.12
spacy-loggers | 1.0.5
spark-tensorflow-distributor | 1.0.0
SQLAlchemy | 1.4.39
sqlparse | 0.4.2
srsly | 2.4.8
ssh-import-id | 5.11
stack-data | 0.2.0
stanio | 0.3.0
statsmodels | 0.14.0
sympy | 1.11.1
tangled-up-in-unicode | 0.2.0
tenacity | 8.2.2
tensorboard | 2.15.1
tensorboard-data-server | 0.7.2
tensorboard-plugin-profile | 2.15.0
tensorboardX | 2.6.2.2
tensorflow | 2.15.0
tensorflow-estimator | 2.15.0
tensorflow-io-gcs-filesystem | 0.36.0
termcolor | 2.4.0
terminado | 0.17.1
thinc | 8.2.3
threadpoolctl | 2.2.0
tifffile | 2021.7.2
tiktoken | 0.5.2
tinycss2 | 1.2.1
tokenize-rt | 4.2.1
tokenizers | 0.15.0
torch | 2.1.2+cu121
torcheval | 0.0.7
torchvision | 0.16.2+cu121
tornado | 6.3.2
tqdm | 4.65.0
traitlets | 5.7.1
transformers | 4.36.2
triton | 2.1.0
typeguard | 2.13.3
typer | 0.9.0
typing-inspect | 0.9.0
typing_extensions | 4.7.1
tzdata | 2022.1
ujson | 5.4.0
unattended-upgrades | 0.1
urllib3 | 1.26.16
virtualenv | 20.21.0
visions | 0.7.5
wadllib | 1.3.6
wasabi | 1.1.2
wcwidth | 0.2.5
weasel | 0.3.4
webencodings | 0.5.1
websocket-client | 0.58.0
Werkzeug | 2.2.3
wheel | 0.38.4
widgetsnbextension | 4.0.5
wordcloud | 1.9.3
wrapt | 1.14.1
xgboost | 2.0.3
xxhash | 3.4.1
yarl | 1.8.1
ydata-profiling | 4.5.1
zipp | 3.11.0

R libraries

The R libraries are identical to the R libraries in Databricks Runtime 15.0.

Java and Scala libraries (Scala 2.12 cluster)

In addition to Java and Scala libraries in Databricks Runtime 15.0, Databricks Runtime 15.0 ML contains the following JARs:

CPU clusters

Group ID | Artifact ID | Version
com.typesafe.akka | akka-actor_2.12 | 2.5.23
ml.dmlc | xgboost4j-spark_2.12 | 1.7.3
ml.dmlc | xgboost4j_2.12 | 1.7.3
org.graphframes | graphframes_2.12 | 0.8.2-db2-spark3.4
org.mlflow | mlflow-client | 2.10.2
org.scala-lang.modules | scala-java8-compat_2.12 | 0.8.0
org.tensorflow | spark-tensorflow-connector_2.12 | 1.15.0

GPU clusters

Group ID | Artifact ID | Version
com.typesafe.akka | akka-actor_2.12 | 2.5.23
ml.dmlc | xgboost4j-gpu_2.12 | 1.7.3
ml.dmlc | xgboost4j-spark-gpu_2.12 | 1.7.3
org.graphframes | graphframes_2.12 | 0.8.2-db2-spark3.4
org.mlflow | mlflow-client | 2.10.2
org.scala-lang.modules | scala-java8-compat_2.12 | 0.8.0
org.tensorflow | spark-tensorflow-connector_2.12 | 1.15.0