Databricks Runtime 7.1 for ML (unsupported)

Databricks released this image in July 2020.

Databricks Runtime 7.1 for Machine Learning provides a ready-to-go environment for machine learning and data science based on Databricks Runtime 7.1 (unsupported). Databricks Runtime ML contains many popular machine learning libraries, including TensorFlow, PyTorch, and XGBoost. It also supports distributed deep learning training using Horovod.

For more information, including instructions for creating a Databricks Runtime ML cluster, see AI and Machine Learning on Databricks.

New features and major changes

Databricks Runtime 7.1 ML is built on top of Databricks Runtime 7.1. For information on what’s new in Databricks Runtime 7.1, including Apache Spark MLlib and SparkR, see the Databricks Runtime 7.1 (unsupported) release notes.

Major changes to Databricks Runtime ML Python environment

This section describes the major changes to the installed Databricks Runtime ML Python environment compared to Databricks Runtime 7.0 ML (unsupported). You should also review the major changes to the Databricks Runtime Python environment in Databricks Runtime 7.1 (unsupported). For a full list of installed Python packages and their versions, see Python libraries.

pip and conda magic commands now enabled by default (Public Preview)

In Databricks Runtime ML 6.4 and above, %pip and %conda magic commands were available via a cluster configuation setting. These commands are now enabled by default. For more information, see Notebook-scoped Python libraries.

Python packages upgraded

  • pillow 7.0.0 -> 7.1.0

  • pytorch 1.5.0 -> 1.5.1

  • torchvision 0.6.0 -> 0.6.1

  • horovod 0.19.1 -> 0.19.5

  • mlflow 1.8.0 -> 1.9.1

Python packages added

  • spark-tensorflow-distributor: 0.1.0

Changes to ML Spark packages, Java and Scala libraries

The following packages are upgraded:

  • mlflow-client: 1.9.1

System environment

The system environment in Databricks Runtime 7.1 ML differs from Databricks Runtime 7.1 as follows:

Libraries

The following sections list the libraries included in Databricks Runtime 7.1 ML that differ from those included in Databricks Runtime 7.1.

Python libraries

Databricks Runtime 7.1 ML uses Conda for Python package management and includes many popular ML packages. The following section describes the Conda environment for Databricks Runtime 7.1 ML.

Python on CPU clusters

name: databricks-ml
channels:
  - pytorch
  - defaults
dependencies:
  - _libgcc_mutex=0.1=main
  - absl-py=0.9.0=py37_0
  - asn1crypto=1.3.0=py37_0
  - astor=0.8.0=py37_0
  - backcall=0.1.0=py37_0
  - backports=1.0=py_2
  - bcrypt=3.1.7=py37h7b6447c_1
  - blas=1.0=mkl
  - blinker=1.4=py37_0
  - boto3=1.12.0=py_0
  - botocore=1.15.0=py_0
  - c-ares=1.15.0=h7b6447c_1001
  - ca-certificates=2020.6.24=0
  - cachetools=4.1.0=py_1
  - certifi=2020.6.20=py37_0
  - cffi=1.14.0=py37h2e261b9_0
  - chardet=3.0.4=py37_1003
  - click=7.0=py37_0
  - cloudpickle=1.3.0=py_0
  - configparser=3.7.4=py37_0
  - cpuonly=1.0=0
  - cryptography=2.8=py37h1ba5d50_0
  - cycler=0.10.0=py37_0
  - cython=0.29.15=py37he6710b0_0
  - decorator=4.4.1=py_0
  - dill=0.3.1.1=py37_1
  - docutils=0.15.2=py37_0
  - entrypoints=0.3=py37_0
  - flask=1.1.1=py_1
  - freetype=2.9.1=h8a8886c_1
  - future=0.18.2=py37_1
  - gast=0.3.3=py_0
  - gitdb=4.0.5=py_0
  - gitdb2=4.0.2=py_0
  - gitpython=3.0.5=py_0
  - google-auth=1.11.2=py_0
  - google-auth-oauthlib=0.4.1=py_2
  - google-pasta=0.2.0=py_0
  - grpcio=1.27.2=py37hf8bcb03_0
  - gunicorn=20.0.4=py37_0
  - h5py=2.10.0=py37h7918eee_0
  - hdf5=1.10.4=hb1b8bf9_0
  - icu=58.2=he6710b0_3
  - idna=2.8=py37_0
  - intel-openmp=2020.0=166
  - ipykernel=5.1.4=py37h39e3cac_0
  - ipython=7.12.0=py37h5ca1d4c_0
  - ipython_genutils=0.2.0=py37_0
  - isodate=0.6.0=py_1
  - itsdangerous=1.1.0=py37_0
  - jedi=0.14.1=py37_0
  - jinja2=2.11.1=py_0
  - jmespath=0.9.4=py_0
  - joblib=0.14.1=py_0
  - jpeg=9b=h024ee3a_2
  - jupyter_client=5.3.4=py37_0
  - jupyter_core=4.6.1=py37_0
  - kiwisolver=1.1.0=py37he6710b0_0
  - krb5=1.16.4=h173b8e3_0
  - ld_impl_linux-64=2.33.1=h53a641e_7
  - libedit=3.1.20181209=hc058e9b_0
  - libffi=3.2.1=hd88cf55_4
  - libgcc-ng=9.1.0=hdf63c60_0
  - libgfortran-ng=7.3.0=hdf63c60_0
  - libpng=1.6.37=hbc83047_0
  - libpq=11.2=h20c2e04_0
  - libprotobuf=3.11.4=hd408876_0
  - libsodium=1.0.16=h1bed415_0
  - libstdcxx-ng=9.1.0=hdf63c60_0
  - libtiff=4.1.0=h2733197_0
  - lightgbm=2.3.0=py37he6710b0_0
  - lz4-c=1.8.1.2=h14c3975_0
  - mako=1.1.2=py_0
  - markdown=3.1.1=py37_0
  - markupsafe=1.1.1=py37h7b6447c_0
  - matplotlib-base=3.1.3=py37hef1b27d_0
  - mkl=2020.0=166
  - mkl-service=2.3.0=py37he904b0f_0
  - mkl_fft=1.0.15=py37ha843d7b_0
  - mkl_random=1.1.0=py37hd6b4f25_0
  - ncurses=6.2=he6710b0_1
  - networkx=2.4=py_0
  - ninja=1.9.0=py37hfd86e86_0
  - nltk=3.4.5=py37_0
  - numpy=1.18.1=py37h4f9e942_0
  - numpy-base=1.18.1=py37hde5b4d6_1
  - oauthlib=3.1.0=py_0
  - olefile=0.46=py37_0
  - openssl=1.1.1g=h7b6447c_0
  - packaging=20.1=py_0
  - pandas=1.0.1=py37h0573a6f_0
  - paramiko=2.7.1=py_0
  - parso=0.5.2=py_0
  - patsy=0.5.1=py37_0
  - pexpect=4.8.0=py37_0
  - pickleshare=0.7.5=py37_0
  - pillow=7.0.0=py37hb39fc2d_0
  - pip=20.0.2=py37_3
  - plotly=4.8.1=py_0
  - prompt_toolkit=3.0.3=py_0
  - protobuf=3.11.4=py37he6710b0_0
  - psutil=5.6.7=py37h7b6447c_0
  - psycopg2=2.8.4=py37h1ba5d50_0
  - ptyprocess=0.6.0=py37_0
  - pyasn1=0.4.8=py_0
  - pyasn1-modules=0.2.7=py_0
  - pycparser=2.19=py37_0
  - pygments=2.5.2=py_0
  - pyjwt=1.7.1=py37_0
  - pynacl=1.3.0=py37h7b6447c_0
  - pyodbc=4.0.30=py37he6710b0_0
  - pyopenssl=19.1.0=py37_0
  - pyparsing=2.4.6=py_0
  - pysocks=1.7.1=py37_0
  - python=3.7.6=h0371630_2
  - python-dateutil=2.8.1=py_0
  - python-editor=1.0.4=py_0
  - pytorch=1.5.1=py3.7_cpu_0
  - pytz=2019.3=py_0
  - pyzmq=18.1.1=py37he6710b0_0
  - readline=7.0=h7b6447c_5
  - requests=2.22.0=py37_1
  - requests-oauthlib=1.3.0=py_0
  - retrying=1.3.3=py37_2
  - rsa=4.0=py_0
  - s3transfer=0.3.3=py37_0
  - scikit-learn=0.22.1=py37hd81dba3_0
  - scipy=1.4.1=py37h0b6359f_0
  - setuptools=45.2.0=py37_0
  - simplejson=3.17.0=py37h7b6447c_0
  - six=1.14.0=py37_0
  - smmap=3.0.2=py_0
  - sqlite=3.31.1=h62c20be_1
  - sqlparse=0.3.0=py_0
  - statsmodels=0.11.0=py37h7b6447c_0
  - tabulate=0.8.3=py37_0
  - tk=8.6.8=hbc83047_0
  - torchvision=0.6.1=py37_cpu
  - tornado=6.0.3=py37h7b6447c_3
  - tqdm=4.42.1=py_0
  - traitlets=4.3.3=py37_0
  - unixodbc=2.3.7=h14c3975_0
  - urllib3=1.25.8=py37_0
  - wcwidth=0.1.8=py_0
  - websocket-client=0.56.0=py37_0
  - werkzeug=1.0.0=py_0
  - wheel=0.34.2=py37_0
  - wrapt=1.11.2=py37h7b6447c_0
  - xz=5.2.4=h14c3975_4
  - zeromq=4.3.1=he6710b0_3
  - zlib=1.2.11=h7b6447c_3
  - zstd=1.3.7=h0b5b093_0
  - pip:
    - astunparse==1.6.3
    - azure-core==1.6.0
    - azure-storage-blob==12.3.2
    - databricks-cli==0.11.0
    - diskcache==4.1.0
    - docker==4.2.2
    - gorilla==0.3.0
    - horovod==0.19.5
    - hyperopt==0.2.4.db2
    - keras-preprocessing==1.1.2
    - koalas==1.0.1
    - mleap==0.16.0
    - mlflow==1.9.1
    - msrest==0.6.17
    - opt-einsum==3.2.1
    - petastorm==0.9.2
    - pyarrow==0.15.1
    - pyyaml==5.3.1
    - querystring-parser==1.2.4
    - seaborn==0.10.0
    - spark-tensorflow-distributor==0.1.0
    - sparkdl==2.1.0-db1
    - tensorboard==2.2.2
    - tensorboard-plugin-wit==1.7.0
    - tensorflow-cpu==2.2.0
    - tensorflow-estimator==2.2.0
    - termcolor==1.1.0
    - xgboost==1.1.1
prefix: /databricks/conda/envs/databricks-ml

Python on GPU clusters

name: databricks-ml-gpu
channels:
  - pytorch
  - defaults
dependencies:
  - _libgcc_mutex=0.1=main
  - absl-py=0.9.0=py37_0
  - asn1crypto=1.3.0=py37_0
  - astor=0.8.0=py37_0
  - backcall=0.1.0=py37_0
  - backports=1.0=py_2
  - bcrypt=3.1.7=py37h7b6447c_1
  - blas=1.0=mkl
  - blinker=1.4=py37_0
  - boto3=1.12.0=py_0
  - botocore=1.15.0=py_0
  - c-ares=1.15.0=h7b6447c_1001
  - ca-certificates=2020.6.24=0
  - cachetools=4.1.0=py_1
  - certifi=2020.6.20=py37_0
  - cffi=1.14.0=py37h2e261b9_0
  - chardet=3.0.4=py37_1003
  - click=7.0=py37_0
  - cloudpickle=1.3.0=py_0
  - configparser=3.7.4=py37_0
  - cryptography=2.8=py37h1ba5d50_0
  - cudatoolkit=10.1.243=h6bb024c_0
  - cycler=0.10.0=py37_0
  - cython=0.29.15=py37he6710b0_0
  - decorator=4.4.1=py_0
  - dill=0.3.1.1=py37_1
  - docutils=0.15.2=py37_0
  - entrypoints=0.3=py37_0
  - flask=1.1.1=py_1
  - freetype=2.9.1=h8a8886c_1
  - future=0.18.2=py37_1
  - gast=0.3.3=py_0
  - gitdb=4.0.5=py_0
  - gitdb2=4.0.2=py_0
  - gitpython=3.0.5=py_0
  - google-auth=1.11.2=py_0
  - google-auth-oauthlib=0.4.1=py_2
  - google-pasta=0.2.0=py_0
  - grpcio=1.27.2=py37hf8bcb03_0
  - gunicorn=20.0.4=py37_0
  - h5py=2.10.0=py37h7918eee_0
  - hdf5=1.10.4=hb1b8bf9_0
  - icu=58.2=he6710b0_3
  - idna=2.8=py37_0
  - intel-openmp=2020.0=166
  - ipykernel=5.1.4=py37h39e3cac_0
  - ipython=7.12.0=py37h5ca1d4c_0
  - ipython_genutils=0.2.0=py37_0
  - isodate=0.6.0=py_1
  - itsdangerous=1.1.0=py37_0
  - jedi=0.14.1=py37_0
  - jinja2=2.11.1=py_0
  - jmespath=0.9.4=py_0
  - joblib=0.14.1=py_0
  - jpeg=9b=h024ee3a_2
  - jupyter_client=5.3.4=py37_0
  - jupyter_core=4.6.1=py37_0
  - kiwisolver=1.1.0=py37he6710b0_0
  - krb5=1.16.4=h173b8e3_0
  - ld_impl_linux-64=2.33.1=h53a641e_7
  - libedit=3.1.20181209=hc058e9b_0
  - libffi=3.2.1=hd88cf55_4
  - libgcc-ng=9.1.0=hdf63c60_0
  - libgfortran-ng=7.3.0=hdf63c60_0
  - libpng=1.6.37=hbc83047_0
  - libpq=11.2=h20c2e04_0
  - libprotobuf=3.11.4=hd408876_0
  - libsodium=1.0.16=h1bed415_0
  - libstdcxx-ng=9.1.0=hdf63c60_0
  - libtiff=4.1.0=h2733197_0
  - lightgbm=2.3.0=py37he6710b0_0
  - lz4-c=1.8.1.2=h14c3975_0
  - mako=1.1.2=py_0
  - markdown=3.1.1=py37_0
  - markupsafe=1.1.1=py37h7b6447c_0
  - matplotlib-base=3.1.3=py37hef1b27d_0
  - mkl=2020.0=166
  - mkl-service=2.3.0=py37he904b0f_0
  - mkl_fft=1.0.15=py37ha843d7b_0
  - mkl_random=1.1.0=py37hd6b4f25_0
  - ncurses=6.2=he6710b0_1
  - networkx=2.4=py_0
  - ninja=1.9.0=py37hfd86e86_0
  - nltk=3.4.5=py37_0
  - numpy=1.18.1=py37h4f9e942_0
  - numpy-base=1.18.1=py37hde5b4d6_1
  - oauthlib=3.1.0=py_0
  - olefile=0.46=py37_0
  - openssl=1.1.1g=h7b6447c_0
  - packaging=20.1=py_0
  - pandas=1.0.1=py37h0573a6f_0
  - paramiko=2.7.1=py_0
  - parso=0.5.2=py_0
  - patsy=0.5.1=py37_0
  - pexpect=4.8.0=py37_0
  - pickleshare=0.7.5=py37_0
  - pillow=7.0.0=py37hb39fc2d_0
  - pip=20.0.2=py37_3
  - plotly=4.8.1=py_0
  - prompt_toolkit=3.0.3=py_0
  - protobuf=3.11.4=py37he6710b0_0
  - psutil=5.6.7=py37h7b6447c_0
  - psycopg2=2.8.4=py37h1ba5d50_0
  - ptyprocess=0.6.0=py37_0
  - pyasn1=0.4.8=py_0
  - pyasn1-modules=0.2.7=py_0
  - pycparser=2.19=py37_0
  - pygments=2.5.2=py_0
  - pyjwt=1.7.1=py37_0
  - pynacl=1.3.0=py37h7b6447c_0
  - pyodbc=4.0.30=py37he6710b0_0
  - pyopenssl=19.1.0=py37_0
  - pyparsing=2.4.6=py_0
  - pysocks=1.7.1=py37_0
  - python=3.7.6=h0371630_2
  - python-dateutil=2.8.1=py_0
  - python-editor=1.0.4=py_0
  - pytorch=1.5.1=py3.7_cuda10.1.243_cudnn7.6.3_0
  - pytz=2019.3=py_0
  - pyzmq=18.1.1=py37he6710b0_0
  - readline=7.0=h7b6447c_5
  - requests=2.22.0=py37_1
  - requests-oauthlib=1.3.0=py_0
  - retrying=1.3.3=py37_2
  - rsa=4.0=py_0
  - s3transfer=0.3.3=py37_0
  - scikit-learn=0.22.1=py37hd81dba3_0
  - scipy=1.4.1=py37h0b6359f_0
  - setuptools=45.2.0=py37_0
  - simplejson=3.17.0=py37h7b6447c_0
  - six=1.14.0=py37_0
  - smmap=3.0.2=py_0
  - sqlite=3.31.1=h62c20be_1
  - sqlparse=0.3.0=py_0
  - statsmodels=0.11.0=py37h7b6447c_0
  - tabulate=0.8.3=py37_0
  - tk=8.6.8=hbc83047_0
  - torchvision=0.6.1=py37_cu101
  - tornado=6.0.3=py37h7b6447c_3
  - tqdm=4.42.1=py_0
  - traitlets=4.3.3=py37_0
  - unixodbc=2.3.7=h14c3975_0
  - urllib3=1.25.8=py37_0
  - wcwidth=0.1.8=py_0
  - websocket-client=0.56.0=py37_0
  - werkzeug=1.0.0=py_0
  - wheel=0.34.2=py37_0
  - wrapt=1.11.2=py37h7b6447c_0
  - xz=5.2.4=h14c3975_4
  - zeromq=4.3.1=he6710b0_3
  - zlib=1.2.11=h7b6447c_3
  - zstd=1.3.7=h0b5b093_0
  - pip:
    - astunparse==1.6.3
    - azure-core==1.6.0
    - azure-storage-blob==12.3.2
    - databricks-cli==0.11.0
    - diskcache==4.1.0
    - docker==4.2.2
    - gorilla==0.3.0
    - horovod==0.19.5
    - hyperopt==0.2.4.db1
    - keras-preprocessing==1.1.2
    - koalas==1.0.1
    - mleap==0.16.0
    - mlflow==1.9.1
    - msrest==0.6.17
    - opt-einsum==3.2.1
    - petastorm==0.9.2
    - pyarrow==0.15.1
    - pyyaml==5.3.1
    - querystring-parser==1.2.4
    - seaborn==0.10.0
    - spark-tensorflow-distributor==0.1.0
    - sparkdl==2.1.0-db1
    - tensorboard==2.2.2
    - tensorboard-plugin-wit==1.7.0
    - tensorflow==2.2.0
    - tensorflow-estimator==2.2.0
    - termcolor==1.1.0
    - xgboost==1.1.1
prefix: /databricks/conda/envs/databricks-ml-gpu

Spark packages containing Python modules

Spark Package

Python Module

Version

graphframes

graphframes

0.8.0-db2-spark3.0

R libraries

The R libraries are identical to the R Libraries in Databricks Runtime 7.1.

Java and Scala libraries (Scala 2.12 cluster)

In addition to Java and Scala libraries in Databricks Runtime 7.1, Databricks Runtime 7.1 ML contains the following JARs:

Group ID

Artifact ID

Version

com.typesafe.akka

akka-actor_2.12

2.5.23

ml.combust.mleap

mleap-databricks-runtime_2.12

0.17.1-4882dc3

ml.dmlc

xgboost4j-spark_2.12

1.0.0

ml.dmlc

xgboost4j_2.12

1.0.0

org.mlflow

mlflow-client

1.9.1

org.scala-lang.modules

scala-java8-compat_2.12

0.8.0

org.tensorflow

spark-tensorflow-connector_2.12

1.15.0