Databricks Runtime 6.4 ML

Databricks released this image in February 2020. It is supported through April 2021.

Databricks Runtime 6.4 ML provides a ready-to-go environment for machine learning and data science based on Databricks Runtime 6.4. Databricks Runtime ML contains many popular machine learning libraries, including TensorFlow, PyTorch, Keras, and XGBoost. It also supports distributed deep learning training using Horovod.

For more information, including instructions for creating a Databricks Runtime ML cluster, see Databricks Runtime for Machine Learning.

New features

Databricks Runtime 6.4 ML is built on top of Databricks Runtime 6.4. For information on what’s new in Databricks Runtime 6.4, see the Databricks Runtime 6.4 release notes.

Improvements

Deprecations

  • The standalone keras package is deprecated and will be removed in an upcoming major release of Databricks Runtime for ML. Databricks recommends you instead use tensorflow.keras.
  • The pymongo package is deprecated and will be removed in an upcoming major release of Databricks Runtime for ML.

Upgraded machine learning libraries

  • PyTorch: 1.3.1 to 1.4.0
  • Horovod: 0.18.2 to 1.19.0

System environment

The system environment in Databricks Runtime 6.4 ML differs from Databricks Runtime 6.4 as follows:

  • DBUtils: Does not contain Library utilities.
  • For GPU clusters, the following NVIDIA GPU libraries:
    • NVIDIA driver 418.40
    • CUDA 10.0
    • cuDNN 7.6.4
    • NCCL 2.4.8

Libraries

The following sections list the libraries included in Databricks Runtime 6.4 ML that differ from those included in Databricks Runtime 6.4.

Python libraries

Databricks Runtime 6.4 ML uses Conda for Python package management and includes many popular ML packages. The following section describes the Conda environment for Databricks Runtime 6.4 ML.

Python on CPU clusters

name: databricks-ml
channels:
  - Databricks
  - pytorch
  - defaults
dependencies:
  - _libgcc_mutex=0.1=main
  - _py-xgboost-mutex=2.0=cpu_0
  - _tflow_select=2.3.0=mkl
  - absl-py=0.9.0=py37_0
  - asn1crypto=0.24.0=py37_0
  - astor=0.8.0=py37_0
  - backcall=0.1.0=py37_0
  - backports=1.0=py_2
  - bcrypt=3.1.7=py37h7b6447c_0
  - blas=1.0=mkl
  - boto=2.49.0=py37_0
  - boto3=1.9.162=py_0
  - botocore=1.12.163=py_0
  - c-ares=1.15.0=h7b6447c_1001
  - ca-certificates=2019.1.23=0
  - certifi=2019.3.9=py37_0
  - cffi=1.12.2=py37h2e261b9_1
  - chardet=3.0.4=py37_1003
  - click=7.0=py_0
  - cloudpickle=0.8.0=py37_0
  - colorama=0.4.1=py_0
  - configparser=3.7.4=py37_0
  - cpuonly=1.0=0
  - cryptography=2.6.1=py37h1ba5d50_0
  - cycler=0.10.0=py37_0
  - cython=0.29.6=py37he6710b0_0
  - decorator=4.4.0=py37_1
  - docutils=0.14=py37_0
  - entrypoints=0.3=py37_0
  - et_xmlfile=1.0.1=py37_0
  - flask=1.0.2=py37_1
  - freetype=2.9.1=h8a8886c_1
  - future=0.17.1=py37_0
  - gast=0.2.2=py37_0
  - gitdb2=2.0.6=py_0
  - gitpython=2.1.11=py37_0
  - google-pasta=0.1.8=py_0
  - grpcio=1.16.1=py37hf8bcb03_1
  - gunicorn=19.9.0=py37_0
  - h5py=2.9.0=py37h7918eee_0
  - hdf5=1.10.4=hb1b8bf9_0
  - html5lib=1.0.1=py_0
  - icu=58.2=h9c2bf20_1
  - idna=2.8=py37_0
  - intel-openmp=2019.3=199
  - ipykernel=5.1.0=py37h39e3cac_0
  - ipython=7.4.0=py37h39e3cac_0
  - ipython_genutils=0.2.0=py37_0
  - itsdangerous=1.1.0=py_0
  - jdcal=1.4=py37_0
  - jedi=0.13.3=py37_0
  - jinja2=2.10=py37_0
  - jmespath=0.9.4=py_0
  - jpeg=9b=h024ee3a_2
  - jupyter_client=5.2.4=py37_0
  - jupyter_core=4.4.0=py37_0
  - keras-applications=1.0.8=py_0
  - keras-preprocessing=1.1.0=py_1
  - kiwisolver=1.0.1=py37hf484d3e_0
  - krb5=1.16.1=h173b8e3_7
  - libedit=3.1.20181209=hc058e9b_0
  - libffi=3.2.1=hd88cf55_4
  - libgcc-ng=8.2.0=hdf63c60_1
  - libgfortran-ng=7.3.0=hdf63c60_0
  - libpng=1.6.36=hbc83047_0
  - libpq=11.2=h20c2e04_0
  - libprotobuf=3.11.4=hd408876_0
  - libsodium=1.0.16=h1bed415_0
  - libstdcxx-ng=8.2.0=hdf63c60_1
  - libtiff=4.0.10=h2733197_2
  - libxgboost=0.90=he6710b0_1
  - libxml2=2.9.9=hea5a465_1
  - libxslt=1.1.33=h7d1a2b0_0
  - llvmlite=0.28.0=py37hd408876_0
  - lxml=4.3.2=py37hefd8a0e_0
  - mako=1.0.10=py_0
  - markdown=3.1.1=py37_0
  - markupsafe=1.1.1=py37h7b6447c_0
  - mkl=2019.3=199
  - mkl_fft=1.0.10=py37ha843d7b_0
  - mkl_random=1.0.2=py37hd81dba3_0
  - ncurses=6.1=he6710b0_1
  - networkx=2.2=py37_1
  - ninja=1.9.0=py37hfd86e86_0
  - nose=1.3.7=py37_2
  - numba=0.43.1=py37h962f231_0
  - numpy=1.16.2=py37h7e9f1db_0
  - numpy-base=1.16.2=py37hde5b4d6_0
  - olefile=0.46=py_0
  - openpyxl=2.6.1=py37_1
  - openssl=1.1.1b=h7b6447c_1
  - opt_einsum=3.1.0=py_0
  - pandas=0.24.2=py37he6710b0_0
  - paramiko=2.4.2=py37_0
  - parso=0.3.4=py37_0
  - pathlib2=2.3.3=py37_0
  - patsy=0.5.1=py37_0
  - pexpect=4.6.0=py37_0
  - pickleshare=0.7.5=py37_0
  - pillow=5.4.1=py37h34e0f95_0
  - pip=19.0.3=py37_0
  - ply=3.11=py37_0
  - prompt_toolkit=2.0.9=py37_0
  - protobuf=3.11.4=py37he6710b0_0
  - psutil=5.6.1=py37h7b6447c_0
  - psycopg2=2.7.6.1=py37h1ba5d50_0
  - ptyprocess=0.6.0=py37_0
  - py-xgboost=0.90=py37he6710b0_1
  - py-xgboost-cpu=0.90=py37_1
  - pyasn1=0.4.8=py_0
  - pycparser=2.19=py_0
  - pygments=2.3.1=py37_0
  - pymongo=3.8.0=py37he6710b0_1
  - pynacl=1.3.0=py37h7b6447c_0
  - pyopenssl=19.0.0=py37_0
  - pyparsing=2.3.1=py37_0
  - pysocks=1.6.8=py37_0
  - python=3.7.3=h0371630_0
  - python-dateutil=2.8.0=py37_0
  - python-editor=1.0.4=py_0
  - pytorch=1.4.0=py3.7_cpu_0
  - pytz=2018.9=py37_0
  - pyyaml=5.1=py37h7b6447c_0
  - pyzmq=18.0.0=py37he6710b0_0
  - readline=7.0=h7b6447c_5
  - requests=2.21.0=py37_0
  - s3transfer=0.2.1=py37_0
  - scikit-learn=0.20.3=py37hd81dba3_0
  - scipy=1.2.1=py37h7c811a0_0
  - setuptools=40.8.0=py37_0
  - simplejson=3.16.0=py37h14c3975_0
  - singledispatch=3.4.0.3=py37_0
  - six=1.12.0=py37_0
  - smmap2=2.0.5=py_0
  - sqlite=3.27.2=h7b6447c_0
  - sqlparse=0.3.0=py_0
  - statsmodels=0.9.0=py37h035aef0_0
  - tabulate=0.8.3=py37_0
  - tensorboard=1.15.0+db2=pyhb230dea_0
  - tensorflow=1.15.0+db2=mkl_py37hc5fbf04_0
  - tensorflow-base=1.15.0+db2=mkl_py37h2ae1e84_0
  - tensorflow-estimator=1.15.1+db2=pyh2649769_0
  - tensorflow-mkl=1.15.0+db2=h4fcabd2_0
  - termcolor=1.1.0=py37_1
  - tk=8.6.8=hbc83047_0
  - torchvision=0.5.0=py37_cpu
  - tornado=6.0.2=py37h7b6447c_0
  - tqdm=4.31.1=py37_1
  - traitlets=4.3.2=py37_0
  - urllib3=1.24.1=py37_0
  - virtualenv=16.0.0=py37_0
  - wcwidth=0.1.7=py37_0
  - webencodings=0.5.1=py37_1
  - websocket-client=0.56.0=py37_0
  - werkzeug=0.14.1=py37_0
  - wheel=0.33.1=py37_0
  - wrapt=1.11.1=py37h7b6447c_0
  - xz=5.2.4=h14c3975_4
  - yaml=0.1.7=had09818_2
  - zeromq=4.3.1=he6710b0_3
  - zlib=1.2.11=h7b6447c_3
  - zstd=1.3.7=h0b5b093_0
  - pip:
    - argparse==1.4.0
    - databricks-cli==0.9.1
    - deprecated==1.2.7
    - docker==4.2.0
    - fusepy==2.0.4
    - gorilla==0.3.0
    - horovod==0.19.0
    - hyperopt==0.2.2.db1
    - keras==2.2.5
    - matplotlib==3.0.3
    - mleap==0.8.1
    - mlflow==1.5.0
    - nose-exclude==0.5.0
    - pyarrow==0.13.0
    - querystring-parser==1.2.4
    - seaborn==0.9.0
    - tensorboardx==1.9
prefix: /databricks/conda/envs/databricks-ml

Python on GPU clusters

name: databricks-ml-gpu
channels:
  - Databricks
  - pytorch
  - defaults
dependencies:
  - _libgcc_mutex=0.1=main
  - _py-xgboost-mutex=1.0=gpu_0
  - _tflow_select=2.1.0=gpu
  - absl-py=0.9.0=py37_0
  - asn1crypto=0.24.0=py37_0
  - astor=0.8.0=py37_0
  - backcall=0.1.0=py37_0
  - backports=1.0=py_2
  - bcrypt=3.1.7=py37h7b6447c_0
  - blas=1.0=mkl
  - boto=2.49.0=py37_0
  - boto3=1.9.162=py_0
  - botocore=1.12.163=py_0
  - c-ares=1.15.0=h7b6447c_1001
  - ca-certificates=2019.1.23=0
  - certifi=2019.3.9=py37_0
  - cffi=1.12.2=py37h2e261b9_1
  - chardet=3.0.4=py37_1003
  - click=7.0=py_0
  - cloudpickle=0.8.0=py37_0
  - colorama=0.4.1=py_0
  - configparser=3.7.4=py37_0
  - cryptography=2.6.1=py37h1ba5d50_0
  - cudatoolkit=10.0.130=0
  - cudnn=7.6.4=cuda10.0_0
  - cupti=10.0.130=0
  - cycler=0.10.0=py37_0
  - cython=0.29.6=py37he6710b0_0
  - decorator=4.4.0=py37_1
  - docutils=0.14=py37_0
  - entrypoints=0.3=py37_0
  - et_xmlfile=1.0.1=py37_0
  - flask=1.0.2=py37_1
  - freetype=2.9.1=h8a8886c_1
  - future=0.17.1=py37_0
  - gast=0.2.2=py37_0
  - gitdb2=2.0.6=py_0
  - gitpython=2.1.11=py37_0
  - google-pasta=0.1.8=py_0
  - grpcio=1.16.1=py37hf8bcb03_1
  - gunicorn=19.9.0=py37_0
  - h5py=2.9.0=py37h7918eee_0
  - hdf5=1.10.4=hb1b8bf9_0
  - html5lib=1.0.1=py_0
  - icu=58.2=h9c2bf20_1
  - idna=2.8=py37_0
  - intel-openmp=2019.3=199
  - ipykernel=5.1.0=py37h39e3cac_0
  - ipython=7.4.0=py37h39e3cac_0
  - ipython_genutils=0.2.0=py37_0
  - itsdangerous=1.1.0=py_0
  - jdcal=1.4=py37_0
  - jedi=0.13.3=py37_0
  - jinja2=2.10=py37_0
  - jmespath=0.9.4=py_0
  - jpeg=9b=h024ee3a_2
  - jupyter_client=5.2.4=py37_0
  - jupyter_core=4.4.0=py37_0
  - keras-applications=1.0.8=py_0
  - keras-preprocessing=1.1.0=py_1
  - kiwisolver=1.0.1=py37hf484d3e_0
  - krb5=1.16.1=h173b8e3_7
  - libedit=3.1.20181209=hc058e9b_0
  - libffi=3.2.1=hd88cf55_4
  - libgcc-ng=8.2.0=hdf63c60_1
  - libgfortran-ng=7.3.0=hdf63c60_0
  - libpng=1.6.36=hbc83047_0
  - libpq=11.2=h20c2e04_0
  - libprotobuf=3.11.4=hd408876_0
  - libsodium=1.0.16=h1bed415_0
  - libstdcxx-ng=8.2.0=hdf63c60_1
  - libtiff=4.0.10=h2733197_2
  - libxgboost=0.90=h688424c_0
  - libxml2=2.9.9=hea5a465_1
  - libxslt=1.1.33=h7d1a2b0_0
  - llvmlite=0.28.0=py37hd408876_0
  - lxml=4.3.2=py37hefd8a0e_0
  - mako=1.0.10=py_0
  - markdown=3.1.1=py37_0
  - markupsafe=1.1.1=py37h7b6447c_0
  - mkl=2019.3=199
  - mkl_fft=1.0.10=py37ha843d7b_0
  - mkl_random=1.0.2=py37hd81dba3_0
  - ncurses=6.1=he6710b0_1
  - networkx=2.2=py37_1
  - ninja=1.9.0=py37hfd86e86_0
  - nose=1.3.7=py37_2
  - numba=0.43.1=py37h962f231_0
  - numpy=1.16.2=py37h7e9f1db_0
  - numpy-base=1.16.2=py37hde5b4d6_0
  - olefile=0.46=py_0
  - openpyxl=2.6.1=py37_1
  - openssl=1.1.1b=h7b6447c_1
  - opt_einsum=3.1.0=py_0
  - pandas=0.24.2=py37he6710b0_0
  - paramiko=2.4.2=py37_0
  - parso=0.3.4=py37_0
  - pathlib2=2.3.3=py37_0
  - patsy=0.5.1=py37_0
  - pexpect=4.6.0=py37_0
  - pickleshare=0.7.5=py37_0
  - pillow=5.4.1=py37h34e0f95_0
  - pip=19.0.3=py37_0
  - ply=3.11=py37_0
  - prompt_toolkit=2.0.9=py37_0
  - protobuf=3.11.4=py37he6710b0_0
  - psutil=5.6.1=py37h7b6447c_0
  - psycopg2=2.7.6.1=py37h1ba5d50_0
  - ptyprocess=0.6.0=py37_0
  - py-xgboost=0.90=py37h688424c_0
  - py-xgboost-gpu=0.90=py37h28bbb66_0
  - pyasn1=0.4.8=py_0
  - pycparser=2.19=py_0
  - pygments=2.3.1=py37_0
  - pymongo=3.8.0=py37he6710b0_1
  - pynacl=1.3.0=py37h7b6447c_0
  - pyopenssl=19.0.0=py37_0
  - pyparsing=2.3.1=py37_0
  - pysocks=1.6.8=py37_0
  - python=3.7.3=h0371630_0
  - python-dateutil=2.8.0=py37_0
  - python-editor=1.0.4=py_0
  - pytorch=1.4.0=py3.7_cuda10.0.130_cudnn7.6.3_0
  - pytz=2018.9=py37_0
  - pyyaml=5.1=py37h7b6447c_0
  - pyzmq=18.0.0=py37he6710b0_0
  - readline=7.0=h7b6447c_5
  - requests=2.21.0=py37_0
  - s3transfer=0.2.1=py37_0
  - scikit-learn=0.20.3=py37hd81dba3_0
  - scipy=1.2.1=py37h7c811a0_0
  - setuptools=40.8.0=py37_0
  - simplejson=3.16.0=py37h14c3975_0
  - singledispatch=3.4.0.3=py37_0
  - six=1.12.0=py37_0
  - smmap2=2.0.5=py_0
  - sqlite=3.27.2=h7b6447c_0
  - sqlparse=0.3.0=py_0
  - statsmodels=0.9.0=py37h035aef0_0
  - tabulate=0.8.3=py37_0
  - tensorboard=1.15.0+db2=pyhb230dea_0
  - tensorflow=1.15.0+db2=gpu_py37h9fd0ff8_0
  - tensorflow-base=1.15.0+db2=gpu_py37hd56f5dd_0
  - tensorflow-estimator=1.15.1+db2=pyh2649769_0
  - tensorflow-gpu=1.15.0+db2=h0d30ee6_0
  - termcolor=1.1.0=py37_1
  - tk=8.6.8=hbc83047_0
  - torchvision=0.5.0=py37_cu100
  - tornado=6.0.2=py37h7b6447c_0
  - tqdm=4.31.1=py37_1
  - traitlets=4.3.2=py37_0
  - urllib3=1.24.1=py37_0
  - virtualenv=16.0.0=py37_0
  - wcwidth=0.1.7=py37_0
  - webencodings=0.5.1=py37_1
  - websocket-client=0.56.0=py37_0
  - werkzeug=0.14.1=py37_0
  - wheel=0.33.1=py37_0
  - wrapt=1.11.1=py37h7b6447c_0
  - xz=5.2.4=h14c3975_4
  - yaml=0.1.7=had09818_2
  - zeromq=4.3.1=he6710b0_3
  - zlib=1.2.11=h7b6447c_3
  - zstd=1.3.7=h0b5b093_0
  - pip:
    - argparse==1.4.0
    - databricks-cli==0.9.1
    - deprecated==1.2.7
    - docker==4.2.0
    - fusepy==2.0.4
    - gorilla==0.3.0
    - horovod==0.19.0
    - hyperopt==0.2.2.db1
    - keras==2.2.5
    - matplotlib==3.0.3
    - mleap==0.8.1
    - mlflow==1.5.0
    - nose-exclude==0.5.0
    - pyarrow==0.13.0
    - querystring-parser==1.2.4
    - seaborn==0.9.0
    - tensorboardx==1.9
prefix: /databricks/conda/envs/databricks-ml-gpu

Spark packages containing Python modules

Spark Package Python Module Version
graphframes graphframes 0.7.0-db1-spark2.4
spark-deep-learning sparkdl 1.5.0-db12-spark2.4
tensorframes tensorframes 0.8.2-s_2.11

R libraries

The R libraries are identical to the R Libraries in Databricks Runtime 6.4.

Java and Scala libraries (Scala 2.11 cluster)

In addition to Java and Scala libraries in Databricks Runtime 6.4, Databricks Runtime 6.4 ML contains the following JARs:

Group ID Artifact ID Version
com.databricks spark-deep-learning 1.5.0-db12-spark2.4
com.typesafe.akka akka-actor_2.11 2.3.11
ml.combust.mleap mleap-databricks-runtime_2.11 0.15.0
ml.dmlc xgboost4j 0.90
ml.dmlc xgboost4j-spark 0.90
org.graphframes graphframes_2.11 0.7.0-db1-spark2.4
org.mlflow mlflow-client 1.4.0
org.tensorflow libtensorflow 1.15.0
org.tensorflow libtensorflow_jni 1.15.0
org.tensorflow spark-tensorflow-connector_2.11 1.15.0
org.tensorflow tensorflow 1.15.0
org.tensorframes tensorframes 0.8.2-s_2.11