Databricks Runtime 6.0 for ML (unsupported)

Databricks released this image in October 2019.

Databricks Runtime 6.0 for Machine Learning provides a ready-to-go environment for machine learning and data science based on Databricks Runtime 6.0 (unsupported). Databricks Runtime ML contains many popular machine learning libraries, including TensorFlow, PyTorch, Keras, and XGBoost. It also supports distributed deep learning training using Horovod.

For more information, including instructions for creating a Databricks Runtime ML cluster, see AI and Machine Learning on Databricks.

New features

Databricks Runtime 6.0 ML is built on top of Databricks Runtime 6.0. For information on what’s new in Databricks Runtime 6.0, see the Databricks Runtime 6.0 (unsupported) release notes.

Query MLflow experiment data at scale using the new MLflow Spark data source

The Spark data source for MLflow experiments now provides a standard API to load MLflow experiment run data. This enables large-scale querying and analysis of MLflow experiment data using DataFrame APIs. For a given experiment, the DataFrame contains run_ids, metrics, params, tags, start_time, end_time, status, and the artifact_uri for artifacts. See MLflow experiment.
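As a rough sketch of how such a query might look from a notebook, the helper below wraps the data source read. The format name mlflow-experiment is the one Databricks documents for this data source; the helper name load_experiment_runs, its arguments, and the idea of passing the notebook's spark session in explicitly are illustrative assumptions, not part of the release.

```python
def load_experiment_runs(spark, experiment_id, columns=None):
    """Load the run data of one MLflow experiment as a Spark DataFrame.

    spark         -- an active SparkSession (predefined in Databricks notebooks)
    experiment_id -- the MLflow experiment ID; converted to a string for load()
    columns       -- optional list of column names to keep (hypothetical convenience)
    """
    # "mlflow-experiment" is the data source name; the experiment ID is the load path.
    df = spark.read.format("mlflow-experiment").load(str(experiment_id))
    if columns:
        df = df.select(*columns)
    return df
```

In a notebook this might be called as load_experiment_runs(spark, "<experiment-id>", ["params", "metrics", "status"]), after which the usual DataFrame operations (filter, groupBy, and so on) apply.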

Improvements

  • Hyperopt GA

    Hyperopt on Databricks is now generally available. Notable improvements since public preview include support for MLflow logging on Spark workers, correct handling of PySpark broadcast variables, and a new guide on model selection using Hyperopt. We also fixed small bugs in log messages, error handling, and the UI, and made the documentation easier to read. For details, see the Hyperopt documentation.

    We have updated how Databricks logs Hyperopt experiments so that you can now log a custom metric during Hyperopt runs by passing the metric to the mlflow.log_metric function (see log_metric). This is useful if you want to log custom metrics in addition to loss, which is logged by default when the hyperopt.fmin function is called.

  • MLflow

    • Added MLflow Java Client 1.2.0

    • MLflow is now promoted as a top-tier library

  • Upgraded machine learning libraries

    • Horovod upgraded from 0.16.4 to 0.18.1

    • MLflow upgraded from 1.0.0 to 1.2.0

  • Anaconda distribution upgraded from 5.2.0 to 2019.03
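The custom metric logging described under Hyperopt GA above could look roughly like the following. This is a minimal sketch on a toy one-dimensional problem: the objective, the metric name abs_error, the search range, and the log_metric parameter (added only so the function can be exercised without MLflow) are all assumptions made for illustration. It assumes hyperopt and mlflow are importable, as they are on Databricks Runtime ML.

```python
def objective(x, log_metric=None):
    """Toy objective: squared distance from 3.0, plus one custom metric."""
    if log_metric is None:
        # Imported lazily; assumed to be installed, as on Databricks Runtime ML.
        import mlflow
        log_metric = mlflow.log_metric
    loss = (x - 3.0) ** 2
    # Custom metric logged in addition to the loss that is logged by default.
    log_metric("abs_error", abs(x - 3.0))
    return loss


def run_search(max_evals=20, parallelism=2):
    """Run a small distributed search; requires a Spark cluster with hyperopt."""
    from hyperopt import SparkTrials, fmin, hp, tpe

    trials = SparkTrials(parallelism=parallelism)
    best = fmin(
        fn=objective,
        space=hp.uniform("x", -10, 10),  # hypothetical search range
        algo=tpe.suggest,
        max_evals=max_evals,
        trials=trials,
    )
    return best, trials
```

Calling run_search() on a cluster returns the best parameters found and the SparkTrials object; the values chosen for max_evals and parallelism here are arbitrary.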

Removal

  • Databricks ML Model Export is removed. Use MLeap for importing and exporting models instead.

  • In the Hyperopt library, the following properties of hyperopt.SparkTrials are removed:

    • SparkTrials.successful_trials_count

    • SparkTrials.failed_trials_count

    • SparkTrials.cancelled_trials_count

    • SparkTrials.total_trials_count

    They are replaced with the following methods:

    • SparkTrials.count_successful_trials()

    • SparkTrials.count_failed_trials()

    • SparkTrials.count_cancelled_trials()

    • SparkTrials.count_total_trials()
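Code that read the removed properties needs to call the replacements instead. The helper below is a small sketch of that migration; the function name summarize_trials and the dictionary keys are hypothetical, while the four count_* calls are the ones listed above.

```python
def summarize_trials(trials):
    """Collect trial counts from a SparkTrials object.

    Before: trials.successful_trials_count (property, removed)
    After:  trials.count_successful_trials() (call with parentheses)
    """
    return {
        "successful": trials.count_successful_trials(),
        "failed": trials.count_failed_trials(),
        "cancelled": trials.count_cancelled_trials(),
        "total": trials.count_total_trials(),
    }
```

Because the replacements are calls rather than attributes, the parentheses are the only change needed at most call sites.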

System environment

The system environment in Databricks Runtime 6.0 ML differs from Databricks Runtime 6.0 as follows:

Libraries

The following sections list the libraries included in Databricks Runtime 6.0 ML that differ from those included in Databricks Runtime 6.0.

Top-tier libraries

Databricks Runtime 6.0 ML includes the following top-tier libraries:

Python libraries

Databricks Runtime 6.0 ML uses Conda for Python package management and includes many popular ML packages. The following section describes the Conda environment for Databricks Runtime 6.0 ML.

Python 3 on CPU clusters

name: databricks-ml
channels:
  - pytorch
  - defaults
dependencies:
  - _libgcc_mutex=0.1=main
  - _py-xgboost-mutex=2.0=cpu_0
  - _tflow_select=2.3.0=mkl
  - absl-py=0.7.1=py37_0
  - asn1crypto=0.24.0=py37_0
  - astor=0.8.0=py37_0
  - backcall=0.1.0=py37_0
  - backports=1.0=py_2
  - bcrypt=3.1.6=py37h7b6447c_0
  - blas=1.0=mkl
  - boto=2.49.0=py37_0
  - boto3=1.9.162=py_0
  - botocore=1.12.163=py_0
  - c-ares=1.15.0=h7b6447c_1001
  - ca-certificates=2019.1.23=0
  - certifi=2019.3.9=py37_0
  - cffi=1.12.2=py37h2e261b9_1
  - chardet=3.0.4=py37_1003
  - click=7.0=py37_0
  - cloudpickle=0.8.0=py37_0
  - colorama=0.4.1=py37_0
  - configparser=3.7.4=py37_0
  - cryptography=2.6.1=py37h1ba5d50_0
  - cycler=0.10.0=py37_0
  - cython=0.29.6=py37he6710b0_0
  - decorator=4.4.0=py37_1
  - docutils=0.14=py37_0
  - entrypoints=0.3=py37_0
  - et_xmlfile=1.0.1=py37_0
  - flask=1.0.2=py37_1
  - freetype=2.9.1=h8a8886c_1
  - future=0.17.1=py37_0
  - gast=0.2.2=py37_0
  - gitdb2=2.0.5=py37_0
  - gitpython=2.1.11=py37_0
  - grpcio=1.16.1=py37hf8bcb03_1
  - gunicorn=19.9.0=py37_0
  - h5py=2.9.0=py37h7918eee_0
  - hdf5=1.10.4=hb1b8bf9_0
  - html5lib=1.0.1=py_0
  - icu=58.2=h9c2bf20_1
  - idna=2.8=py37_0
  - intel-openmp=2019.3=199
  - ipython=7.4.0=py37h39e3cac_0
  - ipython_genutils=0.2.0=py37_0
  - itsdangerous=1.1.0=py37_0
  - jdcal=1.4=py37_0
  - jedi=0.13.3=py37_0
  - jinja2=2.10=py37_0
  - jmespath=0.9.4=py_0
  - jpeg=9b=h024ee3a_2
  - keras=2.2.4=0
  - keras-applications=1.0.8=py_0
  - keras-base=2.2.4=py37_0
  - keras-preprocessing=1.1.0=py_1
  - kiwisolver=1.0.1=py37hf484d3e_0
  - krb5=1.16.1=h173b8e3_7
  - libedit=3.1.20181209=hc058e9b_0
  - libffi=3.2.1=hd88cf55_4
  - libgcc-ng=8.2.0=hdf63c60_1
  - libgfortran-ng=7.3.0=hdf63c60_0
  - libpng=1.6.36=hbc83047_0
  - libpq=11.2=h20c2e04_0
  - libprotobuf=3.8.0=hd408876_0
  - libsodium=1.0.16=h1bed415_0
  - libstdcxx-ng=8.2.0=hdf63c60_1
  - libtiff=4.0.10=h2733197_2
  - libxgboost=0.90=he6710b0_0
  - libxml2=2.9.9=hea5a465_1
  - libxslt=1.1.33=h7d1a2b0_0
  - llvmlite=0.28.0=py37hd408876_0
  - lxml=4.3.2=py37hefd8a0e_0
  - mako=1.0.10=py_0
  - markdown=3.1.1=py37_0
  - markupsafe=1.1.1=py37h7b6447c_0
  - mkl=2019.3=199
  - mkl_fft=1.0.10=py37ha843d7b_0
  - mkl_random=1.0.2=py37hd81dba3_0
  - mock=3.0.5=py37_0
  - ncurses=6.1=he6710b0_1
  - networkx=2.2=py37_1
  - ninja=1.9.0=py37hfd86e86_0
  - nose=1.3.7=py37_2
  - numba=0.43.1=py37h962f231_0
  - numpy=1.16.2=py37h7e9f1db_0
  - numpy-base=1.16.2=py37hde5b4d6_0
  - olefile=0.46=py37_0
  - openpyxl=2.6.1=py37_1
  - openssl=1.1.1b=h7b6447c_1
  - pandas=0.24.2=py37he6710b0_0
  - paramiko=2.4.2=py37_0
  - parso=0.3.4=py37_0
  - pathlib2=2.3.3=py37_0
  - patsy=0.5.1=py37_0
  - pexpect=4.6.0=py37_0
  - pickleshare=0.7.5=py37_0
  - pillow=5.4.1=py37h34e0f95_0
  - pip=19.0.3=py37_0
  - ply=3.11=py37_0
  - prompt_toolkit=2.0.9=py37_0
  - protobuf=3.8.0=py37he6710b0_0
  - psutil=5.6.1=py37h7b6447c_0
  - psycopg2=2.7.6.1=py37h1ba5d50_0
  - ptyprocess=0.6.0=py37_0
  - py-xgboost=0.90=py37he6710b0_0
  - py-xgboost-cpu=0.90=py37_0
  - pyasn1=0.4.6=py_0
  - pycparser=2.19=py37_0
  - pygments=2.3.1=py37_0
  - pymongo=3.8.0=py37he6710b0_1
  - pynacl=1.3.0=py37h7b6447c_0
  - pyopenssl=19.0.0=py37_0
  - pyparsing=2.3.1=py37_0
  - pysocks=1.6.8=py37_0
  - python=3.7.3=h0371630_0
  - python-dateutil=2.8.0=py37_0
  - python-editor=1.0.4=py_0
  - pytorch-cpu=1.1.0=py3.7_cpu_0
  - pytz=2018.9=py37_0
  - pyyaml=5.1=py37h7b6447c_0
  - readline=7.0=h7b6447c_5
  - requests=2.21.0=py37_0
  - s3transfer=0.2.1=py37_0
  - scikit-learn=0.20.3=py37hd81dba3_0
  - scipy=1.2.1=py37h7c811a0_0
  - setuptools=40.8.0=py37_0
  - simplejson=3.16.0=py37h14c3975_0
  - singledispatch=3.4.0.3=py37_0
  - six=1.12.0=py37_0
  - smmap2=2.0.5=py37_0
  - sqlite=3.27.2=h7b6447c_0
  - sqlparse=0.3.0=py_0
  - statsmodels=0.9.0=py37h035aef0_0
  - tabulate=0.8.3=py37_0
  - tensorboard=1.13.1=py37hf484d3e_0
  - tensorflow=1.13.1=mkl_py37h54b294f_0
  - tensorflow-base=1.13.1=mkl_py37h7ce6ba3_0
  - tensorflow-estimator=1.13.0=py_0
  - tensorflow-mkl=1.13.1=h4fcabd2_0
  - termcolor=1.1.0=py37_1
  - tk=8.6.8=hbc83047_0
  - torchvision-cpu=0.3.0=py37_cuNone_1
  - tqdm=4.31.1=py37_1
  - traitlets=4.3.2=py37_0
  - urllib3=1.24.1=py37_0
  - virtualenv=16.0.0=py37_0
  - wcwidth=0.1.7=py37_0
  - webencodings=0.5.1=py37_1
  - websocket-client=0.56.0=py37_0
  - werkzeug=0.14.1=py37_0
  - wheel=0.33.1=py37_0
  - wrapt=1.11.1=py37h7b6447c_0
  - xz=5.2.4=h14c3975_4
  - yaml=0.1.7=had09818_2
  - zlib=1.2.11=h7b6447c_3
  - zstd=1.3.7=h0b5b093_0
  - pip:
    - argparse==1.4.0
    - databricks-cli==0.9.0
    - docker==4.0.2
    - fusepy==2.0.4
    - gorilla==0.3.0
    - horovod==0.18.1
    - hyperopt==0.1.2.db8
    - matplotlib==3.0.3
    - mleap==0.8.1
    - mlflow==1.2.0
    - nose-exclude==0.5.0
    - pyarrow==0.13.0
    - querystring-parser==1.2.4
    - seaborn==0.9.0
    - tensorboardx==1.8
prefix: /databricks/conda/envs/databricks-ml
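To compare another environment against the pins listed above, the entries can be parsed into name and version pairs. The following is only a sketch: the function names are hypothetical, and it assumes Conda entries have the form name=version=build and pip entries the form name==version, as in the listing.

```python
def parse_conda_pin(spec):
    """Split a Conda entry such as '- numpy=1.16.2=py37h7e9f1db_0'."""
    parts = spec.strip().lstrip("-").strip().split("=")
    name = parts[0]
    version = parts[1] if len(parts) > 1 else None
    build = parts[2] if len(parts) > 2 else None
    return name, version, build


def parse_pip_pin(spec):
    """Split a pip entry such as '- mlflow==1.2.0'."""
    name, _, version = spec.strip().lstrip("-").strip().partition("==")
    return name, version or None


def find_mismatches(expected, installed):
    """Return {name: (expected_version, installed_version)} where they differ.

    Both arguments map package names to version strings; a package missing
    from `installed` is reported with None as its installed version.
    """
    return {
        name: (version, installed.get(name))
        for name, version in expected.items()
        if installed.get(name) != version
    }
```

The installed mapping could be built on a cluster from each module's __version__ attribute or from the output of conda list; how to collect it is left open here.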

Spark packages containing Python modules

| Spark Package | Python Module | Version |
| --- | --- | --- |
| graphframes | graphframes | 0.7.0-db1-spark2.4 |
| spark-deep-learning | sparkdl | 1.5.0-db5-spark2.4 |
| tensorframes | tensorframes | 0.7.0-s_2.11 |

R libraries

The R libraries are identical to the R libraries in Databricks Runtime 6.0.

Java and Scala libraries (Scala 2.11 cluster)

In addition to Java and Scala libraries in Databricks Runtime 6.0, Databricks Runtime 6.0 ML contains the following JARs:

| Group ID | Artifact ID | Version |
| --- | --- | --- |
| com.databricks | spark-deep-learning | 1.5.0-db5-spark2.4 |
| com.typesafe.akka | akka-actor_2.11 | 2.3.11 |
| ml.combust.mleap | mleap-databricks-runtime_2.11 | 0.14.0 |
| ml.dmlc | xgboost4j | 0.90 |
| ml.dmlc | xgboost4j-spark | 0.90 |
| org.graphframes | graphframes_2.11 | 0.7.0-db1-spark2.4 |
| org.mlflow | mlflow-client | 1.2.0 |
| org.tensorflow | libtensorflow | 1.13.1 |
| org.tensorflow | libtensorflow_jni | 1.13.1 |
| org.tensorflow | spark-tensorflow-connector_2.11 | 1.13.1 |
| org.tensorflow | tensorflow | 1.13.1 |
| org.tensorframes | tensorframes | 0.7.0-s_2.11 |