Databricks Runtime 6.0 with Conda (Beta)

Note

This release is no longer available. If you want to use Conda to manage Python libraries and environments, use a supported version of Databricks Runtime for Machine Learning.

Databricks Runtime 6.0 with Conda (Beta) lets you take advantage of Conda to manage Python libraries and environments. This runtime offers two root Conda environment options at cluster creation:

  • Databricks Standard environment includes updated versions of many popular Python packages. This environment is intended as a drop-in replacement for existing notebooks that run on Databricks Runtime. This is the default Databricks Conda-based runtime environment.
  • Databricks Minimal environment contains a minimum number of packages that are required for PySpark and Databricks Python notebook functionality. This environment is ideal if you want to customize the runtime with various Python packages.

Both include support for Databricks Library utilities.

Note

The Scala, Java, and R libraries in Databricks Runtime 6.0 with Conda are identical to those in Databricks Runtime 6.0. For details, see the Databricks Runtime 6.0 (Unsupported) release notes. For information about how to use Databricks Runtime with Conda, see Databricks Runtime with Conda.

New Features

See Databricks Runtime 6.0 New features.

Improvements

See Databricks Runtime 6.0 Improvements.

Known Issues

  • By default, every Python notebook runs in its own isolated Conda environment. This isolated environment is cloned from the root Conda environment. Because this clone is an expensive operation, for certain cases, you may experience the following issues:

    • If the cluster instance type does not have local storage, cluster creation may fail with an error like:

      Could not start Spark. This can happen when installing incompatible libraries or when initialization scripts failed.
      databricks_error_message: Spark failed to start: Timed out after ... seconds
      
    • Concurrently attaching many Python notebooks to a single cluster (for example, triggered by scheduled jobs or notebook workflows) may cause some of those notebooks fail to attach.

    If you experience any of the above issues and you do not need to run Python notebooks in isolated environments (that is, your cluster is not shared), you can disable creating an isolated Python environment for every Python notebook by setting spark.databricks.libraryIsolation.enabled to false in Spark configuration. Setting this flag also disables dbutils.library.

  • If you upgrade the installed Conda, the new version of Conda may not include the fix for Conda issue 9104 (Conda List fails if “RECORD” file has duplicate entries). If you upgrade Conda and see failures on attaching Python notebooks or using conda list with the error TypeError: '<' not supported between instances of 'NoneType' and 'str' in driver logs or on a notebook, either use a version of Conda that has the fix or avoid upgrading Conda installed in this release.

System environment

The system environment in Databricks Runtime 6.0 with Conda differs from Databricks Runtime 6.0 as follows:

There are some differences in the installed Python libraries.

Libraries

The following is the exported environment.yml file for default root environments on Databricks Runtime 6.0 with Conda.

Databricks Standard

name: databricks-standard
channels:
  - defaults
dependencies:
  - _libgcc_mutex=0.1=main
  - asn1crypto=0.24.0=py37_0
  - backcall=0.1.0=py37_0
  - blas=1.0=openblas
  - boto=2.49.0=py37_0
  - boto3=1.9.162=py_0
  - botocore=1.12.163=py_0
  - ca-certificates=2019.1.23=0
  - certifi=2019.3.9=py37_0
  - cffi=1.12.2=py37h2e261b9_1
  - chardet=3.0.4=py37_1003
  - cryptography=2.6.1=py37h1ba5d50_0
  - cython=0.29.6=py37he6710b0_0
  - decorator=4.4.0=py37_1
  - docutils=0.14=py37_0
  - idna=2.8=py37_0
  - ipython=7.4.0=py37h39e3cac_0
  - ipython_genutils=0.2.0=py37_0
  - jedi=0.13.3=py37_0
  - jmespath=0.9.4=py_0
  - krb5=1.16.1=h173b8e3_7
  - libedit=3.1.20181209=hc058e9b_0
  - libffi=3.2.1=hd88cf55_4
  - libgcc-ng=8.2.0=hdf63c60_1
  - libgfortran-ng=7.3.0=hdf63c60_0
  - libopenblas=0.3.6=h5a2b251_1
  - libpq=11.2=h20c2e04_0
  - libstdcxx-ng=8.2.0=hdf63c60_1
  - ncurses=6.1=he6710b0_1
  - nomkl=3.0=0
  - numpy=1.16.2=py37h99e49ec_0
  - numpy-base=1.16.2=py37h2f8d375_0
  - openssl=1.1.1b=h7b6447c_1
  - pandas=0.24.2=py37he6710b0_0
  - parso=0.3.4=py37_0
  - patsy=0.5.1=py37_0
  - pexpect=4.6.0=py37_0
  - pickleshare=0.7.5=py37_0
  - pip=19.0.3=py37_0
  - prompt_toolkit=2.0.9=py37_0
  - psycopg2=2.7.6.1=py37h1ba5d50_0
  - ptyprocess=0.6.0=py37_0
  - pycparser=2.19=py37_0
  - pygments=2.3.1=py37_0
  - pyopenssl=19.0.0=py37_0
  - pysocks=1.6.8=py37_0
  - python=3.7.3=h0371630_0
  - python-dateutil=2.8.0=py37_0
  - pytz=2018.9=py37_0
  - readline=7.0=h7b6447c_5
  - requests=2.21.0=py37_0
  - s3transfer=0.2.1=py37_0
  - scikit-learn=0.20.3=py37h22eb022_0
  - scipy=1.2.1=py37he2b7bc3_0
  - setuptools=40.8.0=py37_0
  - six=1.12.0=py37_0
  - sqlite=3.27.2=h7b6447c_0
  - statsmodels=0.9.0=py37h035aef0_0
  - tk=8.6.8=hbc83047_0
  - traitlets=4.3.2=py37_0
  - urllib3=1.24.1=py37_0
  - wcwidth=0.1.7=py37_0
  - wheel=0.33.1=py37_0
  - xz=5.2.4=h14c3975_4
  - zlib=1.2.11=h7b6447c_3
  - pip:
    - cycler==0.10.0
    - kiwisolver==1.1.0
    - matplotlib==3.0.3
    - pyarrow==0.13.0
    - pyparsing==2.4.2
    - seaborn==0.9.0
prefix: /databricks/conda/envs/databricks-standard

Databricks Minimal

name: databricks-minimal
channels:
  - defaults
dependencies:
  - _libgcc_mutex=0.1=main
  - backcall=0.1.0=py37_0
  - blas=1.0=openblas
  - ca-certificates=2019.1.23=0
  - certifi=2019.3.9=py37_0
  - decorator=4.4.0=py37_1
  - ipython=7.4.0=py37h39e3cac_0
  - ipython_genutils=0.2.0=py37_0
  - jedi=0.13.3=py37_0
  - libedit=3.1.20181209=hc058e9b_0
  - libffi=3.2.1=hd88cf55_4
  - libgcc-ng=8.2.0=hdf63c60_1
  - libgfortran-ng=7.3.0=hdf63c60_0
  - libopenblas=0.3.6=h5a2b251_1
  - libstdcxx-ng=8.2.0=hdf63c60_1
  - ncurses=6.1=he6710b0_1
  - nomkl=3.0=0
  - numpy=1.16.2=py37h99e49ec_0
  - numpy-base=1.16.2=py37h2f8d375_0
  - openssl=1.1.1b=h7b6447c_1
  - pandas=0.24.2=py37he6710b0_0
  - parso=0.3.4=py37_0
  - pexpect=4.6.0=py37_0
  - pickleshare=0.7.5=py37_0
  - pip=19.0.3=py37_0
  - prompt_toolkit=2.0.9=py37_0
  - ptyprocess=0.6.0=py37_0
  - pygments=2.3.1=py37_0
  - python=3.7.3=h0371630_0
  - python-dateutil=2.8.0=py37_0
  - pytz=2018.9=py37_0
  - readline=7.0=h7b6447c_5
  - setuptools=40.8.0=py37_0
  - six=1.12.0=py37_0
  - sqlite=3.27.2=h7b6447c_0
  - tk=8.6.8=hbc83047_0
  - traitlets=4.3.2=py37_0
  - wcwidth=0.1.7=py37_0
  - wheel=0.33.1=py37_0
  - xz=5.2.4=h14c3975_4
  - zlib=1.2.11=h7b6447c_3
  - pip:
    - pyarrow==0.13.0
prefix: /databricks/conda/envs/databricks-minimal