Databricks Runtime 6.0 with Conda (EoS)
Note
Support for this Databricks Runtime version has ended. For the end-of-support date, see End-of-support history. For all supported Databricks Runtime versions, see Databricks Runtime release notes versions and compatibility.
Note
This release is no longer available. If you want to use Conda to manage Python libraries and environments, use a supported version of Databricks Runtime for Machine Learning.
Databricks Runtime 6.0 with Conda (Beta) lets you take advantage of Conda to manage Python libraries and environments. This runtime offers two root Conda environment options at cluster creation:
Databricks Standard environment includes updated versions of many popular Python packages. This environment is intended as a drop-in replacement for existing notebooks that run on Databricks Runtime. This is the default Databricks Conda-based runtime environment.
Databricks Minimal environment contains only the packages required for PySpark and Databricks Python notebook functionality. This environment is ideal if you want to customize the runtime with various Python packages.
Both include support for Databricks Library utility (dbutils.library) (legacy).
Note
The Scala, Java, and R libraries in Databricks Runtime 6.0 with Conda are identical to those in Databricks Runtime 6.0. For details, see the Databricks Runtime 6.0 (EoS) release notes. For information about how to use Databricks Runtime with Conda, see Conda.
New Features
See Databricks Runtime 6.0 New features.
Improvements
See Databricks Runtime 6.0 Improvements.
Known Issues
By default, every Python notebook runs in its own isolated Conda environment, which is cloned from the root Conda environment. Because cloning is an expensive operation, in certain cases you may experience the following issues:
If the cluster instance type does not have local storage, cluster creation may fail with an error like:
Could not start Spark. This can happen when installing incompatible libraries or when initialization scripts failed. databricks_error_message: Spark failed to start: Timed out after ... seconds
Concurrently attaching many Python notebooks to a single cluster (for example, triggered by scheduled jobs or notebook workflows) may cause some of those notebooks to fail to attach.
If you experience any of the above issues and you do not need to run Python notebooks in isolated environments (that is, your cluster is not shared), you can disable the creation of an isolated Python environment for every Python notebook by setting spark.databricks.libraryIsolation.enabled to false in the Spark configuration. Setting this flag also disables dbutils.library.
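As a sketch, the setting is a single key-value pair entered in the Spark config field of the cluster configuration UI (or the spark_conf section of the Clusters API request):

```
spark.databricks.libraryIsolation.enabled false
```

Note that this applies cluster-wide: every Python notebook attached to the cluster then shares the root Conda environment.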
If you upgrade the installed Conda, the new version of Conda may not include the fix for Conda issue 9104 (conda list fails if the RECORD file has duplicate entries). If you upgrade Conda and see failures when attaching Python notebooks or when running conda list, with the error TypeError: '<' not supported between instances of 'NoneType' and 'str' in the driver logs or in a notebook, either use a version of Conda that has the fix or avoid upgrading the Conda installed in this release.
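The TypeError above is Python's standard error for comparing None with a string while sorting. A minimal standalone sketch (not Conda's actual code) reproduces the same message, which is how a duplicate RECORD entry that yields a None field can break conda list:

```python
# Hypothetical package records, one of which has a None name
# (the situation described in Conda issue 9104).
records = [{"name": "numpy"}, {"name": None}, {"name": "pandas"}]

try:
    # Sorting mixed None/str values raises the TypeError seen in the logs.
    sorted(r["name"] for r in records)
except TypeError as e:
    print(e)  # '<' not supported between instances of 'NoneType' and 'str'
```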
System environment
The system environment in Databricks Runtime 6.0 with Conda differs from Databricks Runtime 6.0 as follows:
There are some differences in the installed Python libraries.
Libraries
The following is the exported environment.yml file for the default root environments on Databricks Runtime 6.0 with Conda.
Databricks Standard
name: databricks-standard
channels:
- defaults
dependencies:
- _libgcc_mutex=0.1=main
- asn1crypto=0.24.0=py37_0
- backcall=0.1.0=py37_0
- blas=1.0=openblas
- boto=2.49.0=py37_0
- boto3=1.9.162=py_0
- botocore=1.12.163=py_0
- ca-certificates=2019.1.23=0
- certifi=2019.3.9=py37_0
- cffi=1.12.2=py37h2e261b9_1
- chardet=3.0.4=py37_1003
- cryptography=2.6.1=py37h1ba5d50_0
- cython=0.29.6=py37he6710b0_0
- decorator=4.4.0=py37_1
- docutils=0.14=py37_0
- idna=2.8=py37_0
- ipython=7.4.0=py37h39e3cac_0
- ipython_genutils=0.2.0=py37_0
- jedi=0.13.3=py37_0
- jmespath=0.9.4=py_0
- krb5=1.16.1=h173b8e3_7
- libedit=3.1.20181209=hc058e9b_0
- libffi=3.2.1=hd88cf55_4
- libgcc-ng=8.2.0=hdf63c60_1
- libgfortran-ng=7.3.0=hdf63c60_0
- libopenblas=0.3.6=h5a2b251_1
- libpq=11.2=h20c2e04_0
- libstdcxx-ng=8.2.0=hdf63c60_1
- ncurses=6.1=he6710b0_1
- nomkl=3.0=0
- numpy=1.16.2=py37h99e49ec_0
- numpy-base=1.16.2=py37h2f8d375_0
- openssl=1.1.1b=h7b6447c_1
- pandas=0.24.2=py37he6710b0_0
- parso=0.3.4=py37_0
- patsy=0.5.1=py37_0
- pexpect=4.6.0=py37_0
- pickleshare=0.7.5=py37_0
- pip=19.0.3=py37_0
- prompt_toolkit=2.0.9=py37_0
- psycopg2=2.7.6.1=py37h1ba5d50_0
- ptyprocess=0.6.0=py37_0
- pycparser=2.19=py37_0
- pygments=2.3.1=py37_0
- pyopenssl=19.0.0=py37_0
- pysocks=1.6.8=py37_0
- python=3.7.3=h0371630_0
- python-dateutil=2.8.0=py37_0
- pytz=2018.9=py37_0
- readline=7.0=h7b6447c_5
- requests=2.21.0=py37_0
- s3transfer=0.2.1=py37_0
- scikit-learn=0.20.3=py37h22eb022_0
- scipy=1.2.1=py37he2b7bc3_0
- setuptools=40.8.0=py37_0
- six=1.12.0=py37_0
- sqlite=3.27.2=h7b6447c_0
- statsmodels=0.9.0=py37h035aef0_0
- tk=8.6.8=hbc83047_0
- traitlets=4.3.2=py37_0
- urllib3=1.24.1=py37_0
- wcwidth=0.1.7=py37_0
- wheel=0.33.1=py37_0
- xz=5.2.4=h14c3975_4
- zlib=1.2.11=h7b6447c_3
- pip:
- cycler==0.10.0
- kiwisolver==1.1.0
- matplotlib==3.0.3
- pyarrow==0.13.0
- pyparsing==2.4.2
- seaborn==0.9.0
prefix: /databricks/conda/envs/databricks-standard
Databricks Minimal
name: databricks-minimal
channels:
- defaults
dependencies:
- _libgcc_mutex=0.1=main
- backcall=0.1.0=py37_0
- blas=1.0=openblas
- ca-certificates=2019.1.23=0
- certifi=2019.3.9=py37_0
- decorator=4.4.0=py37_1
- ipython=7.4.0=py37h39e3cac_0
- ipython_genutils=0.2.0=py37_0
- jedi=0.13.3=py37_0
- libedit=3.1.20181209=hc058e9b_0
- libffi=3.2.1=hd88cf55_4
- libgcc-ng=8.2.0=hdf63c60_1
- libgfortran-ng=7.3.0=hdf63c60_0
- libopenblas=0.3.6=h5a2b251_1
- libstdcxx-ng=8.2.0=hdf63c60_1
- ncurses=6.1=he6710b0_1
- nomkl=3.0=0
- numpy=1.16.2=py37h99e49ec_0
- numpy-base=1.16.2=py37h2f8d375_0
- openssl=1.1.1b=h7b6447c_1
- pandas=0.24.2=py37he6710b0_0
- parso=0.3.4=py37_0
- pexpect=4.6.0=py37_0
- pickleshare=0.7.5=py37_0
- pip=19.0.3=py37_0
- prompt_toolkit=2.0.9=py37_0
- ptyprocess=0.6.0=py37_0
- pygments=2.3.1=py37_0
- python=3.7.3=h0371630_0
- python-dateutil=2.8.0=py37_0
- pytz=2018.9=py37_0
- readline=7.0=h7b6447c_5
- setuptools=40.8.0=py37_0
- six=1.12.0=py37_0
- sqlite=3.27.2=h7b6447c_0
- tk=8.6.8=hbc83047_0
- traitlets=4.3.2=py37_0
- wcwidth=0.1.7=py37_0
- wheel=0.33.1=py37_0
- xz=5.2.4=h14c3975_4
- zlib=1.2.11=h7b6447c_3
- pip:
- pyarrow==0.13.0
prefix: /databricks/conda/envs/databricks-minimal
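Each Conda dependency line above uses the name=version=build pin format. As an illustrative sketch (not a Conda API), a pin can be split into its three parts like this:

```python
def parse_conda_pin(pin: str) -> dict:
    """Split a Conda dependency pin of the form name=version=build."""
    name, version, build = pin.split("=")
    return {"name": name, "version": version, "build": build}

# Example entry taken from the databricks-standard environment above.
print(parse_conda_pin("numpy=1.16.2=py37h99e49ec_0"))
# {'name': 'numpy', 'version': '1.16.2', 'build': 'py37h99e49ec_0'}
```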