Spark jobs, Python notebook cells, and library installation all support both Python 2 and 3.
Python 3 is supported on all Databricks Runtime versions starting with Spark 2.0.2-db3.
The default Python version for clusters created using the UI is Python 3. The default version for clusters created using the REST API is Python 2.
To specify the Python version when you create a cluster, select it from the Python Version drop-down.
You can create a cluster running a specific version of Python using the API by setting the environment variable PYSPARK_PYTHON to
/databricks/python3/bin/python3. For an example, see the REST API example Create a Python 3 cluster.
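As a rough sketch of what such an API request body looks like, the snippet below builds a payload that sets PYSPARK_PYTHON through spark_env_vars. The cluster name, node type, and worker count are illustrative placeholders; you would POST this body to the clusters/create endpoint with your own authentication token.

```python
import json

# Illustrative request body for creating a Python 3 cluster via the REST API.
# Everything except the PYSPARK_PYTHON path is a placeholder value.
payload = {
    "cluster_name": "python-3-cluster",   # hypothetical name
    "spark_version": "2.0.2-db3",
    "node_type_id": "r3.xlarge",          # illustrative node type
    "num_workers": 2,
    "spark_env_vars": {
        "PYSPARK_PYTHON": "/databricks/python3/bin/python3"
    },
}

print(json.dumps(payload, indent=2))
```

The only part that controls the Python version is the spark_env_vars entry; the rest of the payload is whatever your cluster configuration requires.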
To validate that the PYSPARK_PYTHON configuration took effect, run the following in a Python notebook (or %python cell):

import sys
print(sys.version)

If you specified /databricks/python3/bin/python3, it should print something like:

3.5.2 (default, Sep 10 2016, 08:21:44) [GCC 5.4.0 20160609]
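If you prefer a programmatic check over eyeballing the version string, sys.version_info exposes the version as structured fields; a minimal sketch:

```python
import sys

# sys.version_info gives structured access to the interpreter version,
# which is easier to branch on than parsing the sys.version string.
if sys.version_info.major == 3:
    print("Cluster Python is version 3")
else:
    print("Cluster Python is version 2")
```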
When you run %sh python --version in a notebook, python refers to the Ubuntu system Python, which is Python 2. Use /databricks/python/bin/python to refer to the version of Python used by Databricks notebooks and Spark: this path is automatically configured to point to the correct Python executable.
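To see which interpreter a notebook cell is actually running under (as opposed to whatever %sh finds on the system PATH), you can inspect sys.executable; a small sketch:

```python
import sys

# sys.executable is the absolute path of the interpreter running this cell.
# On a Databricks cluster this points under /databricks/python..., while
# `%sh python --version` resolves the Ubuntu system python instead.
print(sys.executable)
```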
- Can I use both Python 2 and Python 3 notebooks on the same cluster?
- No. The Python version is a cluster-wide setting and is not configurable on a per-notebook basis.
- What libraries are pre-installed on Python clusters?
- Python 2 and 3 share the same set of installed libraries and library versions with only one exception:
simples3 is not available for Python 3, so it is installed only on Python 2 clusters. For details on the specific libraries that are pre-installed, see the Databricks Runtime release notes.
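One way to check what is installed on a given cluster from a notebook is to enumerate the packages visible to the interpreter. A sketch using pkg_resources (part of setuptools, which ships on the clusters):

```python
import pkg_resources  # provided by setuptools

# Enumerate the distributions visible to this interpreter, sorted by name.
installed = sorted(dist.project_name for dist in pkg_resources.working_set)
print("%d packages installed" % len(installed))
print(installed[:10])  # first few entries
```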
- Will my existing PyPI libraries work with Python 3?
- Yes. Databricks installs the correct version if the library supports both Python 2 and 3. If the library does not support Python 3, then library attachment fails with an error.
- Will my existing
.egg libraries work with Python 3?
- It depends on whether your existing egg library is cross-compatible with both Python 2 and 3. If the library does not support Python 3, then either library attachment fails or runtime errors occur.
For a comprehensive guide on porting code to Python 3 and writing code compatible with both Python 2 and 3, see http://python3porting.com/.
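As a small illustration of cross-compatible style, the standard __future__ module lets Python 2 code opt into Python 3 semantics, so the same file behaves identically on both interpreters:

```python
from __future__ import division, print_function

# On Python 2, `/` is integer division and `print` is a statement;
# these imports give both the Python 3 behavior, so this snippet
# produces the same result under either interpreter.
ratio = 3 / 2
print("3 / 2 =", ratio)
```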
- Can I still install Python libraries using init scripts?
- A common use case for Cluster Node Initialization Scripts is to install packages. Use
/databricks/python/bin/pip to ensure that Python packages are installed into the Databricks Python virtual environment rather than the system Python environment.
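From a notebook, you can reach the same pip through subprocess; a minimal sketch. Note that sys.executable is used here only so the snippet runs anywhere; in an init script on a real cluster you would invoke /databricks/python/bin/pip directly, and the install arguments shown in the comment are illustrative.

```python
import subprocess
import sys

# Invoke pip for the interpreter running this cell. On Databricks,
# replace [sys.executable, "-m", "pip"] with "/databricks/python/bin/pip".
pip_cmd = [sys.executable, "-m", "pip", "--version"]

# `pip --version` reports which environment pip targets; swap the last
# element for ["install", "some-package"] to install into that environment.
result = subprocess.check_output(pip_cmd).decode()
print(result)
```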