Spark jobs, Python notebook cells, and library installation all support both Python 2 and 3.
Python 3 is supported on all Databricks Runtime versions starting with Spark 2.0.2-db3.
To specify the Python version when you create a cluster, select it from the Python Version drop-down. When you create a new cluster, the default Python version is Python 3.
You can create a cluster running a specific version of Python using the API by setting the environment variable PYSPARK_PYTHON to /databricks/python3/bin/python3. For an example, see the REST API example Create a Python 3 cluster.
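As a rough illustration, the request body for such a cluster might be assembled as shown below. This is a sketch, not the referenced REST API example: the field names are assumed to follow the Clusters API create request, and the cluster name, runtime version, node type, and worker count are placeholders chosen for illustration.

```python
import json

# Path from the text above; selects Python 3 for the cluster.
PYTHON3_PATH = "/databricks/python3/bin/python3"


def build_cluster_spec(cluster_name, spark_version, node_type_id, num_workers):
    """Return a cluster-creation payload that pins PYSPARK_PYTHON to Python 3.

    All argument values are supplied by the caller; the field names are
    assumed to match the Clusters API create request.
    """
    return {
        "cluster_name": cluster_name,
        "spark_version": spark_version,
        "node_type_id": node_type_id,
        "num_workers": num_workers,
        "spark_env_vars": {"PYSPARK_PYTHON": PYTHON3_PATH},
    }


# Placeholder values for illustration only; substitute real ones.
spec = build_cluster_spec(
    cluster_name="python-3-demo",
    spark_version="<runtime-version>",
    node_type_id="<node-type>",
    num_workers=1,
)
print(json.dumps(spec, indent=2))
```

The printed JSON would then be sent as the body of the cluster creation request; sending it is left out here because the endpoint and authentication details depend on the workspace.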
To validate that the PYSPARK_PYTHON configuration took effect, run the following in a Python notebook (or %python cell):
import sys
print(sys.version)
If you specified /databricks/python3/bin/python3, it should print something like:
3.5.2 (default, Sep 10 2016, 08:21:44) [GCC 5.4.0 20160609]
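If the check needs to be made from code rather than by reading the printed string, the major version can be tested directly. The following is a minimal sketch; the helper name is_python3 is hypothetical and not part of any Databricks API.

```python
import sys


def is_python3(version_info=sys.version_info):
    """Return True when the given version tuple describes Python 3."""
    return version_info[0] == 3


# Full version string, as in the check above.
print(sys.version)
# Boolean result for the interpreter running this cell.
print("Python 3 cluster:", is_python3())
```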
When you run %sh python --version in a notebook, python refers to the Ubuntu system Python version, which is Python 2. Use /databricks/python/bin/python to refer to the version of Python used by Databricks notebooks and Spark; this path is automatically configured to point to the correct Python executable.
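To see which interpreter a Python cell is actually using, as opposed to the system python that %sh resolves, something like the following may help. It is a sketch that assumes the notebook process reports its interpreter through sys.executable and inherits PYSPARK_PYTHON from the cluster environment; the helper name interpreter_info is hypothetical.

```python
import os
import sys


def interpreter_info():
    """Collect the interpreter path and PYSPARK_PYTHON for this process."""
    return {
        # Interpreter that is executing this cell.
        "executable": sys.executable,
        # None when the variable is not set in this process.
        "pyspark_python": os.environ.get("PYSPARK_PYTHON"),
    }


print(interpreter_info())
```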
- Can I use both Python 2 and Python 3 notebooks on the same cluster?
- No. The Python version is a cluster-wide setting and is not configurable on a per-notebook basis.
- What libraries are pre-installed on Python clusters?
- Python 2 and 3 share the same set of installed libraries and library versions, with one exception: simples3 is not available for Python 3, so it is installed only in Python 2. For details on the specific libraries that are pre-installed, see the Databricks Runtime release notes.
- Will my existing PyPI libraries work with Python 3?
- Yes. Databricks installs the correct version if the library supports both Python 2 and 3. If the library does not support Python 3, then library attachment fails with an error.
- Will my existing .egg libraries work with Python 3?
- It depends on whether your existing egg library is cross-compatible with both Python 2 and 3. If the library does not support Python 3, then either library attachment will fail or runtime errors will occur.
For a comprehensive guide on porting code to Python 3 and writing code compatible with both Python 2 and 3, see http://python3porting.com/.
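As a small illustration of code that behaves the same under both versions, the sketch below avoids the constructs that most often differ: integer division, the print statement, and Python 2-only dictionary methods. The function names are hypothetical and the example is not taken from the guide linked above.

```python
def mean(values):
    """Arithmetic mean; float() forces true division on Python 2 as well."""
    return sum(values) / float(len(values))


def describe(mapping):
    """Format key/value pairs using items(), which exists on both versions."""
    return ", ".join("{}={}".format(k, v) for k, v in sorted(mapping.items()))


# print with a single parenthesized argument works on both versions.
print(mean([1, 2]))
print(describe({"b": 2, "a": 1}))
```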
- Can I still install Python libraries using init scripts?
- A common use case for Cluster Node Initialization Scripts is to install packages. Use /databricks/python/bin/pip to ensure that Python packages install into the Databricks Python virtual environment rather than the system Python environment.
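The body of such an init script can be generated from a notebook. The sketch below only builds the script text using the pip path given above; the package name is hypothetical, and how the script is stored and attached to a cluster is not covered here.

```python
# Pip path from the text above; installs into the Databricks Python environment.
DATABRICKS_PIP = "/databricks/python/bin/pip"


def build_init_script(packages):
    """Return the text of a shell script that installs the given packages."""
    lines = ["#!/bin/bash"]
    for package in packages:
        lines.append("{} install {}".format(DATABRICKS_PIP, package))
    return "\n".join(lines) + "\n"


# "example-package" is a hypothetical package name.
script = build_init_script(["example-package"])
print(script)
```

Calling the full pip path, rather than a bare pip, is what keeps the install out of the system Python environment.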