Install Databricks Connect for Python
This article covers Databricks Connect for Databricks Runtime 13.3 LTS and above.
This article describes how to install Databricks Connect for Python. For an overview, see What is Databricks Connect?
Requirements
Before installing Databricks Connect, make sure your workspace and local environment meet the requirements. See Databricks Connect usage requirements.
Activate a Python virtual environment
Databricks strongly recommends that you activate a separate Python virtual environment for each Python version that you use with Databricks Connect. A virtual environment helps make sure that you are using matching versions of Python and Databricks Connect. For more information about virtual environment tools and how to activate an environment, see venv or Poetry.
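Before installing the client, it can help to confirm that the interpreter you are about to use really is running inside a virtual environment. The following is a minimal sketch, assuming an environment created by venv (or a recent virtualenv, which Poetry uses); the function name in_virtual_environment is hypothetical and not part of Databricks Connect. Other environment managers, such as conda, are not detected by this check.

```python
import sys


def in_virtual_environment() -> bool:
    """Return True when the interpreter runs inside a venv-style environment."""
    # Inside a venv, sys.prefix points at the environment directory, while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print(f"Python {sys.version_info.major}.{sys.version_info.minor}")
    print("Virtual environment active:", in_virtual_environment())
```

Run the script with the same python3 executable that you use for pip3 or Poetry, so that the result describes the environment that will receive the package.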
Install the Databricks Connect client
This section describes how to install the Databricks Connect client with venv or Poetry.
If you already have the Databricks extension for Visual Studio Code installed, you can install Databricks Connect for Databricks Runtime 13.3 LTS and above using the extension. See Debug code using Databricks Connect for the Databricks extension for Visual Studio Code.
Install the Databricks Connect client with venv
- With your virtual environment activated, uninstall PySpark, if it is already installed, by running the uninstall command. This is required because the databricks-connect package conflicts with PySpark. For details, see Conflicting PySpark installations. To check whether PySpark is already installed, run the show command.
  Bash
    # Is PySpark already installed?
    pip3 show pyspark
    # Uninstall PySpark
    pip3 uninstall pyspark
- With your virtual environment still activated, install the Databricks Connect client by running the install command. Use the --upgrade option to upgrade any existing client installation to the specified version.
  Bash
    pip3 install --upgrade "databricks-connect==16.4.*" # Or X.Y.* to match your cluster version.
  Note: Databricks recommends that you append the “dot-asterisk” notation to specify databricks-connect==X.Y.* instead of databricks-connect==X.Y, to make sure that the most recent package is installed. While this is not a requirement, it helps make sure that you can use the latest supported features for that cluster.
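The two steps above can also be checked from Python. The sketch below uses the standard library's importlib.metadata to see whether a separate pyspark distribution is installed in the active environment, and builds the "dot-asterisk" requirement string for a given Databricks Runtime version. The helper names installed_pyspark_version and pip_requirement are hypothetical, introduced only for illustration; the script reports and formats, it does not install or uninstall anything.

```python
from importlib import metadata
from typing import Optional


def installed_pyspark_version() -> Optional[str]:
    """Return the installed pyspark distribution version, or None if absent."""
    try:
        return metadata.version("pyspark")
    except metadata.PackageNotFoundError:
        return None


def pip_requirement(runtime_version: str) -> str:
    """Build a requirement such as 'databricks-connect==16.4.*'.

    Expects at least 'X.Y' (for example '16.4'); extra components are ignored.
    """
    major, minor = runtime_version.split(".")[:2]
    return f"databricks-connect=={int(major)}.{int(minor)}.*"


if __name__ == "__main__":
    found = installed_pyspark_version()
    if found is not None:
        print(f"pyspark {found} is installed; run: pip3 uninstall pyspark")
    else:
        print("No separate pyspark distribution found.")
    # 16.4 is the example version from this article; use your cluster's version.
    print(f'pip3 install --upgrade "{pip_requirement("16.4")}"')
```

The script only prints the commands to run, so you can review them before changing the environment.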
Install the Databricks Connect client with Poetry
- With your virtual environment activated, uninstall PySpark, if it is already installed, by running the remove command. This is required because the databricks-connect package conflicts with PySpark. For details, see Conflicting PySpark installations. To check whether PySpark is already installed, run the show command.
  Bash
    # Is PySpark already installed?
    poetry show pyspark
    # Uninstall PySpark
    poetry remove pyspark
- With your virtual environment still activated, install the Databricks Connect client by running the add command.
  Bash
    poetry add databricks-connect@~16.4 # Or X.Y to match your cluster version.
  Note: Databricks recommends that you use the “at-tilde” notation to specify databricks-connect@~16.4 instead of databricks-connect==16.4, to make sure that the most recent package is installed. While this is not a requirement, it helps make sure that you can use the latest supported features for that cluster.
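To illustrate why the “at-tilde” notation picks up newer releases, the sketch below models the range that a two-part tilde constraint allows, assuming Poetry's documented rule that ~X.Y means >=X.Y.0 and <X.(Y+1).0. This is a simplified illustration, not Poetry's own resolver: the helper names tilde_bounds and satisfies are hypothetical, only plain numeric versions are handled, and the version numbers in the usage lines are made-up examples rather than a list of real releases.

```python
from typing import Tuple

Version = Tuple[int, ...]


def tilde_bounds(constraint: str) -> Tuple[Version, Version]:
    """Return (inclusive lower, exclusive upper) bounds for a '~X.Y' constraint."""
    major, minor = (int(part) for part in constraint.lstrip("~").split("."))
    return (major, minor, 0), (major, minor + 1, 0)


def satisfies(version: str, constraint: str) -> bool:
    """Check a plain numeric version such as '16.4.3' against '~X.Y'."""
    lower, upper = tilde_bounds(constraint)
    parts = tuple(int(part) for part in version.split("."))
    # Pad short versions with zeros so that '16.4' compares as (16, 4, 0).
    parts = parts + (0,) * (3 - len(parts))
    return lower <= parts < upper


if __name__ == "__main__":
    # Hypothetical version numbers, used only to show the accepted range.
    for candidate in ("16.4", "16.4.3", "16.5.0", "16.3.9"):
        print(candidate, satisfies(candidate, "~16.4"))
```

Under this rule, a constraint of ~16.4 accepts any 16.4.z release but not 16.5.0, which is the same range that the dot-asterisk form databricks-connect==16.4.* expresses for pip.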