Install Databricks Connect for Python

note

This article covers Databricks Connect for Databricks Runtime 13.3 LTS and above.

This article describes how to install Databricks Connect for Python. For an overview, see What is Databricks Connect?

Requirements

Before installing Databricks Connect, make sure your workspace and local environment meet the requirements. See Databricks Connect usage requirements.

Activate a Python virtual environment

Databricks strongly recommends that you have a Python virtual environment activated for each Python version that you use with Databricks Connect. Python virtual environments help to make sure that you are using the correct versions of Python and Databricks Connect together. For more information about these tools and how to activate them, see venv or Poetry.
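As a minimal sketch of the venv route, the following shell functions create a virtual environment on first use and activate it. The `.venv` directory name and the helper names `in_venv` and `activate_venv` are assumptions made for illustration, not part of Databricks Connect. The `bin/activate` path applies to Linux and macOS; Windows uses a different layout.

```shell
# Sketch: create a virtual environment if needed, then activate it.
# The ".venv" default directory name is an assumption, not a Databricks requirement.

# Succeeds when a virtual environment is active in this shell.
# The venv activate script sets the VIRTUAL_ENV variable.
in_venv() {
    [ -n "${VIRTUAL_ENV:-}" ]
}

# Creates the environment on first use, then activates it.
activate_venv() {
    venv_dir="${1:-.venv}"
    if [ ! -f "$venv_dir/bin/activate" ]; then
        python3 -m venv "$venv_dir" || return 1
    fi
    # Source the script so the activation affects the current shell.
    . "$venv_dir/bin/activate"
}

# Typical use, after sourcing this file from your project directory:
#   activate_venv .venv && in_venv && python3 --version
```

Source the file rather than executing it, because activation only changes the shell that runs the activate script.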

Install the Databricks Connect client

This section describes how to install the Databricks Connect client with venv or Poetry.

note

If you already have the Databricks extension for Visual Studio Code installed, you can install Databricks Connect for Databricks Runtime 13.3 LTS and above using the extension. See Debug code using Databricks Connect for the Databricks extension for Visual Studio Code.

Install the Databricks Connect client with venv

  1. With your virtual environment activated, check whether PySpark is already installed by running the show command. If it is installed, uninstall it by running the uninstall command. This is required because the databricks-connect package conflicts with PySpark. For details, see Conflicting PySpark installations.

    Bash
    # Is PySpark already installed?
    pip3 show pyspark

    # Uninstall PySpark
    pip3 uninstall pyspark
  2. With your virtual environment still activated, install the Databricks Connect client by running the install command. Use the --upgrade option to upgrade any existing client installation to the specified version.

    Bash
    pip3 install --upgrade "databricks-connect==16.4.*"  # Or X.Y.* to match your cluster version.

    note

    Databricks recommends that you use the “dot-asterisk” notation, databricks-connect==X.Y.*, instead of databricks-connect==X.Y, so that the most recent package for that version is installed. While this is not a requirement, it helps make sure that you can use the latest supported features for that cluster.
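The X.Y.* pattern can be built from the cluster's runtime version instead of being typed by hand. The following is a minimal sketch, assuming the version is available as a string that starts with the major and minor numbers, such as 16.4.x-scala2.12. The helper name `connect_spec` is hypothetical.

```shell
# Sketch: turn a runtime version string into a pip requirement specifier.
# connect_spec is a hypothetical helper, not part of pip or Databricks Connect.
connect_spec() {
    # Keep only the leading MAJOR.MINOR part, for example "16.4".
    version=$(printf '%s\n' "$1" |
        sed -n 's/^\([0-9][0-9]*\)\.\([0-9][0-9]*\).*/\1.\2/p')
    # Fail if the input does not start with MAJOR.MINOR.
    [ -n "$version" ] || return 1
    printf 'databricks-connect==%s.*\n' "$version"
}

# Typical use, with your virtual environment activated:
#   pip3 install --upgrade "$(connect_spec 16.4.x-scala2.12)"
#   pip3 show databricks-connect   # confirm the installed version
```

Quoting the command substitution keeps the shell from expanding the asterisk as a file name pattern.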

Install the Databricks Connect client with Poetry

  1. With your virtual environment activated, check whether PySpark is already installed by running the show command. If it is installed, uninstall it by running the remove command. This is required because the databricks-connect package conflicts with PySpark. For details, see Conflicting PySpark installations.

    Bash
    # Is PySpark already installed?
    poetry show pyspark

    # Uninstall PySpark
    poetry remove pyspark
  2. With your virtual environment still activated, install the Databricks Connect client by running the add command.

    Bash
    poetry add databricks-connect@~16.4  # Or X.Y to match your cluster version.

    note

    Databricks recommends that you use the “at-tilde” notation to specify databricks-connect@~16.4 instead of databricks-connect==16.4, to make sure that the most recent package is installed. While this is not a requirement, it helps make sure that you can use the latest supported features for that cluster.
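To make the “at-tilde” notation concrete: Poetry reads a tilde requirement on a MAJOR.MINOR version as "at least that version, but below the next minor version". The following sketch prints that range for a given version. The helper name `tilde_range` is hypothetical and it handles only plain MAJOR.MINOR input.

```shell
# Sketch: expand a Poetry tilde requirement such as ~16.4 into an explicit range.
# tilde_range is a hypothetical helper; it accepts MAJOR.MINOR input only.
tilde_range() {
    case "$1" in
        ''|*[!0-9.]*|*.*.*|.*|*.) return 1 ;;  # reject anything but digits.digits
        *.*) ;;
        *) return 1 ;;                         # reject input without a minor part
    esac
    major="${1%%.*}"
    minor="${1#*.}"
    # The upper bound is the next minor version, which is excluded.
    printf '>=%s.%s,<%s.%s\n' "$major" "$minor" "$major" "$((minor + 1))"
}

# Typical use:
#   tilde_range 16.4
#   poetry show databricks-connect   # confirm the resolved version
```

For 16.4 this selects roughly the same releases as the databricks-connect==16.4.* specifier used with pip.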