Migrate to Databricks Connect for Python

This article describes how to migrate from Databricks Connect for Databricks Runtime 12.2 LTS and below to Databricks Connect for Databricks Runtime 13.3 LTS and above for Python. Databricks Connect enables you to connect popular IDEs, notebook servers, and custom applications to Databricks clusters. For an overview, see What is Databricks Connect?

Before you begin to use Databricks Connect, you must set up the Databricks Connect client.

For the Scala version of this article, see Migrate to Databricks Connect for Scala.

Migrate your Python project

To migrate your existing Python code project or coding environment from Databricks Connect for Databricks Runtime 12.2 LTS and below to Databricks Connect for Databricks Runtime 13.3 LTS and above:

1. Install the correct version of Python, as listed in the installation requirements, to match your Databricks cluster, if it is not already installed locally.
2. If needed, upgrade your Python virtual environment to use the Python version that matches your cluster. For instructions, see your virtual environment provider's documentation.
3. With your virtual environment activated, uninstall PySpark from your virtual environment:

   ```bash
   pip3 uninstall pyspark
   ```

4. With your virtual environment still activated, uninstall Databricks Connect for Databricks Runtime 12.2 LTS and below:

   ```bash
   pip3 uninstall databricks-connect
   ```

5. With your virtual environment still activated, install Databricks Connect for Databricks Runtime 13.3 LTS and above:

   ```bash
   pip3 install --upgrade "databricks-connect==14.0.*"  # Or X.Y.* to match your cluster version.
   ```

   Note: Databricks recommends appending the “dot-asterisk” notation, specifying databricks-connect==X.Y.* instead of databricks-connect==X.Y, to make sure that the most recent package is installed. While this is not a requirement, it helps make sure that you can use the latest supported features for that cluster.

6. Update your Python code to initialize the spark variable (which represents an instantiation of the DatabricksSession class, similar to SparkSession in PySpark). See Compute configuration for Databricks Connect.
7. Migrate your RDD APIs to DataFrame APIs, and replace your uses of SparkContext with alternatives, because Databricks Connect for Databricks Runtime 13.3 LTS and above does not support RDDs or SparkContext.
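After completing these steps, you might want a quick check that your local setup lines up with the cluster. The following is a minimal sketch under stated assumptions: the helper names (major_minor, connect_requirement, client_matches_runtime, python_matches) are hypothetical, and the rule it encodes (the client's X.Y equals the cluster runtime's X.Y) simply follows the X.Y.* pinning recommendation in the installation step. It only parses version strings; it does not contact Databricks.

```python
import re
import sys

# Hypothetical helpers for checking a local setup against a cluster's
# Databricks Runtime version. They only parse version strings.

_VERSION_PREFIX = re.compile(r"\s*(\d+)\.(\d+)")


def major_minor(version: str) -> tuple[int, int]:
    """Return (major, minor) from strings such as '14.0', '13.3 LTS', or '3.10.12'."""
    match = _VERSION_PREFIX.match(version)
    if match is None:
        raise ValueError(f"Unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def connect_requirement(runtime_version: str) -> str:
    """Build a pip requirement that pins databricks-connect to the runtime's X.Y.*."""
    major, minor = major_minor(runtime_version)
    return f"databricks-connect=={major}.{minor}.*"


def client_matches_runtime(client_version: str, runtime_version: str) -> bool:
    """True if the installed databricks-connect shares the cluster runtime's X.Y."""
    return major_minor(client_version) == major_minor(runtime_version)


def python_matches(required_python: str) -> bool:
    """True if the local interpreter's major.minor equals required_python, e.g. '3.10'."""
    return sys.version_info[:2] == major_minor(required_python)
```

For example, connect_requirement("13.3 LTS") returns "databricks-connect==13.3.*". To check the package you actually installed, you could pass importlib.metadata.version("databricks-connect") as client_version.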

Set Hadoop configurations

On the client, you can set Hadoop configurations with the spark.conf.set API; these settings apply to SQL and DataFrame operations. Hadoop configurations that were set on sparkContext must instead be set in the cluster configuration or from a notebook, because configurations set on sparkContext are not tied to user sessions but apply to the entire cluster.
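As an illustration, the following minimal sketch applies a set of Hadoop options through spark.conf.set. The helper name apply_hadoop_confs is hypothetical. The commented usage assumes databricks-connect is installed and authentication is already configured, and the storage key shown there is only an example, with <storage-account> as a placeholder. The function itself works with any session object that exposes conf.set, so it can be exercised without a cluster.

```python
from typing import Any, Mapping


def apply_hadoop_confs(spark: Any, confs: Mapping[str, str]) -> None:
    """Apply Hadoop settings to the current session with spark.conf.set.

    Settings made this way apply to SQL and DataFrame operations run through
    this session. Hadoop settings that used to be set on sparkContext are
    cluster-wide and belong in the cluster configuration or a notebook instead.
    """
    for key, value in confs.items():
        spark.conf.set(key, value)


# Example usage against a cluster (requires databricks-connect and configured
# authentication, so it is shown as comments rather than executed):
#
#   from databricks.connect import DatabricksSession
#
#   spark = DatabricksSession.builder.getOrCreate()
#   apply_hadoop_confs(spark, {
#       # Example Hadoop key; <storage-account> is a placeholder.
#       "fs.azure.account.auth.type.<storage-account>.dfs.core.windows.net": "OAuth",
#   })
```

Passing the session in, rather than creating it inside the helper, keeps the function independent of how you configure compute for Databricks Connect.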
