This article covers Databricks Connect for Databricks Runtime 13.0 and above.
This article covers how to use Databricks Connect for Python and Eclipse with PyDev. Databricks Connect enables you to connect popular IDEs, notebook servers, and other custom applications to Databricks clusters. See What is Databricks Connect?
Before you begin to use Databricks Connect, you must set up the Databricks Connect client.
To use Databricks Connect and Eclipse with PyDev, follow these instructions.
Create a project: click File > New > Project > PyDev > PyDev Project, and then click Next.
Specify a Project name.
For Project contents, specify the path to your Python virtual environment.
Click Please configure an interpreter before proceeding.
Click Manual config.
Click New > Browse for python/pypy exe.
Browse to and select the full path to the Python interpreter that is referenced from the virtual environment, and then click Open.
In the Select interpreter dialog, click OK.
In the Selection needed dialog, click OK.
In the Preferences dialog, click Apply and Close.
In the PyDev Project dialog, click Finish.
Click Open Perspective.
Add to the project a Python code (.py) file that contains either the example code or your own code. If you use your own code, at minimum you must initialize DatabricksSession as shown in the example code.
With the Python code file open, set any breakpoints where you want your code to pause while running.
To run the code, click Run > Run. All Python code runs locally, while all PySpark code involving DataFrame operations runs on the cluster in the remote Databricks workspace, and the results are sent back to the local caller.
To debug the code, click Run > Debug. All Python code is debugged locally, while all PySpark code continues to run on the cluster in the remote Databricks workspace. The core Spark engine code cannot be debugged directly from the client.
For more specific run and debug instructions, see Running a Program.
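The example code that the steps above refer to is not reproduced here. The following is a minimal sketch of what such a file might look like. It assumes that the Databricks Connect client is already installed and configured in the selected interpreter, and that the workspace exposes the samples.nyctaxi.trips table; substitute any table you can read. The helper names get_session and longest_trip are hypothetical and used only for illustration.

```python
import sys


def get_session():
    """Initialize DatabricksSession, or return None if the client is missing."""
    try:
        # Imported here so a missing package produces a readable hint
        # instead of a traceback when the wrong interpreter is selected.
        from databricks.connect import DatabricksSession
    except ImportError:
        print("databricks-connect was not found for interpreter:", sys.executable)
        return None
    # Connection settings are taken from the client configuration.
    return DatabricksSession.builder.getOrCreate()


def longest_trip(rows):
    """Plain Python: runs locally, so breakpoints here pause in Eclipse."""
    return max(rows, key=lambda row: row["trip_distance"])


def main():
    spark = get_session()
    if spark is None:
        return
    # DataFrame operations run on the remote cluster.
    # The table name is an assumption; replace it with your own.
    df = (
        spark.read.table("samples.nyctaxi.trips")
        .select("trip_distance", "fare_amount")
        .limit(100)
    )
    # collect() sends the result rows back to the local caller.
    rows = df.collect()
    print(longest_trip(rows))


if __name__ == "__main__":
    main()
```

A breakpoint set inside longest_trip pauses in the Eclipse debugger because that function is ordinary local Python. The read, select, and limit calls are executed on the cluster and cannot be stepped into from the client.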