This article covers Databricks Connect for Databricks Runtime 13.0 and above.
This article covers how to use Databricks Connect for Python with Visual Studio Code. Databricks Connect enables you to connect popular IDEs, notebook servers, and other custom applications to Databricks clusters. See What is Databricks Connect?. For the Scala version of this article, see Use Visual Studio Code with Databricks Connect for Scala.
Before you begin to use Databricks Connect, you must set up the Databricks Connect client.
The Databricks extension for Visual Studio Code already has built-in support for Databricks Connect for Databricks Runtime 13.0 and above. See Debug code by using Databricks Connect for the Databricks extension for Visual Studio Code.
To use Databricks Connect with Visual Studio Code and Python, follow these instructions.
Start Visual Studio Code.
Open the folder that contains your Python virtual environment (File > Open Folder).
In the Visual Studio Code Terminal (View > Terminal), activate the virtual environment.
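If you have not yet created a virtual environment in the folder, the activation step might look like the following. This is a minimal sketch for macOS/Linux; the environment folder name `.venv` is an example, not a requirement:

```shell
# Run these commands in the Visual Studio Code Terminal, from the folder you opened.
python3 -m venv .venv        # create the virtual environment if it does not exist yet
source .venv/bin/activate    # activate it in the current terminal session
```

On Windows, activate with `.venv\Scripts\activate` instead of the `source` command.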
Set the current Python interpreter to be the one that is referenced from the virtual environment:
On the Command Palette (View > Command Palette), type Python: Select Interpreter, and then press Enter.
Select the path to the Python interpreter that is referenced from the virtual environment.
Add to the folder a Python code (.py) file that contains either the example code or your own code. If you use your own code, at minimum you must initialize DatabricksSession as shown in the example code.
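The following sketch shows the minimal initialization. It assumes you have already configured authentication for Databricks Connect (for example, through a Databricks configuration profile or environment variables that identify your workspace and cluster); the table name is a placeholder for illustration:

```python
from databricks.connect import DatabricksSession

# Initialize a Spark session backed by the remote Databricks cluster.
# getOrCreate() picks up connection details from your existing
# Databricks Connect configuration (profile or environment variables).
spark = DatabricksSession.builder.getOrCreate()

# This DataFrame operation runs on the remote cluster; the results
# are returned to, and printed by, the local caller.
df = spark.read.table("samples.nyctaxi.trips")
df.show(5)
```

Any PySpark code that follows the `DatabricksSession` initialization uses this `spark` object the same way it would use a local SparkSession.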
To run the code, click Run > Run Without Debugging on the main menu. All Python code runs locally, while all PySpark code involving DataFrame operations runs on the cluster in the remote Databricks workspace, and the responses are sent back to the local caller.
To debug the code:
With the Python code file open, set any breakpoints where you want your code to pause while running.
Click the Run and Debug icon on the sidebar, or click View > Run on the main menu.
In the Run and Debug view, click the Run and Debug button.
Follow the on-screen instructions to start running and debugging the code.
All Python code is debugged locally, while all PySpark code continues to run on the cluster in the remote Databricks workspace. The core Spark engine code cannot be debugged directly from the client.