Use PyCharm with Databricks Connect for Python

Note

This article covers Databricks Connect for Databricks Runtime 13.0 and above.

This article covers how to use Databricks Connect for Python with PyCharm. Databricks Connect enables you to connect popular IDEs, notebook servers, and other custom applications to Databricks clusters. See What is Databricks Connect?.

Note

Before you begin to use Databricks Connect, you must set up the Databricks Connect client.

IntelliJ IDEA Ultimate also provides plugin support for PyCharm with Python. For details, see Python plug-in for IntelliJ IDEA Ultimate.

To use Databricks Connect with PyCharm and Python, follow these instructions for venv or Poetry.

Use PyCharm with venv and Databricks Connect for Python

  1. Start PyCharm.

  2. Create a project: click File > New Project.

  3. For Location, click the folder icon, and then select the path to the existing venv virtual environment that you created in Install Databricks Connect for Python.

  4. Select Previously configured interpreter.

  5. For Interpreter, click the ellipses.

  6. Click System Interpreter.

  7. For Interpreter, click the ellipses, and select the full path to the Python interpreter that is installed in the existing venv virtual environment. Then click OK.

    Tip

    The Python interpreter for a venv virtual environment is typically installed in </path-to-venv>/bin/python on Linux and macOS, or in <path-to-venv>\Scripts\python.exe on Windows. For more information, see venv.

  8. Click OK again.

  9. Click Create.

  10. Click Create from Existing Sources.

  11. Add to the project a Python code (.py) file that contains either the example code or your own code. If you use your own code, at minimum you must initialize DatabricksSession as shown in the example code.

  12. With the Python code file open, set any breakpoints where you want your code to pause while running.

  13. To run the code, click Run > Run. All Python code runs locally. All PySpark code involving DataFrame operations runs on the cluster in the remote Databricks workspace, and the results are sent back to the local caller.

  14. To debug the code, click Run > Debug. All Python code is debugged locally, while all PySpark code continues to run on the cluster in the remote Databricks workspace. The core Spark engine code cannot be debugged directly from the client.

  15. Follow the on-screen instructions to start running or debugging the code.
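As a rough illustration of steps 11 through 14, the following minimal sketch initializes DatabricksSession and separates the remote DataFrame work from the local Python work. It assumes the Databricks Connect client is already configured, so that the builder can find its connection settings without any arguments. The function names create_session and sum_of_ids are hypothetical and chosen only for this example.

```python
def create_session():
    """Return a Databricks Connect session built from the client's existing configuration."""
    # Imported inside the function so this file can still be opened and inspected
    # in an interpreter where the databricks-connect package is not installed.
    from databricks.connect import DatabricksSession

    # Assumption: connection settings come from the already configured client.
    return DatabricksSession.builder.getOrCreate()


def sum_of_ids(spark, count):
    """Sum the ids 0..count-1: DataFrame work runs remotely, arithmetic runs locally."""
    # DataFrame operation: executed on the remote cluster when collect() is called.
    rows = spark.range(count).collect()
    # Plain Python: runs on the local machine, so a breakpoint here pauses locally.
    return sum(row["id"] for row in rows)
```

With a configured client, calling sum_of_ids(create_session(), 5) from the code file exercises both sides. A breakpoint on the return line is a reasonable place to inspect the collected rows, because by that point the results have already come back from the cluster.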

For more specific run and debug instructions, see Run without any previous configuring and Debug in the PyCharm documentation.
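When filling in the interpreter path in step 7 above, it can help to know where venv conventionally places the interpreter. The sketch below builds that conventional path from a virtual environment directory. The helper name venv_interpreter_path and the directories in the usage note are hypothetical, and the layout is only the usual default, so confirm that the file exists in your own environment.

```python
from pathlib import PurePosixPath, PureWindowsPath


def venv_interpreter_path(venv_dir, windows=False):
    """Return the conventional interpreter location inside a venv directory."""
    if windows:
        # Windows venvs usually place the interpreter under Scripts\python.exe.
        return PureWindowsPath(venv_dir) / "Scripts" / "python.exe"
    # Linux and macOS venvs usually place the interpreter under bin/python.
    return PurePosixPath(venv_dir) / "bin" / "python"
```

For example, venv_interpreter_path("/home/me/.venv") yields /home/me/.venv/bin/python, which is the kind of path to select in the Interpreter field.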

Use PyCharm with Poetry and Databricks Connect for Python

  1. Start PyCharm.

  2. Create a project: click File > New Project.

  3. For Location, click the folder icon, and then select the path to the existing Poetry virtual environment that you created in Install Databricks Connect for Python.

  4. Select Previously configured interpreter.

  5. For Interpreter, click the ellipses.

  6. Click Poetry environment.

  7. For Interpreter, click the ellipses, and select the full path to the system version of the Python interpreter that is referenced from the existing Poetry virtual environment. Then click OK.

    Tip

    Be sure to select the path to the Python interpreter. Do not select the path to the Poetry executable.

    For information about where the system version of the Python interpreter is installed, see How to Add Python to PATH.

  8. Click OK again.

  9. Click Create.

  10. Click Create from Existing Sources.

  11. Add to the project a Python code (.py) file that contains either the example code or your own code. If you use your own code, at minimum you must initialize DatabricksSession as shown in the example code.

  12. With the Python code file open, set any breakpoints where you want your code to pause while running.

  13. To run the code, click Run > Run. All Python code runs locally. All PySpark code involving DataFrame operations runs on the cluster in the remote Databricks workspace, and the results are sent back to the local caller.

  14. To debug the code, click Run > Debug. All Python code is debugged locally, while all PySpark code continues to run on the cluster in the remote Databricks workspace. The core Spark engine code cannot be debugged directly from the client.

  15. Follow the on-screen instructions to start running or debugging the code.

For more specific run and debug instructions, see Run without any previous configuring and Debug in the PyCharm documentation.
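Because the interpreter and the Poetry executable are easy to confuse in step 7 above, one way to confirm which interpreter PyCharm actually runs is to execute a short check from inside the project. The sketch below uses only the Python standard library. The function name interpreter_report is hypothetical, and the check assumes the Databricks Connect client was installed under the distribution name databricks-connect.

```python
import sys
from importlib import metadata


def interpreter_report():
    """Describe the interpreter that is running this code."""
    try:
        # Looks up the installed distribution without importing it.
        version = metadata.version("databricks-connect")
    except metadata.PackageNotFoundError:
        version = None
    return {
        # Full path to the Python executable in use.
        "executable": sys.executable,
        # True when running inside a virtual environment rather than the base install.
        "in_virtualenv": sys.prefix != sys.base_prefix,
        # None means the client is not installed for this interpreter.
        "databricks_connect_version": version,
    }
```

Printing interpreter_report() from a run configuration shows whether the project is using the intended virtual environment. If databricks_connect_version is None, the selected interpreter is probably not the one where the client was installed.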