This article covers Databricks Connect for Databricks Runtime 13.0 and above.
This article covers how to use Databricks Connect for Python with JupyterLab. Databricks Connect enables you to connect popular notebook servers, IDEs, and other custom applications to Databricks clusters. See What is Databricks Connect?.
Before you begin to use Databricks Connect, you must set up the Databricks Connect client.
To use Databricks Connect with JupyterLab and Python, follow these instructions.
To install JupyterLab, with your Python virtual environment activated, run the following command from your terminal or Command Prompt:
pip3 install jupyterlab
To start JupyterLab in your web browser, run the following command from your activated Python virtual environment:
jupyter lab
If JupyterLab does not appear in your web browser, copy the URL that starts with 127.0.0.1 from your terminal or Command Prompt, and enter it in your web browser's address bar.
To create a new notebook in JupyterLab, click File > New > Notebook on the main menu, select Python 3 (ipykernel), and then click Select.
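In the notebook's first cell, you can enter code along the following lines. This is a minimal sketch, assuming the databricks-connect package is installed in your virtual environment, your connection settings (for example, a Databricks configuration profile) are already in place, and your workspace includes the samples.nyctaxi.trips sample table that ships with the Databricks sample datasets:

```python
from databricks.connect import DatabricksSession

# Build a Spark session that targets the remote cluster.
# This code runs locally; only DataFrame operations execute on the cluster.
spark = DatabricksSession.builder.getOrCreate()

# Reading the table defines a DataFrame; show() triggers execution on the
# cluster, and the results are sent back to this local notebook.
df = spark.read.table("samples.nyctaxi.trips")
df.show(5)
```

If your workspace does not include the sample datasets, substitute the name of any table your cluster can read.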
To run the notebook, click Run > Run All Cells. All Python code runs locally, while all PySpark code involving DataFrame operations runs on the cluster in the remote Databricks workspace, and run responses are sent back to the local caller.
To debug the notebook, click the bug (Enable Debugger) icon next to Python 3 (ipykernel) in the notebook's toolbar. Set one or more breakpoints, and then click Run > Run All Cells. All Python code is debugged locally, while all Spark code continues to run on the cluster in the remote Databricks workspace. The core Spark engine code cannot be debugged directly from the client.
To shut down JupyterLab, click File > Shut Down. If the JupyterLab process is still running in your terminal or Command Prompt, stop it by pressing Ctrl + c and then entering y to confirm.
For more specific debug instructions, see Debugger.