This article covers Databricks Connect for Databricks Runtime 13.0 and above.
This article covers how to use Databricks Connect for Python with classic Jupyter Notebook. Databricks Connect enables you to connect popular notebook servers, IDEs, and other custom applications to Databricks clusters. See What is Databricks Connect?.
Before you begin to use Databricks Connect, you must set up the Databricks Connect client.
To use Databricks Connect with classic Jupyter Notebook and Python, follow these instructions.
To install classic Jupyter Notebook, with your Python virtual environment activated, run the following command from your terminal or Command Prompt:
pip3 install notebook
To start classic Jupyter Notebook in your web browser, run the following command from your activated Python virtual environment:
jupyter notebook
If classic Jupyter Notebook does not appear in your web browser, copy the URL that starts with
127.0.0.1 from your terminal or Command Prompt, and enter it in your web browser’s address bar.
Create a new notebook: in classic Jupyter Notebook, on the Files tab, click New > Python 3 (ipykernel).
To run the notebook, click Cell > Run All. All Python code runs locally, while all PySpark code involving DataFrame operations runs on the cluster in the remote Databricks workspace, with run responses sent back to the local caller.
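A minimal sketch of this local/remote split, assuming Databricks Connect (`databricks-connect`) is installed and authentication to your workspace is configured; if no cluster is reachable, the remote part is skipped:

```python
# Remote part: requires Databricks Connect and a configured cluster (assumption).
try:
    from databricks.connect import DatabricksSession

    spark = DatabricksSession.builder.getOrCreate()
    df = spark.range(5)        # DataFrame is defined on the client
    remote_count = df.count()  # execution happens on the remote cluster
except Exception:
    remote_count = None        # no cluster reachable from this environment

# Plain Python like this always runs locally:
local_squares = [n * n for n in range(5)]
print(local_squares)
```

The `try`/`except` guard is only there so the sketch degrades gracefully outside a configured environment; in a real notebook you would call `DatabricksSession.builder.getOrCreate()` directly.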
To debug the notebook, add the following line of code at the beginning of your notebook:
from IPython.core.debugger import set_trace
Then call
set_trace() to enter debug statements at that point of notebook execution. All Python code is debugged locally, while all PySpark code continues to run on the cluster in the remote Databricks workspace. The core Spark engine code cannot be debugged directly from the client.
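A small illustrative sketch of this pattern; the `average` function and its input values are hypothetical, and the commented-out `set_trace()` call marks where execution would pause for local inspection (a `pdb` fallback is included in case IPython is not installed):

```python
try:
    from IPython.core.debugger import set_trace
except ImportError:
    from pdb import set_trace  # stdlib fallback (assumption: either works here)

def average(values):
    total = sum(values)  # plain Python: runs, and is debuggable, locally
    # set_trace()        # uncomment to pause here and inspect `total`
    return total / len(values)

print(average([2, 4, 6]))
```

With the call uncommented, the notebook cell stops at that line and you can inspect local variables before continuing; any PySpark operations the function triggers would still execute on the cluster.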
To shut down classic Jupyter Notebook, click File > Close and Halt. If the classic Jupyter Notebook process is still running in your terminal or Command Prompt, stop this process by pressing
Ctrl + c and then entering y to confirm.