Use notebooks with Databricks Connect
Note
This feature works with Databricks Runtime 13.3 and above.
By using the Databricks Connect integration in the Databricks extension for Visual Studio Code, you can run Databricks notebooks, one cell at a time or all cells at once, and see their results in the Visual Studio Code IDE. All code runs locally, except for code involving DataFrame operations: that code runs on the cluster in the remote Databricks workspace, and run responses are sent back to the local caller. You can also debug cells. All code is debugged locally, while all Spark code continues to run on the cluster in the remote Databricks workspace; the core Spark engine code cannot be debugged directly from the client.
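For example, with the integration enabled, a notebook cell like the following minimal sketch runs its plain Python locally while its DataFrame operations execute on the remote cluster. The `samples.nyctaxi.trips` table name is an assumption for illustration; substitute any table that your workspace can read.

```python
# Plain Python runs locally in the IDE.
row_limit = 5

# DataFrame operations run on the cluster in the remote Databricks
# workspace, through the preconfigured `spark` session described below.
df = spark.read.table("samples.nyctaxi.trips").limit(row_limit)

# The run response (the rows) is sent back to the local caller.
df.show()
```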
By default, without the Databricks Connect integration that is described in this article, notebook usage is limited:
- You cannot run notebooks one cell at a time by using just the Databricks extension for Visual Studio Code.
- You cannot debug cells.
- You can run notebooks only as Databricks jobs, and see only the notebooks’ run results in the Visual Studio Code IDE.
- All notebook code runs only on the clusters that are associated with these jobs.
To use notebooks with this integration, you must first enable Databricks Connect in the Databricks extension for Visual Studio Code. See Debug code by using Databricks Connect for the Databricks extension for Visual Studio Code.
After enablement, for notebooks with filenames that have a `.py` extension, when you open the notebook in the Visual Studio Code IDE, each cell displays Run Cell, Run Above, and Debug Cell buttons. As you run a cell, its results are shown in a separate tab in the IDE. As you debug, the cell being debugged displays Continue, Stop, and Step Over buttons. As you debug a cell, you can use Visual Studio Code debugging features such as watching variables’ states and viewing the call stack and debug console.
After enablement, for notebooks with filenames that have a `.ipynb` extension, when you open the notebook in the Visual Studio Code IDE, the notebook and its cells contain additional features. See Running cells and Work with code cells in the Notebook Editor.
For more information about notebook formats for filenames with the `.py` and `.ipynb` extensions, see Export and import Databricks notebooks.
The following notebook globals are also enabled (a combined usage sketch follows this list):

- `spark`, representing an instance of `databricks.connect.DatabricksSession`, is preconfigured to instantiate `DatabricksSession` by getting Databricks authentication credentials from the extension. If `DatabricksSession` is already instantiated in a notebook cell’s code, that `DatabricksSession`’s settings are used instead. See Code examples for Databricks Connect for Python.

- `udf`, preconfigured as an alias for `pyspark.sql.functions.udf`, which is the default for Python UDFs. See pyspark.sql.functions.udf.

- `sql`, preconfigured as an alias for `spark.sql`. `spark`, as described earlier, represents a preconfigured instance of `databricks.connect.DatabricksSession`. See Spark SQL.

- `dbutils`, preconfigured as an instance of Databricks Utilities, which is imported from `databricks-sdk` and is instantiated by getting Databricks authentication credentials from the extension. See Use Databricks Utilities.

  Note

  Only a subset of Databricks Utilities is supported for notebooks with Databricks Connect.

  To enable `dbutils.widgets`, you must first install the Databricks SDK for Python by running the following command in your local development machine’s terminal:

  ```
  pip install 'databricks-sdk[notebook]'
  ```

- `display`, preconfigured as an alias for the Jupyter builtin `IPython.display.display`. See IPython.display.display.

- `displayHTML`, preconfigured as an alias for `dbruntime.display.displayHTML`, which is an alias for `display.HTML` from `ipython`. See IPython.display.html.
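As a combined, hedged sketch of these globals in a single notebook cell: the widget name `my_widget` and the HTML snippet are illustrative assumptions, and the `dbutils.widgets` calls assume you have installed `databricks-sdk[notebook]` as noted above.

```python
from pyspark.sql.types import StringType

# `sql` is an alias for `spark.sql`; the query runs on the remote cluster.
df = sql("SELECT current_catalog() AS catalog")

# `udf` is the preconfigured alias for `pyspark.sql.functions.udf`.
@udf(returnType=StringType())
def shout(value):
    return value.upper() if value is not None else None

# `display` is an alias for the Jupyter builtin IPython.display.display.
display(df.select(shout("catalog").alias("catalog_upper")))

# `dbutils` is a Databricks Utilities instance; `my_widget` is a
# hypothetical widget name used only for illustration.
dbutils.widgets.text("my_widget", "default_value")
print(dbutils.widgets.get("my_widget"))

# `displayHTML` renders an HTML string in the notebook output.
displayHTML("<b>Done</b>")
```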
The following notebook magics are also enabled (an example notebook follows this list):

- `%fs`, which is the same as making `dbutils.fs` calls. See Mix languages.

- `%sh`, which runs a command by using the cell magic `%%script` on the local machine. This does not run the command in the remote Databricks workspace. See Mix languages.

- `%md` and `%md-sandbox`, which run the cell magic `%%markdown`. See Mix languages.

- `%sql`, which runs `spark.sql`. See Mix languages.

- `%pip`, which runs `pip install` on the local machine. This does not run `pip install` in the remote Databricks workspace. See Manage libraries with %pip commands.

- `%run`, which runs another notebook. This notebook magic is available in Databricks extension for Visual Studio Code version 1.1.2 and above. See Run a Databricks notebook from another notebook.

  Note

  To enable `%run`, you must first install the nbformat library by running the following command in your local development machine’s terminal:

  ```
  pip install nbformat
  ```

- `# MAGIC`. This notebook magic is available in Databricks extension for Visual Studio Code version 1.1.2 and above.
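For illustration, here is a hedged sketch of a `.py` notebook in the Databricks notebook source format that exercises several of these magics. The `./shared_setup` notebook path is a hypothetical example, and the `%pip` cell installs on the local machine, as noted above.

```python
# Databricks notebook source
# A regular code cell: Python runs locally, Spark calls run remotely.
print("starting setup")

# COMMAND ----------

# MAGIC %md
# MAGIC ## Setup notes (rendered via the %%markdown cell magic)

# COMMAND ----------

# MAGIC %pip install nbformat

# COMMAND ----------

# MAGIC %sql
# MAGIC SELECT current_date() AS today

# COMMAND ----------

# MAGIC %run ./shared_setup
```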
Additional features that are enabled include:
- Spark DataFrames are converted to pandas DataFrames, which are displayed in Jupyter table format.
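For example, a minimal sketch, assuming the cell’s final expression is rendered by Jupyter: ending a cell with a DataFrame expression displays the result as a table rather than as plain text.

```python
# The Spark DataFrame produced here is converted to a pandas DataFrame
# and displayed in Jupyter table format in the IDE.
spark.range(3).withColumnRenamed("id", "n")
```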
Limitations include:
- The notebook magics `%r` and `%scala` are not supported and display an error if called. See Mix languages.

- The notebook magic `%sql` does not support some DML commands, such as Show Tables.