Notebooks

You can create and manage notebooks using the UI, the CLI, or the Workspace API. This topic focuses on performing notebook tasks using the UI. For the other methods, see Databricks CLI and Workspace API.

Notebooks are one interface for interacting with Databricks. If you have enabled the Databricks Operational Security Package, you can use Workspace access control to control sharing of notebooks and folders in the workspace.

Create a notebook

  1. Click the Workspace button Workspace Icon or the Home button Home Icon in the sidebar. Do one of the following:

    • Next to any folder, click the Menu Dropdown on the right side of the text and select Create > Notebook.

      Create Notebook
    • In the Workspace or a user folder, select Down Caret Create > Notebook.

  2. In the Create Notebook dialog, enter a name and select the notebook’s primary language. Notebooks support Python, Scala, SQL, and R as their primary language.

  3. Click Create.

Import a notebook

You can import a notebook from a URL or a file.

  1. Click the Workspace button Workspace Icon or the Home button Home Icon in the sidebar. Do one of the following:

    • Next to any folder, click the Menu Dropdown on the right side of the text and select Import.

    • In the Workspace or a user folder, select Down Caret Import.

      ../../_images/import-notebook.png
  2. Choose File or URL, specify the notebook to import, and click Import.

Attach a notebook to a cluster

Before you can do any work in a notebook, you must attach it to a cluster. In the notebook toolbar, click Detached under the notebook’s name at the top left. From the dropdown, select a running cluster.

Use a notebook

Add a cell

To add a cell, hover over the top or bottom of a cell and click the Add Cell icon, or open the notebook cell menu at the far right and click Down Caret > Add Cell Above or Down Caret > Add Cell Below.

Tip

Keyboard shortcuts make it much easier to use notebooks and execute code. Toggle the shortcut display by clicking the shortcuts link under a cell, clicking the Keyboard Icon icon at the top, or using the ? menu at the top right.

../../_images/short-cuts.png

Predefined variables

Notebooks have some Apache Spark variables already defined.

Class                      Variable Name
SparkContext               sc
SQLContext/HiveContext     sqlContext
SparkSession (Spark 2.x)   spark

Important

Do not create a SparkSession, SparkContext, or SQLContext. Doing so will lead to inconsistent behavior.

Run a cell

To run code, type the code in a cell and either select Run Icon > Run Cell or press Shift+Enter. For example, try executing these Python code snippets.

# A SparkSession named `spark` is already created for you.
# Do not create another or unspecified behavior may occur.
spark

# A SQLContext is also already created for you.
# Do not create another or unspecified behavior may occur.
# As you can see below, the sqlContext provided is a HiveContext.
sqlContext

# A SparkContext is already created for you.
# Do not create another or unspecified behavior may occur.
sc

Now that you’ve seen the pre-defined variables, run some real code!

1+1 # => 2

Run all cells

To run all the cells in a notebook, select Run All in the notebook toolbar.

Important

Do not use Run All if the same notebook contains steps that mount and unmount storage. Doing so can cause a race condition and possibly corrupt the mount points.
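
For example, a notebook like the following sketch is unsafe to Run All, because the unmount can race with the mount; the bucket name and mount point below are hypothetical placeholders. Keep the mount and unmount steps in separate notebooks instead.

# Unsafe when executed with Run All: mount and unmount of the same point in one notebook.
dbutils.fs.mount(source = "s3a://example-bucket", mount_point = "/mnt/example")
# ... cells that read from /mnt/example ...
dbutils.fs.unmount("/mnt/example")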

Mix languages

While a notebook has a primary language, you can mix languages by specifying the language magic command %<language> at the beginning of a cell. %<language> allows you to execute <language> code even if that notebook’s primary language is not <language>. The supported magic commands are: %python, %r, %scala, and %sql. Additionally:

%sh
Allows you to execute shell code in your notebook. Add the -e option to fail this cell (and subsequently a job or a Run All command) if the shell command does not succeed. By default, %sh alone does not fail a job even if the %sh command does not completely succeed; only %sh -e fails when the shell command exits with a non-zero status. (See the example cells after this list.)
%fs
Allows you to use Databricks Utilities filesystem commands. Read more on the Databricks File System - DBFS pages.
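
For example, in a notebook whose primary language is Python, you might add cells like the following. These are illustrative cells only: the %sql cell returns a constant, the %sh -e cell fails the run if the command exits with a non-zero status, and the %fs cell lists the DBFS root. Each snippet is a separate cell with the magic command on its first line.

%sql SELECT 1 AS sanity_check

%sh -e
ls /tmp

%fs ls /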

Include documentation

To include documentation in a notebook you can use the %md magic command to identify Markdown markup. The included Markdown markup is rendered into HTML. For example, this Markdown snippet:

%md # Hello This is a Title

is rendered as an HTML title:

../../_images/title.png

You can link to other notebooks or folders in Markdown cells using relative paths. Specify the href attribute of an anchor tag as the relative path, starting with a $ and then following the same pattern as in Unix file systems:

%md
<a href="$./myNotebook">Link to notebook in same folder as current notebook</a>
<a href="$../myFolder">Link to folder in parent folder of current notebook</a>
<a href="$./myFolder2/myNotebook2">Link to nested notebook</a>

Include HTML

You can include HTML in a notebook by using the function displayHTML. See HTML, D3, and SVG in Notebooks for an example of how to do this.
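
For example, a minimal cell in a Python notebook:

displayHTML("""
<h1>Hello from displayHTML</h1>
<p>Any HTML string you pass in, including inline <b>markup</b>, is rendered in the cell output.</p>
""")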

Show line and command numbers

To show line numbers or command numbers, click View > Show line numbers or View > Show command numbers. Once they’re displayed, you can hide them again from the same menu. You can also enable line numbers with the keyboard shortcut Control+L.

Show line or command numbers via the view menu
Line and command numbers enabled in notebook

If you enable line or command numbers, Databricks saves your preference and will show them in all of your other notebooks for that browser.

Command numbers above cells link to that specific command. If you click on the command number for a cell, it updates your URL to be anchored to that command. If you want to link to a specific command in your notebook, right-click the command number and choose copy link address.

Python and Scala error highlighting

Python and Scala notebooks support error highlighting. That is, the line of code that is throwing the error will be highlighted in the cell. Additionally, if the error output is a stacktrace, the cell in which the error is thrown is displayed in the stacktrace as a link to the cell. You can click this link to jump to the offending code.

../../_images/notebook-python-error-highlighting.png ../../_images/notebook-scala-error-highlighting.png

Find and replace text

To find and replace text within a notebook, select File > Find and Replace.

../../_images/find-replace-in-dropdown.png

The current match is highlighted in orange and all other matches are highlighted in yellow.

../../_images/find-replace-example.png

You can replace matches on an individual basis by clicking Replace.

You can switch between matches by clicking the Prev and Next buttons or pressing Shift+Enter and Enter to go to the previous and next matches, respectively.

Close the find and replace tool by clicking the x button or pressing esc.

Download results

Once you’ve run your code, you may want to download those results to your local machine. To do so click the Download Results button at the bottom of a cell that contains tabular output. You’ll see an option to download the preview of the results or the full results.

You can try this out by running

%sql SELECT 1

and downloading the results.

Run a notebook from another notebook

You can run a notebook from another notebook by using the %run magic command. This is roughly equivalent to a :load command in a Scala REPL on your local machine or an import statement in Python. All variables defined in that other notebook become available in your current notebook.

For example, suppose you have notebookA and notebookB. notebookA contains a cell that has the following Python code:

x = 5

Running this code snippet in notebookB works even though x was never defined in notebookB.

%run /Users/path/to/notebookA

print(x) # => 5

To specify a relative path, preface it with ./ or ../. For example, if notebookA and notebookB are in the same directory, you can run notebookA using a relative path:

%run ./notebookA

print(x) # => 5
%run ../someDirectory/notebookA # up a directory and into another

print(x) # => 5

Note

%run must be in a cell by itself as it runs the entire notebook inline.

Export a notebook

You can export a notebook from Databricks by selecting File > Export.

Publish a notebook

If you’re on Community Edition, you can publish a notebook so that you can share a URL path to the notebook. Subsequent publish actions update the notebook at that URL.

Notifications

Notifications alert you to certain events, such as which command is currently running during Run all cells and which commands are in error state. When your notebook is showing multiple error notifications, the first one will have a link that allows you to clear all notifications.

Notebook notifications are enabled by default. You can disable them under User Settings > Notebook Settings.

Notebook isolation

Databricks supports two types of notebook isolation: variable and class isolation, and Spark session isolation.

Note

Since all notebooks attached to the same cluster execute on the same cluster VMs, even with Spark session isolation enabled there is no guaranteed user isolation within a cluster.

Variable and class isolation

Variables and classes are available only in the current notebook. For example, two notebooks attached to the same cluster can define variables and classes with the same name but these objects are distinct.

To define a class that is visible to all notebooks attached to the same cluster, define the class in a package cell. Then, you can access the class by using its fully qualified name, which is the same as accessing a class in an attached Scala or Java library.

Spark session isolation

For a cluster running Apache Spark 2.0.0 and above, every notebook has a pre-defined variable called spark representing a SparkSession. A SparkSession is the entry point for using different APIs in Spark as well as setting different runtime configurations.

For Spark 2.0.0 and Spark 2.0.1-db1, notebooks attached to a cluster share the same SparkSession. From Spark 2.0.2-db1, you can enable Spark session isolation so that every notebook uses its own SparkSession. When Spark session isolation is enabled:

  • Runtime configurations set using spark.conf.set or using SQL’s set command affect only the current notebook. Configurations for a metastore connection are not runtime configurations and all notebooks attached to a cluster share these configurations.
  • Setting the current database affects only the current notebook.
  • Temporary views created by dataset.createTempView, dataset.createOrReplaceTempView, and SQL’s CREATE TEMPORARY VIEW command are visible only in the current notebook.

To enable Spark session isolation, set spark.databricks.session.share to false in the Spark Config field.
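
With session isolation enabled, runtime settings and temporary views are scoped to the current notebook’s SparkSession. A minimal sketch of that behavior:

# This runtime configuration affects only the current notebook.
spark.conf.set("spark.sql.shuffle.partitions", "8")

# This temporary view is visible only in the current notebook.
spark.range(10).createOrReplaceTempView("my_numbers")
spark.sql("SELECT count(*) AS n FROM my_numbers").show()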

In Spark 2.0.2-db<x>, session isolation is disabled by default.

Spark 2.1 and above have session isolation enabled by default. In addition, from Spark 2.1, you can use global temporary views to share temporary views across notebooks.
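
A global temporary view, by contrast, is registered in the global_temp database and can be queried from other notebooks attached to the same cluster. For example:

# Register a global temporary view; other notebooks on the same cluster can query it.
spark.range(10).createGlobalTempView("shared_numbers")

# In any notebook attached to the cluster, qualify the view with the global_temp database.
spark.sql("SELECT count(*) AS n FROM global_temp.shared_numbers").show()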

Cells that trigger commands in other languages (that is, cells using %scala, %python, %r, and %sql) and cells that include other notebooks (that is, cells using %run) are part of the current notebook. Thus, these cells are in the same session as other notebook cells. In contrast, Notebook Workflows run a notebook with an isolated SparkSession, which means temporary views defined in such a notebook are not visible in other notebooks.

Autocomplete

Databricks supports two types of autocomplete in your notebook: local and server.

Local autocomplete completes words that exist in the notebook. Server autocomplete is more powerful because it accesses the cluster for defined types, classes, and objects, as well as SQL database and table names. To activate server autocomplete, you must attach your notebook to a running cluster and run all cells that define completable objects.

Important

  • Server autocomplete in Scala, Python, and R notebooks is blocked during command execution.
  • Server autocomplete is not available for serverless clusters.

You trigger autocomplete by pressing Tab after entering a completable object. For example, after you define and run the cells containing the definitions of MyClass and instance, the methods of instance are completable and a list of valid completions displays when you press Tab.

../../_images/notebook-autocomplete-object.png
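
A minimal Python sketch of the setup this example assumes, using the MyClass and instance names from the text:

# Define a class and an instance, then run this cell so server autocomplete can see them.
class MyClass:
    def greet(self):
        return "hello"

instance = MyClass()

# In a new cell, type `instance.` and press Tab to display the completable methods.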

Type completion and SQL database and table name completion work in the same way.

Type completion
SQL completion

Delete a notebook

Since notebooks are contained in the Workspace (and in folders in the Workspace), they follow the same rules as folders. See Access the Workspace menu for information about how to delete notebooks and other items in the Workspace.

Databricks archives

A Databricks archive is a package that lets you distribute collections of notebooks. It is a JAR file with extra metadata and has the extension .dbc.

Export an archive

Select the Down Caret or Menu Dropdown to the right of a folder or notebook and select Export > DBC Archive. Databricks downloads a file named <[folder|notebook]-name>.dbc.

Import an archive

  1. Select the Down Caret or Menu Dropdown to the right of a folder or notebook and select Import.
  2. Choose File or URL.
  3. Go to or drop a Databricks archive in the dropzone.
  4. Click Import. The archive is imported into Databricks. If the archive contains an exported folder, Databricks recreates that folder.

Extract an archive

To extract the notebooks from an archive, run:

jar xvf <archive>.dbc

Version control

Databricks has basic version control for notebooks. To access version control, click the Revision History menu at the top right of every notebook. You can save revisions with comments, and those revisions are kept permanently. Databricks also integrates with these third-party version control tools: