Manage notebooks

You can manage notebooks using the UI, the CLI, or the Workspace API. This article focuses on performing notebook tasks using the UI. For the other methods, see Databricks CLI and Workspace API 2.0.
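
For example, here is a minimal sketch of listing the notebooks in a folder by calling the Workspace API 2.0 from Python; the workspace URL, token, and path are placeholders, not values from this article:

Python
import requests

# Placeholder workspace URL and personal access token (replace with your own).
HOST = "https://<databricks-instance>"
TOKEN = "<personal-access-token>"

# List the objects (notebooks, folders, libraries) under a workspace path.
resp = requests.get(
    f"{HOST}/api/2.0/workspace/list",
    headers={"Authorization": f"Bearer {TOKEN}"},
    params={"path": "/Users/someone@example.com"},
)
resp.raise_for_status()
for obj in resp.json().get("objects", []):
    print(obj["object_type"], obj["path"])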

Create a notebook

Use the Create button

The easiest way to create a new notebook in your default folder is to use the Create button:

  1. Click Create in the sidebar and select Notebook from the menu. The Create Notebook dialog appears.
  2. Enter a name and select the notebook’s default language.
  3. If there are running clusters, the Cluster drop-down displays. Select the cluster you want to attach the notebook to.
  4. Click Create.

Create a notebook in any folder

You can create a new notebook in any folder (for example, in the Shared folder) by following these steps:

  1. In the sidebar, click Workspace. Do one of the following:

    • Next to any folder, click the Menu Dropdown on the right side of the text and select Create > Notebook.

      Create notebook
    • In the workspace or a user folder, click Down Caret and select Create > Notebook.

  2. Follow steps 2 through 4 in Use the Create button.

Open a notebook

In your workspace, click a notebook to open it. The notebook path displays when you hover over the notebook title.

Delete a notebook

See Folders and Workspace object operations for information about how to access the workspace menu and delete notebooks or other items in the workspace.

Copy notebook path

To copy a notebook file path without opening the notebook, right-click the notebook name or click the Menu Dropdown to the right of the notebook name and select Copy File Path.

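A copied path can be used, for example, to call the notebook from another notebook with dbutils.notebook.run, which is predefined in Databricks notebooks (a minimal sketch; the path is a hypothetical example):

Python
# Run another notebook by its workspace path and wait up to 60 seconds.
# The path below is a hypothetical example; paste the path you copied.
result = dbutils.notebook.run("/Users/someone@example.com/my-notebook", 60)
print(result)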

Rename a notebook

To change the title of an open notebook, click the title and edit inline or click File > Rename.

Control access to a notebook

If your Databricks account has the Premium plan (or, for customers who subscribed to Databricks before March 3, 2020, the Operational Security package), you can use Workspace access control to control who has access to a notebook.

Notebook external formats

Databricks supports several notebook external formats:

  • Source file: A file containing only source code statements with the extension .scala, .py, .sql, or .r.
  • HTML: A Databricks notebook with the extension .html.
  • DBC archive: A Databricks archive.
  • IPython notebook: A Jupyter notebook with the extension .ipynb.
  • RMarkdown: An R Markdown document with the extension .Rmd.

Import a notebook

You can import an external notebook from a URL or a file. You can also import a ZIP archive of notebooks exported in bulk from a Databricks workspace.

  1. Click Workspace in the sidebar. Do one of the following:

    • Next to any folder, click the Menu Dropdown on the right side of the text and select Import.

    • In the Workspace or a user folder, click Down Caret and select Import.

      Import notebook
  2. Specify the URL or browse to a file containing a supported external format or a ZIP archive of notebooks exported from a Databricks workspace.

  3. Click Import.

    • If you choose a single notebook, it is imported into the current folder.
    • If you choose a DBC or ZIP archive, its folder structure is recreated in the current folder and each notebook is imported.
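
The same import can be scripted. Here is a minimal sketch that imports a local source file as a Python notebook through the Workspace API 2.0; the workspace URL, token, file name, and target path are placeholders:

Python
import base64
import requests

HOST = "https://<databricks-instance>"   # placeholder workspace URL
TOKEN = "<personal-access-token>"        # placeholder token

# Read a local source file and base64-encode it for the import request.
with open("my_notebook.py", "rb") as f:
    content = base64.b64encode(f.read()).decode("utf-8")

# Import the file as a Python notebook at the given workspace path.
resp = requests.post(
    f"{HOST}/api/2.0/workspace/import",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={
        "path": "/Users/someone@example.com/my_notebook",
        "format": "SOURCE",
        "language": "PYTHON",
        "content": content,
        "overwrite": True,
    },
)
resp.raise_for_status()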

Export a notebook

In the notebook toolbar, select File > Export and a format.

Note

When you export a notebook as HTML, IPython notebook, or archive (DBC), and you have not cleared the results, the results of running the notebook are included.
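
Exports can also be scripted. Here is a minimal sketch that exports a single notebook as HTML through the Workspace API 2.0; the workspace URL, token, and path are placeholders:

Python
import base64
import requests

HOST = "https://<databricks-instance>"   # placeholder workspace URL
TOKEN = "<personal-access-token>"        # placeholder token

# Export a notebook as HTML; results are included unless they were cleared.
resp = requests.get(
    f"{HOST}/api/2.0/workspace/export",
    headers={"Authorization": f"Bearer {TOKEN}"},
    params={"path": "/Users/someone@example.com/my_notebook", "format": "HTML"},
)
resp.raise_for_status()
with open("my_notebook.html", "wb") as f:
    f.write(base64.b64decode(resp.json()["content"]))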

Export all notebooks in a folder

To export all notebooks in a workspace folder as a ZIP archive:

  1. Click Workspace in the sidebar. Do one of the following:
    • Next to any folder, click the Menu Dropdown on the right side of the text and select Export.
    • In the Workspace or a user folder, click Down Caret and select Export.
  2. Select the export format:
    • DBC Archive: Export a Databricks archive, a binary format that includes metadata and notebook command results.
    • Source File: Export a ZIP archive of notebook source files, which can be imported into a Databricks workspace, used in a CI/CD pipeline, or viewed as source files in each notebook’s default language. Notebook command results are not included.
    • HTML Archive: Export a ZIP archive of HTML files. Each notebook’s HTML file can be imported into a Databricks workspace or viewed as HTML. Notebook command results are included.

Publish a notebook

If you’re using Community Edition, you can publish a notebook so that you can share a URL path to the notebook. Subsequent publish actions update the notebook at that URL.

Notebooks and clusters

Before you can do any work in a notebook, you must first attach the notebook to a cluster. This section describes how to attach and detach notebooks to and from clusters and what happens behind the scenes when you perform these actions.

Execution contexts

When you attach a notebook to a cluster, Databricks creates an execution context. An execution context contains the state for a REPL environment for each supported programming language: Python, R, Scala, and SQL. When you run a cell in a notebook, the command is dispatched to the appropriate language REPL environment and run.

You can also use the REST 1.2 API to create an execution context and send a command to run in the execution context. Similarly, the command is dispatched to the language REPL environment and run.
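
Here is a minimal sketch of that flow in Python, assuming the contexts/create, commands/execute, and commands/status endpoints of the REST 1.2 API; the workspace URL, token, and cluster ID are placeholders:

Python
import requests

HOST = "https://<databricks-instance>"   # placeholder workspace URL
TOKEN = "<personal-access-token>"        # placeholder token
CLUSTER_ID = "<cluster-id>"              # placeholder cluster ID
auth = {"Authorization": f"Bearer {TOKEN}"}

# Create a Python execution context on the cluster.
ctx = requests.post(
    f"{HOST}/api/1.2/contexts/create",
    headers=auth,
    json={"language": "python", "clusterId": CLUSTER_ID},
).json()

# Dispatch a command to the context's Python REPL.
cmd = requests.post(
    f"{HOST}/api/1.2/commands/execute",
    headers=auth,
    json={
        "language": "python",
        "clusterId": CLUSTER_ID,
        "contextId": ctx["id"],
        "command": "print(spark.version)",
    },
).json()

# Poll for the command status and result.
status = requests.get(
    f"{HOST}/api/1.2/commands/status",
    headers=auth,
    params={"clusterId": CLUSTER_ID, "contextId": ctx["id"], "commandId": cmd["id"]},
).json()
print(status)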

A cluster has a maximum number of execution contexts (145). Once the number of execution contexts has reached this threshold, you cannot attach a notebook to the cluster or create a new execution context.

Idle execution contexts

An execution context is considered idle when its last completed execution occurred longer ago than the idle threshold. The last completed execution is the last time the notebook finished running commands. The idle threshold is the amount of time that must pass after the last completed execution before Databricks attempts to automatically detach the notebook. The default idle threshold is 24 hours.

When a cluster has reached the maximum context limit, Databricks removes (evicts) idle execution contexts (starting with the least recently used) as needed. Even when a context is removed, the notebook using the context is still attached to the cluster and appears in the cluster’s notebook list. Streaming notebooks are considered actively running, and their context is never evicted until their execution has been stopped. If an idle context is evicted, the UI displays a message indicating that the notebook using the context was detached due to being idle.

Notebook context evicted

If you attempt to attach a notebook to a cluster that has reached the maximum number of execution contexts and there are no idle contexts (or if auto-eviction is disabled), the UI displays a message saying that the maximum execution contexts threshold has been reached and the notebook remains in the detached state.

Notebook detached

If a command forks a separate process, the execution context is still considered idle once the command that forked the process completes, even if the forked process continues to run. Forking separate processes is not recommended with Spark.

Configure context auto-eviction

Auto-eviction is enabled by default. To disable auto-eviction for a cluster, set the Spark property spark.databricks.chauffeur.enableIdleContextTracking to false.
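
In the cluster's Spark config field, this is entered as a space-separated key and value on a single line:

spark.databricks.chauffeur.enableIdleContextTracking false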

Attach a notebook to a cluster

To attach a notebook to a cluster, you need the Can Attach To cluster-level permission.

Important

As long as a notebook is attached to a cluster, any user with the Can Run permission on the notebook has implicit permission to access the cluster.

To attach a notebook to a cluster:

  1. In the notebook toolbar, click the Detached cluster drop-down.
  2. From the drop-down, select a cluster.

Important

An attached notebook has the following Apache Spark variables defined.

Class                      Variable Name
SparkContext               sc
SQLContext/HiveContext     sqlContext
SparkSession (Spark 2.x)   spark

Do not create a SparkSession, SparkContext, or SQLContext. Doing so will lead to inconsistent behavior.
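
For example, in a Python cell you can use the predefined variables directly rather than constructing new ones (a minimal sketch):

Python
# Use the predefined SparkSession rather than creating a new one.
df = spark.range(5)
print(df.count())

# The predefined SparkContext and SQLContext are also available.
print(sc.defaultParallelism)
print(sqlContext.sql("SELECT 1 AS one").collect())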

Determine Spark and Databricks Runtime version

To determine the Spark version of the cluster your notebook is attached to, run:

spark.version

To determine the Databricks Runtime version of the cluster your notebook is attached to, run:

Scala
dbutils.notebook.getContext.tags("sparkVersion")
Python
spark.conf.get("spark.databricks.clusterUsageTags.sparkVersion")

Note

Both this sparkVersion tag and the spark_version property required by the endpoints in the Clusters API 2.0 and Jobs API 2.1 refer to the Databricks Runtime version, not the Spark version.

Detach a notebook from a cluster

  1. In the notebook toolbar, click the Attached <cluster-name> cluster drop-down.

  2. Select Detach.

    Detach notebook

You can also detach notebooks from a cluster using the Notebooks tab on the cluster details page.

When you detach a notebook from a cluster, the execution context is removed and all computed variable values are cleared from the notebook.

Tip

Databricks recommends that you detach unused notebooks from a cluster. This frees up memory space on the driver.

View all notebooks attached to a cluster

The Notebooks tab on the cluster details page displays all of the notebooks that are attached to a cluster. The tab also displays the status of each attached notebook, along with the last time a command was run from the notebook.

Cluster details attached notebooks

Schedule a notebook

To schedule a notebook job to run periodically:

  1. In the notebook, click the Schedule button at the top right. If no jobs exist for this notebook, the Schedule dialog appears.

    Schedule notebook dialog

    If jobs already exist for the notebook, the Jobs List dialog appears. To display the Schedule dialog, click Add a schedule.

    Job list dialog
  2. In the Schedule dialog, optionally enter a name for the job. The default name is the name of the notebook.

  3. Select Manual to run your job only when manually triggered, or Scheduled to define a schedule for running the job. If you select Scheduled, use the drop-downs to specify the frequency, time, and time zone.

  4. In the Cluster drop-down, select the cluster to run the task.

    If you have Allow Cluster Creation permissions, by default the job runs on a new job cluster. To edit the configuration of the default job cluster, click Edit at the right of the field to display the cluster configuration dialog.

    If you do not have Allow Cluster Creation permissions, by default the job runs on the cluster that the notebook is attached to. If the notebook is not attached to a cluster, you must select a cluster from the Cluster drop-down.

  5. Optionally, enter any Parameters to pass to the job. Click Add and specify the key and value of each parameter. Parameters set the value of the notebook widget specified by the key of the parameter; inside the notebook, read the value from that widget (see the sketch after these steps). Use Task parameter variables to pass a limited set of dynamic values as part of a parameter value.

  6. Optionally, specify email addresses to receive Email Alerts on job events. See Alerts.

  7. Click Submit.
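
Inside the scheduled notebook, each parameter is read through the widget named by its key. A minimal sketch, assuming a parameter with the hypothetical key my_param was added in step 5:

Python
# Read the job parameter passed with the key "my_param".
# "default-value" is returned when the notebook is run interactively
# without a value for the widget.
dbutils.widgets.text("my_param", "default-value")
value = dbutils.widgets.get("my_param")
print(value)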

Manage scheduled notebook jobs

To display jobs associated with this notebook, click the Schedule button. The jobs list dialog displays, showing all jobs currently defined for this notebook. To manage jobs, click the vertical ellipsis at the right of a job in the list.

Job list menu

From this menu, you can edit, clone, view, pause, resume, or delete a scheduled job.

When you clone a scheduled job, a new job is created with the same parameters as the original. The new job appears in the list with the name “Clone of <initial job name>”.

How you edit a job depends on the complexity of the job’s schedule. Either the Schedule dialog or the Job details panel displays, allowing you to edit the schedule, cluster, parameters, and so on.
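
Scheduled jobs can also be inspected programmatically. Here is a minimal sketch that lists jobs and their names through the Jobs API 2.1; the workspace URL and token are placeholders:

Python
import requests

HOST = "https://<databricks-instance>"   # placeholder workspace URL
TOKEN = "<personal-access-token>"        # placeholder token

# List jobs defined in the workspace; each entry includes its settings.
resp = requests.get(
    f"{HOST}/api/2.1/jobs/list",
    headers={"Authorization": f"Bearer {TOKEN}"},
)
resp.raise_for_status()
for job in resp.json().get("jobs", []):
    print(job["job_id"], job["settings"]["name"])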

Distribute notebooks

To allow you to easily distribute Databricks notebooks, Databricks supports the Databricks archive, which is a package that can contain a folder of notebooks or a single notebook. A Databricks archive is a JAR file with extra metadata and has the extension .dbc. The notebooks contained in the archive are in a Databricks internal format.

Import an archive

  1. Click Down Caret or Menu Dropdown to the right of a folder or notebook and select Import.
  2. Choose File or URL.
  3. Browse to or drop a Databricks archive in the dropzone.
  4. Click Import. The archive is imported into Databricks. If the archive contains a folder, Databricks recreates that folder.

Export an archive

Click Down Caret or Menu Dropdown to the right of a folder or notebook and select Export > DBC Archive. Databricks downloads a file named <[folder|notebook]-name>.dbc.
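
A DBC archive can also be produced programmatically. Here is a minimal sketch that exports a folder in DBC format through the Workspace API 2.0, assuming the folder can be exported as DBC; the workspace URL, token, and path are placeholders:

Python
import base64
import requests

HOST = "https://<databricks-instance>"   # placeholder workspace URL
TOKEN = "<personal-access-token>"        # placeholder token

# Export a folder (or a single notebook) in DBC format and save it locally.
resp = requests.get(
    f"{HOST}/api/2.0/workspace/export",
    headers={"Authorization": f"Bearer {TOKEN}"},
    params={"path": "/Users/someone@example.com/my-folder", "format": "DBC"},
)
resp.raise_for_status()
with open("my-folder.dbc", "wb") as f:
    f.write(base64.b64decode(resp.json()["content"]))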