Use version controlled notebooks in a Databricks job

You can run jobs using notebooks located in a remote Git repository or a Databricks repo. This feature simplifies the creation and management of production jobs and automates continuous deployment:

  • You don’t need to create a separate production repo in Databricks, manage its permissions, and keep it updated.

  • You can prevent unintentional changes to a production job, such as local edits in the production repo or changes from switching a branch.

  • The job definition process has a single source of truth in the remote repository, and each job run is linked to a commit hash.

To use notebooks in a remote Git repository, you must set up Databricks Repos.

Use a remote Git repository

To create a task with a notebook located in a remote Git repository:

  1. Click Workflows in the sidebar and click Create Job, or go to an existing job and add a new task.

  2. If this is a new job, replace Add a name for your job… with your job name.

  3. Enter a name for the task in the Task name field.

  4. In the Type dropdown menu, select Notebook.

  5. In the Source dropdown menu, select Git provider and click Edit or Add a git reference. The Git Information dialog appears.

  6. In the Git Information dialog, enter details for the repository, including the repository URL, the Git Provider, and the Git reference. This Git reference can be a branch, a tag, or a commit.

    For Path, enter a relative path to the notebook location, such as etl/notebooks/.

    When you enter the relative path, don’t begin it with / or ./, and don’t include the notebook file extension, such as .py. For example, if the absolute path for the notebook you want to access is notebook-best-practices/notebooks/, enter notebooks/covid_eda_raw in the Path field.

  7. Click Confirm.
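
The Path rules in step 6 can be sketched as a small helper. This is only an illustration: the function name to_git_notebook_path and the set of extensions are assumptions made for this example (the text above names only .py), and nothing here is part of Databricks.

```python
from pathlib import PurePosixPath

# Assumption for this sketch: extensions treated as notebook source files.
# The documentation above only names ".py" explicitly.
NOTEBOOK_EXTENSIONS = {".py", ".sql", ".scala", ".r", ".ipynb"}


def to_git_notebook_path(path: str) -> str:
    """Return `path` in the form the Path field expects (hypothetical helper)."""
    cleaned = path.strip()
    # Drop any leading "/" or "./" so the path is relative to the repository root.
    while cleaned.startswith(("./", "/")):
        cleaned = cleaned[2:] if cleaned.startswith("./") else cleaned[1:]
    if not cleaned:
        raise ValueError("the notebook path is empty")
    # Drop the notebook file extension, such as .py.
    pure = PurePosixPath(cleaned)
    if pure.suffix.lower() in NOTEBOOK_EXTENSIONS:
        pure = pure.with_suffix("")
    return str(pure)


print(to_git_notebook_path("/notebooks/covid_eda_raw.py"))
```

The helper leaves a path that already follows the rules, such as notebooks/covid_eda_raw, unchanged.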

Additional notebook tasks in a multitask job can reference the same commit in the remote repository in one of the following ways:

  • SHA of $branch/head when git_branch is set

  • SHA of $tag when git_tag is set

  • the value of git_commit
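
As a rough sketch of how these fields fit together, the following builds a job settings dictionary in which every notebook task shares one Git reference. The field names (git_source, git_url, git_provider, notebook_task, source) follow the Jobs API 2.1 as understood here and should be checked against the API reference. The repository URL, the provider value, the job name, and both function names are placeholders invented for this example. The sketch also assumes that exactly one of git_branch, git_tag, or git_commit is supplied.

```python
from typing import Optional


def build_git_source(
    git_url: str,
    git_provider: str,
    *,
    git_branch: Optional[str] = None,
    git_tag: Optional[str] = None,
    git_commit: Optional[str] = None,
) -> dict:
    """Build a git_source block with exactly one Git reference (hypothetical helper)."""
    references = {
        "git_branch": git_branch,
        "git_tag": git_tag,
        "git_commit": git_commit,
    }
    chosen = {key: value for key, value in references.items() if value}
    if len(chosen) != 1:
        raise ValueError("set exactly one of git_branch, git_tag, or git_commit")
    return {"git_url": git_url, "git_provider": git_provider, **chosen}


def build_job_settings(name: str, git_source: dict, notebook_paths: dict) -> dict:
    """Build job settings in which every notebook task uses the shared git_source."""
    tasks = [
        {
            "task_key": task_key,
            # "GIT" tells the task to read the notebook from the job's git_source.
            "notebook_task": {"notebook_path": notebook_path, "source": "GIT"},
        }
        for task_key, notebook_path in notebook_paths.items()
    ]
    return {"name": name, "git_source": git_source, "tasks": tasks}


# Placeholder repository URL and provider value, for illustration only.
settings = build_job_settings(
    "example-job",
    build_git_source(
        "https://github.com/example-org/example-repo",
        "gitHub",
        git_branch="main",
    ),
    {"explore": "notebooks/covid_eda_raw"},
)
print(settings)
```

Because the Git reference lives at the job level rather than on each task, all tasks in a run resolve to the same commit.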

When you view the run history of a task that runs a notebook stored in a remote Git repository, the Task run details panel includes Git details, including the commit SHA associated with the run.

Use a Databricks repo

If you prefer to use a Databricks repo for your notebooks, you can clone your repository into a Databricks repo:

  1. Click Repos in the sidebar and click Add Repo.

  2. Make sure Create repo by cloning a Git repository is selected and enter the details for your Git repository.

To add a notebook from a Databricks repo in a job task, in the Source dropdown menu, select Workspace and enter the path to the notebook in Path.
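
For comparison, a task that reads its notebook from the workspace might look like the sketch below. The notebook_task field names follow the Jobs API 2.1 as understood here. The /Repos/... path, the user name, and the function name are placeholders invented for this example.

```python
def build_workspace_notebook_task(task_key: str, notebook_path: str) -> dict:
    """Build a task that reads its notebook from the workspace (hypothetical helper)."""
    # Assumption for this sketch: workspace paths are absolute.
    if not notebook_path.startswith("/"):
        raise ValueError("a workspace notebook path is expected to be absolute")
    return {
        "task_key": task_key,
        "notebook_task": {"notebook_path": notebook_path, "source": "WORKSPACE"},
    }


# Placeholder path to a notebook inside a cloned Databricks repo.
task = build_workspace_notebook_task(
    "explore",
    "/Repos/someone@example.com/example-repo/notebooks/covid_eda_raw",
)
print(task)
```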

Access notebooks from an IDE

If you need to access notebooks from an integrated development environment, make sure the comment # Databricks notebook source is at the top of the notebook source code file. Databricks adds this line when it exports a notebook in source-code format, to distinguish a Databricks Python-language notebook from a regular Python file. When you import the file, Databricks recognizes the comment and imports the file as a notebook, not as a Python module.
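
When you create notebook files in an IDE, a small helper can add the marker comment if it is missing. The following is a minimal sketch, not a Databricks utility: both function names are invented for this example, and the comparison ignores case as a lenient assumption.

```python
# Marker comment as it appears at the top of exported Python notebooks.
NOTEBOOK_HEADER = "# Databricks notebook source"


def has_notebook_header(source: str) -> bool:
    """Return True if the first line of `source` is the marker comment."""
    first_line = source.split("\n", 1)[0].strip()
    # Case-insensitive comparison is an assumption made for this sketch.
    return first_line.lower() == NOTEBOOK_HEADER.lower()


def ensure_notebook_header(source: str) -> str:
    """Return `source` with the marker comment as its first line."""
    if has_notebook_header(source):
        return source
    return f"{NOTEBOOK_HEADER}\n{source}"


print(ensure_notebook_header("print('hello')\n"))
```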


Error message:

Run result unavailable: job failed with error message Notebook not found: path-to-your-notebook

Possible causes:

Your notebook is missing the comment # Databricks notebook source at the top of the notebook source code file.
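
To look for this cause in a local clone of the repository, you could scan for Python files whose first line is not the marker comment. This is a sketch under stated assumptions: the function name is invented for this example, only .py files are scanned, files are assumed to be UTF-8 encoded, and the comparison ignores case.

```python
import tempfile
from pathlib import Path
from typing import List

# Marker comment as it appears at the top of exported Python notebooks.
NOTEBOOK_HEADER = "# Databricks notebook source"


def find_files_missing_header(root: Path) -> List[Path]:
    """Return the .py files under `root` whose first line is not the marker comment."""
    missing = []
    for path in sorted(root.rglob("*.py")):
        with path.open(encoding="utf-8") as handle:
            first_line = handle.readline().strip()
        if first_line.lower() != NOTEBOOK_HEADER.lower():
            missing.append(path)
    return missing


# Demonstration on a temporary directory holding one notebook and one plain file.
with tempfile.TemporaryDirectory() as tmp:
    demo_root = Path(tmp)
    (demo_root / "notebook.py").write_text(
        f"{NOTEBOOK_HEADER}\nprint(1)\n", encoding="utf-8"
    )
    (demo_root / "module.py").write_text("print(2)\n", encoding="utf-8")
    demo_missing = [path.name for path in find_files_missing_header(demo_root)]
    print(demo_missing)
```

Regular Python modules legitimately lack the comment, so treat the result as a list of candidates to review, not as a list of errors.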