Notebook task for jobs
Use the notebook task to deploy Databricks notebooks.
Configure a notebook task
Before you begin, you must have your notebook in a location accessible by the user configuring the job.
Note
The jobs UI displays options dynamically based on other configured settings.
To begin the flow to configure a Notebook task:
Navigate to the Tasks tab in the Jobs UI.
In the Type drop-down menu, select Notebook.
Configure the source
In the Source drop-down menu, select a location for the notebook using one of the following options.
Workspace
Use Workspace to configure a notebook stored in the workspace by completing the following steps:
Click the Path field. The Select Notebook dialog appears.
Browse to the notebook, click to highlight the file, and click Confirm.
Note
You can use this option to configure a task for a notebook stored in a Databricks Git folder. Databricks recommends using the Git provider option and a remote Git repository for versioning assets scheduled with jobs.
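The same task can also be defined programmatically. The following is a minimal sketch using the Databricks SDK for Python (databricks-sdk); the job name, workspace path, and cluster ID are placeholders:

```python
from databricks.sdk import WorkspaceClient
from databricks.sdk.service import jobs

w = WorkspaceClient()  # reads credentials from the environment or ~/.databrickscfg

# Create a job with one notebook task sourced from the workspace.
job = w.jobs.create(
    name="example-notebook-job",  # placeholder
    tasks=[
        jobs.Task(
            task_key="run_notebook",
            notebook_task=jobs.NotebookTask(
                notebook_path="/Users/someone@example.com/etl/ingest",  # placeholder
                source=jobs.Source.WORKSPACE,
            ),
            existing_cluster_id="1234-567890-abcde123",  # placeholder cluster
        )
    ],
)
print(f"Created job {job.job_id}")
```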
Git provider
Use Git provider to configure a notebook in a remote Git repository.
The options displayed by the UI depend on whether or not you have already configured a Git provider elsewhere. Only one remote Git repository can be used for all tasks in a job. See Use Git with jobs.
Important
Notebooks created by Databricks jobs that run from remote Git repositories are ephemeral and cannot be relied upon to track MLflow runs, experiments, or models. When creating a notebook from a job, use a workspace MLflow experiment (instead of a notebook MLflow experiment) and call mlflow.set_experiment("/path/to/experiment") in the workspace notebook before running any MLflow tracking code. For more details, see Prevent data loss in MLflow experiments.
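For example, a minimal sketch of this pattern (the experiment path, parameter, and metric are placeholders):

```python
import mlflow

# Point tracking at a workspace experiment path (not a notebook experiment)
# so runs outlive the ephemeral notebook created by the Git-sourced job.
mlflow.set_experiment("/Shared/etl-experiment")  # placeholder path

with mlflow.start_run():
    mlflow.log_param("source_branch", "main")  # placeholder parameter
    mlflow.log_metric("rows_ingested", 1_000)  # placeholder metric
```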
The Path field appears after you have configured a git reference.
Enter the relative path for your notebook, such as etl/bronze/ingest.py.
Important
When you enter the relative path, don’t begin with / or ./. For example, if the absolute path for the notebook you want to access is /etl/bronze/ingest.py, enter etl/bronze/ingest.py in the Path field.
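As a programmatic counterpart, a Git-sourced notebook task pairs a job-level Git reference with the relative path. A sketch using the Databricks SDK for Python; the repository URL, branch, and cluster ID are placeholders:

```python
from databricks.sdk import WorkspaceClient
from databricks.sdk.service import jobs

w = WorkspaceClient()

job = w.jobs.create(
    name="git-notebook-job",  # placeholder
    # One remote repository serves all tasks in the job.
    git_source=jobs.GitSource(
        git_url="https://github.com/example-org/etl-repo",  # placeholder
        git_provider=jobs.GitProvider.GIT_HUB,
        git_branch="main",
    ),
    tasks=[
        jobs.Task(
            task_key="ingest",
            notebook_task=jobs.NotebookTask(
                notebook_path="etl/bronze/ingest.py",  # relative: no leading / or ./
                source=jobs.Source.GIT,
            ),
            existing_cluster_id="1234-567890-abcde123",  # placeholder
        )
    ],
)
```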
Configure compute and dependent libraries
Use Compute to select or configure a cluster that supports the logic in your notebook.
If you use Serverless compute, use the Environment and Libraries field to select, edit, or add a new environment. See Install notebook dependencies.
For all other compute configurations, click + Add under Dependent libraries. The Add dependent library dialog appears.
You can select an existing library or upload a new library.
You can only use libraries stored in a location supported by your compute configurations. See Python library support.
Each Library Source has a different flow for selecting or uploading a library. See Libraries.
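In the Jobs API, dependent libraries attach directly to the task definition. A hedged sketch using the Databricks SDK for Python; the PyPI pin, wheel path, and cluster ID are placeholders:

```python
from databricks.sdk.service import compute, jobs

# A task with two dependent libraries: one from PyPI, one wheel from a volume.
task = jobs.Task(
    task_key="run_notebook",
    notebook_task=jobs.NotebookTask(
        notebook_path="/Users/someone@example.com/etl/ingest"  # placeholder
    ),
    existing_cluster_id="1234-567890-abcde123",  # placeholder
    libraries=[
        compute.Library(pypi=compute.PythonPyPiLibrary(package="requests==2.31.0")),
        compute.Library(whl="/Volumes/main/default/libs/my_lib-0.1-py3-none-any.whl"),
    ],
)
```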
Finalize job configuration
(Optional) Configure Parameters as key-value pairs that can be accessed in the notebook using dbutils.widgets, as shown in the sketch after these steps. See Configure task parameters.
Click Save task.
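Inside the notebook, each key-value pair surfaces as a widget. A minimal sketch; the run_date key and its default value are hypothetical:

```python
# Declare a widget with a default so the notebook also runs interactively;
# when run as a job task, the parameter value passed by the job overrides it.
dbutils.widgets.text("run_date", "2024-01-01")  # hypothetical key and default

run_date = dbutils.widgets.get("run_date")
print(f"Processing partition for {run_date}")
```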
Limitations
Total notebook cell output (the combined output of all notebook cells) is subject to a 20MB size limit. Additionally, individual cell output is subject to an 8MB size limit. If total cell output exceeds 20MB in size, or if the output of an individual cell is larger than 8MB, the run is canceled and marked as failed.
If you need help finding cells near or beyond the limit, run the notebook against an all-purpose cluster and use this notebook autosave technique.
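One way to stay within these limits is to persist full results and render only a sample. A hypothetical pattern (the table name and data are placeholders; spark and display are the standard notebook globals):

```python
from pyspark.sql import functions as F

# Stand-in for a large query result computed earlier in the notebook.
results_df = spark.range(10_000_000).withColumn("value", F.rand())

# Write the full result set to a table instead of rendering it in the cell,
# then display only a small sample so cell output stays far below 8MB.
results_df.write.mode("overwrite").saveAsTable("main.default.etl_results")  # placeholder
display(results_df.limit(10))
```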