Configure and edit tasks in Lakeflow Jobs

This article describes how to create, configure, and edit tasks using the Jobs & Pipelines workspace UI.

Databricks manages tasks as components of Lakeflow Jobs. A job has one or more tasks. You create a new job in the workspace UI by configuring the first task. To configure a new job, see Configure and edit Lakeflow Jobs.

Each task has an associated compute resource that runs the task logic. If you are using serverless, Databricks configures your compute resources. If you are not using serverless, see Configure compute for jobs.

Databricks also provides other entry points and tools for task configuration.

Create or configure a task

To edit an existing task or add a new task with the workspace UI, select an existing job using the following steps:

In your Databricks workspace's sidebar, click Jobs & Pipelines.
Optionally, select the Jobs and Owned by me filters.
Click your job's Name link.
Click the Tasks tab. The task graph appears.
To edit a task, click the task name. The task configuration appears below the task graph.
To add a task, click .

Types of tasks

Configuration options and instructions vary by task type.

Clone a task

Clone tasks to copy all the configurations of an existing task, including upstream dependencies.

To clone a task, do the following:

Select the task in the task graph.
Click .
Specify a Cloned task name and click Clone.
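
As a rough illustration of what cloning preserves, the following sketch copies a task definition held as a plain Python dictionary and gives the copy a new name. It is not how Databricks implements cloning. The field names (`task_key`, `depends_on`, `notebook_task`, `max_retries`) are assumed from the Jobs API task settings, and the task values and the `clone_task` helper are hypothetical.

```python
import copy


def clone_task(task: dict, cloned_task_key: str) -> dict:
    """Return a deep copy of a task definition under a new task key."""
    # deepcopy keeps every setting, including the upstream dependencies,
    # without sharing nested objects with the original task.
    cloned = copy.deepcopy(task)
    cloned["task_key"] = cloned_task_key
    return cloned


# Hypothetical task definition used only for illustration.
transform = {
    "task_key": "transform",
    "depends_on": [{"task_key": "ingest"}],
    "notebook_task": {"notebook_path": "/Workspace/Users/someone@example.com/transform"},
    "max_retries": 2,
}

transform_copy = clone_task(transform, "transform_copy")
print(transform_copy["task_key"])    # transform_copy
print(transform_copy["depends_on"])  # [{'task_key': 'ingest'}]
```

The copy keeps the same upstream dependency as the original, which mirrors the behavior described above; only the task name changes.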

Delete a task

To delete a task, do the following:

Select the task in the task graph.
Click and select Delete task.

Copy a task path

Certain task types, for example, notebook tasks, allow you to copy the path to the task source code:

Click the Tasks tab.
Select the task containing the path to copy.
Click the copy icon next to the task path to copy the path to the clipboard.

Advanced task settings

The following advanced settings control retries for failed tasks and timeout policies for unresponsive tasks.

Note

You can set notifications at the task or job level. See Add notifications on a job.

Set a retry policy

The default setting for task retries depends on the job configuration. For most configurations, the default setting does not retry any tasks on task failure.

Serverless jobs auto-optimize retries by default. See Configure serverless compute auto-optimization to disallow retries.

Continuous jobs use an exponential backoff retry policy. See How are failures handled for continuous jobs?.

To configure a policy that determines when and how many times failed task runs are retried, click + Add next to Retries.

The retry interval is specified in milliseconds and is measured between the start of the failed run and the subsequent retry run.

Note

If you configure both Timeout and Retries, the timeout applies to each retry.
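
Because the timeout applies to each retry, the total time a task can spend running grows with the number of retries. The sketch below makes that arithmetic concrete. The field names (`max_retries`, `min_retry_interval_millis`, `retry_on_timeout`, `timeout_seconds`) are assumed from the Jobs API task settings rather than taken from this article, and the values and the `worst_case_run_seconds` helper are hypothetical.

```python
# Hypothetical task settings; field names are assumed from the Jobs API.
task_settings = {
    "task_key": "nightly_load",
    "max_retries": 3,                     # retry a failed run up to 3 times
    "min_retry_interval_millis": 60_000,  # retry interval, in milliseconds
    "retry_on_timeout": True,             # assumed flag: also retry runs that time out
    "timeout_seconds": 1_800,             # applies to each attempt separately
}


def worst_case_run_seconds(settings: dict) -> int:
    """Upper bound on time spent running across all attempts.

    Counts the first attempt plus every retry, each limited by the timeout.
    Waiting time between attempts is not included.
    """
    attempts = 1 + settings.get("max_retries", 0)
    return attempts * settings["timeout_seconds"]


print(worst_case_run_seconds(task_settings))  # 7200
```

With these example values, one initial attempt plus three retries at 30 minutes each gives an upper bound of two hours of run time, which is worth checking against any schedule the job must meet.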

Configure thresholds for task run duration or streaming backlog metrics

Preview

Streaming observability for Lakeflow Jobs is in Public Preview.

You can configure optional thresholds for task run duration or streaming backlog metrics. To configure duration thresholds or streaming metric thresholds, click Metric thresholds in the task configuration panel.

To configure task duration thresholds, including expected and maximum completion times for the task, select Run duration in the Metric drop-down menu. Enter a duration in the Warning field to configure the task's expected completion time. If the task run exceeds this threshold, an event is triggered. To configure a maximum completion time for a task, enter the maximum duration in the Timeout field. If the task does not complete in this time, Databricks sets its status to “Timed Out”.

To configure a threshold for a streaming backlog metric, select the metric in the Metric drop-down menu and enter a value for the threshold. To learn about the specific metrics supported by a streaming source, see View metrics for streaming tasks.

If an event is triggered because a threshold is exceeded, you can use the event to send a notification, for example when a task is running slowly. See Configure notifications for slow jobs.
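
To show how a threshold comparison works, the following sketch checks observed metric values against a list of rules. It only illustrates the comparison and is not the Databricks implementation. The rule shape and the metric names (`RUN_DURATION_SECONDS`, `STREAMING_BACKLOG_RECORDS`) are assumed from the Jobs API health rules, and the threshold values, observed values, and the `exceeded_metrics` helper are hypothetical.

```python
# Hypothetical threshold rules; the rule shape and metric names are assumptions.
health_rules = [
    {"metric": "RUN_DURATION_SECONDS", "op": "GREATER_THAN", "value": 600},
    {"metric": "STREAMING_BACKLOG_RECORDS", "op": "GREATER_THAN", "value": 100_000},
]


def exceeded_metrics(rules: list, observed: dict) -> list:
    """Return the metrics whose observed value is above the rule's threshold."""
    return [
        rule["metric"]
        for rule in rules
        if rule["metric"] in observed and observed[rule["metric"]] > rule["value"]
    ]


# Hypothetical observations for a single task run.
observed = {"RUN_DURATION_SECONDS": 750, "STREAMING_BACKLOG_RECORDS": 20_000}
print(exceeded_metrics(health_rules, observed))  # ['RUN_DURATION_SECONDS']
```

In this example only the run duration exceeds its threshold, so only that metric would trigger an event.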
