Configure and edit Databricks tasks
This article explains how to create, configure, and edit tasks using the Workflows workspace UI.
Databricks manages tasks as components of Databricks Jobs. A job has one or more tasks. You create a new job in the workspace UI by configuring the first task. To configure a new job, see Configure and edit Databricks Jobs.
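A job's tasks form a dependency graph. When a job is expressed as a Jobs API payload, each task has a `task_key`, and its upstream tasks are listed under `depends_on`. The sketch below assumes that payload shape. The job name, notebook paths, and the `task_order` helper are hypothetical. It shows one way to derive a run order that respects every dependency.

```python
from graphlib import TopologicalSorter

# Hypothetical job settings in the shape of a Jobs API payload:
# "transform" depends on "ingest", so "ingest" must run first.
job_settings = {
    "name": "example-job",
    "tasks": [
        {
            "task_key": "ingest",
            "notebook_task": {"notebook_path": "/Workspace/Shared/example/ingest"},
        },
        {
            "task_key": "transform",
            "depends_on": [{"task_key": "ingest"}],
            "notebook_task": {"notebook_path": "/Workspace/Shared/example/transform"},
        },
    ],
}


def task_order(settings: dict) -> list[str]:
    """Hypothetical helper: return task keys in an order that respects depends_on."""
    # Map each task to the set of tasks it waits for (its predecessors).
    graph = {
        task["task_key"]: {dep["task_key"] for dep in task.get("depends_on", [])}
        for task in settings["tasks"]
    }
    # static_order() raises graphlib.CycleError if the dependencies form a cycle.
    return list(TopologicalSorter(graph).static_order())


print(task_order(job_settings))  # ['ingest', 'transform']
```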
Each task has an associated compute resource that runs the task logic. If you use serverless compute, Databricks configures the compute resources for you. Otherwise, see Configure compute for jobs.
Databricks has other entry points and tools for task configuration, including the following:
Create or configure a task
To add a task to an existing job or edit one of its tasks in the workspace UI, do the following:
Click Workflows in the sidebar.
In the Name column, click the job name.
Click the Tasks tab. The task graph appears.
To edit a task, click the task name. The task configuration appears below the task graph.
To add a task, click .
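Adding a task through the UI appends it to the job's task graph. The following is a minimal sketch of the same operation on a Jobs API-style settings dict. The `add_task` helper and the task keys are hypothetical. It also shows two checks that keep the graph valid: task keys must be unique, and every upstream dependency must already exist.

```python
import copy


def add_task(settings: dict, new_task: dict) -> dict:
    """Hypothetical helper: return a copy of settings with new_task appended.

    Rejects a duplicate task_key or a depends_on entry naming an unknown task.
    """
    existing = {task["task_key"] for task in settings.get("tasks", [])}
    key = new_task["task_key"]
    if key in existing:
        raise ValueError(f"task_key {key!r} already exists")
    unknown = [
        dep["task_key"]
        for dep in new_task.get("depends_on", [])
        if dep["task_key"] not in existing
    ]
    if unknown:
        raise ValueError(f"unknown upstream tasks: {unknown}")
    # Deep-copy so the caller's settings are left unchanged.
    updated = copy.deepcopy(settings)
    updated.setdefault("tasks", []).append(copy.deepcopy(new_task))
    return updated


# Hypothetical job with one task; add a second task that runs after it.
job = {"name": "example-job", "tasks": [{"task_key": "ingest"}]}
job = add_task(job, {"task_key": "report", "depends_on": [{"task_key": "ingest"}]})
print([task["task_key"] for task in job["tasks"]])  # ['ingest', 'report']
```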
Types of tasks
Configuration options and instructions vary by task. The following task types are available:
Clone a task
Cloning a task copies all of its configuration, including its upstream dependencies.
To clone a task, do the following:
Select the task in the task graph.
Click .
Specify a Cloned task name and click Clone.
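As described above, a clone keeps the source task's configuration and upstream dependencies under a new name. A minimal sketch of that behavior on a Jobs API-style settings dict follows. The `clone_task` helper and task keys are hypothetical. Because the whole task is deep-copied, its `depends_on` list carries over, while tasks downstream of the source are left untouched.

```python
import copy


def clone_task(settings: dict, source_key: str, cloned_key: str) -> dict:
    """Hypothetical helper: return a copy of settings with a clone of source_key.

    The clone keeps every field of the source task, including depends_on, so it
    has the same upstream dependencies. Downstream tasks are not changed.
    """
    tasks = settings["tasks"]
    if any(task["task_key"] == cloned_key for task in tasks):
        raise ValueError(f"task_key {cloned_key!r} already exists")
    source = next((task for task in tasks if task["task_key"] == source_key), None)
    if source is None:
        raise KeyError(source_key)
    clone = copy.deepcopy(source)
    clone["task_key"] = cloned_key  # only the name changes
    updated = copy.deepcopy(settings)
    updated["tasks"].append(clone)
    return updated


job = {
    "tasks": [
        {"task_key": "ingest"},
        {"task_key": "transform", "depends_on": [{"task_key": "ingest"}], "max_retries": 2},
    ]
}
cloned = clone_task(job, "transform", "transform_copy")
print(cloned["tasks"][-1])
# {'task_key': 'transform_copy', 'depends_on': [{'task_key': 'ingest'}], 'max_retries': 2}
```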
Delete a task
To delete a task, do the following:
Select the task in the task graph.
Click and select Delete task.
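Deleting a task removes a node from the task graph, which matters when other tasks list it as an upstream dependency. This article does not say how the UI treats such downstream tasks, so the sketch below makes its own conservative choice: the hypothetical `delete_task` helper refuses to delete a task while any other task still depends on it. The settings shape assumes a Jobs API-style dict.

```python
def delete_task(settings: dict, task_key: str) -> dict:
    """Hypothetical helper: return new settings without task_key.

    Design choice for this sketch: refuse to delete a task that other tasks
    still list in depends_on, rather than silently rewiring the graph.
    """
    tasks = settings["tasks"]
    if not any(task["task_key"] == task_key for task in tasks):
        raise KeyError(task_key)
    dependents = [
        task["task_key"]
        for task in tasks
        if any(dep["task_key"] == task_key for dep in task.get("depends_on", []))
    ]
    if dependents:
        raise ValueError(f"{task_key!r} is upstream of: {dependents}")
    # Shallow copy of settings with a new task list that omits the deleted task.
    return {**settings, "tasks": [task for task in tasks if task["task_key"] != task_key]}


job = {
    "tasks": [
        {"task_key": "ingest"},
        {"task_key": "transform", "depends_on": [{"task_key": "ingest"}]},
    ]
}
print(delete_task(job, "transform"))  # {'tasks': [{'task_key': 'ingest'}]}
```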
Copy a task path
Certain task types, such as notebook tasks, let you copy the path to the task's source code:
Click the Tasks tab.
Select the task containing the path to copy.
Click next to the task path to copy the path to the clipboard.
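The path you copy corresponds to a field in the task's settings, and the field depends on the task type. The sketch below assumes Jobs API field names (`notebook_task.notebook_path`, `spark_python_task.python_file`, and `sql_task.file.path`); confirm them against the Jobs API reference. The `task_source_path` helper is hypothetical and returns `None` for task types it does not cover.

```python
from typing import Optional


def task_source_path(task: dict) -> Optional[str]:
    """Hypothetical helper: return the source path for a few task types, else None."""
    if "notebook_task" in task:
        return task["notebook_task"].get("notebook_path")
    if "spark_python_task" in task:
        return task["spark_python_task"].get("python_file")
    if "sql_task" in task and "file" in task["sql_task"]:
        return task["sql_task"]["file"].get("path")
    # Other task types are not covered by this sketch.
    return None


notebook = {"task_key": "ingest", "notebook_task": {"notebook_path": "/Workspace/Shared/example/ingest"}}
print(task_source_path(notebook))  # /Workspace/Shared/example/ingest
print(task_source_path({"task_key": "other"}))  # None
```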
Advanced task settings
The following advanced settings control retries for failed tasks and timeout policies for unresponsive tasks.
Note
You can set notifications at the task or job level. See Add notifications on a job.
Set a retry policy
The default retry behavior depends on the job configuration. For most configurations, failed tasks are not retried by default.
Serverless jobs auto-optimize retries by default. See Configure serverless compute auto-optimization to disallow retries.
Continuous jobs use an exponential backoff retry policy. See How are failures handled for continuous jobs?.
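Exponential backoff means each successive retry waits longer than the last, usually up to a cap. This article does not state the parameters Databricks uses for continuous jobs, so the base delay, growth factor, and cap below are invented purely to illustrate the technique. The `backoff_delays` helper is hypothetical.

```python
def backoff_delays(base_seconds: float, factor: float, cap_seconds: float, attempts: int) -> list[float]:
    """Hypothetical helper: delay before each retry under capped exponential backoff."""
    delays = []
    for attempt in range(attempts):
        # Delay grows geometrically with each attempt but never exceeds the cap.
        delays.append(min(cap_seconds, base_seconds * factor ** attempt))
    return delays


# Illustrative parameters only, not Databricks' actual values.
print(backoff_delays(10, 2, 60, 5))  # [10, 20, 40, 60, 60]
```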
To configure a policy that determines when and how many times failed task runs are retried, click + Add next to Retries.
The retry interval is measured in milliseconds from the start of the failed run to the start of the subsequent retry run.
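In a Jobs API payload, a task's retry policy maps to the fields `max_retries`, `min_retry_interval_millis`, and `retry_on_timeout`. The sketch below uses those fields in a hypothetical task fragment and computes when the next retry could start. Because the interval is measured from the start of the failed run, a long failed run can already use up the interval. The `earliest_retry_start` helper is hypothetical, and it assumes a retry cannot begin before the failed run has ended.

```python
from datetime import datetime, timedelta

# Hypothetical task fragment using Jobs API retry fields.
task = {
    "task_key": "transform",
    "max_retries": 3,                     # retry up to three times after the first failure
    "min_retry_interval_millis": 60_000,  # measured from the start of the failed run
    "retry_on_timeout": False,
}


def earliest_retry_start(failed_start: datetime, failed_end: datetime, interval_millis: int) -> datetime:
    """Hypothetical helper: earliest start time of the next retry.

    Assumes the retry cannot begin before the failed run has ended.
    """
    by_interval = failed_start + timedelta(milliseconds=interval_millis)
    return max(by_interval, failed_end)


start = datetime(2024, 1, 1, 12, 0, 0)
# Short failed run: the retry waits until the interval has elapsed.
print(earliest_retry_start(start, start + timedelta(seconds=20), task["min_retry_interval_millis"]))
# 2024-01-01 12:01:00
# Long failed run: the interval has already elapsed when the run ends.
print(earliest_retry_start(start, start + timedelta(minutes=5), task["min_retry_interval_millis"]))
# 2024-01-01 12:05:00
```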
Note
If you configure both Timeout and Retries, the timeout applies to each retry.
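Because the timeout applies to each attempt rather than to the whole sequence, retries multiply the worst-case wall-clock time. The sketch below gives a rough upper bound under stated assumptions: a nonzero timeout, a retry that starts at the later of the interval and the end of the failed run, and no scheduling or cluster start-up delay. The `worst_case_wall_clock_seconds` helper is hypothetical.

```python
def worst_case_wall_clock_seconds(
    timeout_seconds: int, max_retries: int, min_retry_interval_millis: int = 0
) -> float:
    """Hypothetical helper: rough upper bound from first attempt start to last attempt end.

    Each attempt is capped by timeout_seconds. A retry starts at the later of
    (failed start + interval) and (failed end), so consecutive attempt starts
    are at most max(timeout, interval) apart. Scheduling delays are ignored.
    """
    interval_seconds = min_retry_interval_millis / 1000
    gap = max(timeout_seconds, interval_seconds)
    return float(max_retries * gap + timeout_seconds)


# 10-minute timeout, 3 retries, 1-minute interval: 4 attempts of up to 10 minutes each.
print(worst_case_wall_clock_seconds(600, 3, 60_000))  # 2400.0
```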
Configure thresholds for task run duration or streaming backlog metrics
Preview
Streaming observability for Databricks Jobs is in Public Preview.
You can configure optional thresholds for task run duration or streaming backlog metrics. To configure duration thresholds or streaming metric thresholds, click Metric thresholds in the task configuration panel.
To configure task duration thresholds, including expected and maximum completion times for the task, select Run duration in the Metric drop-down menu. Enter a duration in the Warning field to configure the task’s expected completion time. If the task run exceeds this threshold, an event is triggered. To configure a maximum completion time for the task, enter the maximum duration in the Timeout field. If the task does not complete within this time, Databricks sets its status to “Timed Out”.
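The Warning and Timeout fields correspond, in a Jobs API payload, to a task health rule on the `RUN_DURATION_SECONDS` metric and to `timeout_seconds`. The fragment below assumes that mapping. The `duration_status` helper is hypothetical and only illustrates how the two thresholds split a running task's elapsed time into three states. Databricks evaluates thresholds itself, and this code does not reproduce its internals.

```python
from typing import Optional

# Hypothetical task fragment: warn after 10 minutes, time out after 30 minutes.
task = {
    "task_key": "transform",
    "timeout_seconds": 1800,
    "health": {
        "rules": [
            {"metric": "RUN_DURATION_SECONDS", "op": "GREATER_THAN", "value": 600}
        ]
    },
}


def duration_status(
    elapsed_seconds: float, warning_seconds: Optional[float], timeout_seconds: Optional[float]
) -> str:
    """Hypothetical helper: classify a running task against Warning and Timeout."""
    if timeout_seconds is not None and elapsed_seconds > timeout_seconds:
        return "TIMED_OUT"
    if warning_seconds is not None and elapsed_seconds > warning_seconds:
        return "WARNING"
    return "OK"


warning = task["health"]["rules"][0]["value"]
for elapsed in (300, 900, 2000):
    print(elapsed, duration_status(elapsed, warning, task["timeout_seconds"]))
# 300 OK
# 900 WARNING
# 2000 TIMED_OUT
```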
To configure a threshold for a streaming backlog metric, select the metric in the Metric drop-down menu and enter a value for the threshold. To learn about the specific metrics supported by a streaming source, see View metrics for streaming tasks.
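Streaming backlog thresholds work like the run duration threshold: each is a rule on a metric with a value that should not be exceeded. The sketch below assumes Jobs API health-rule metric names of the form `STREAMING_BACKLOG_SECONDS` and `STREAMING_BACKLOG_RECORDS`; check View metrics for streaming tasks and the Jobs API reference for the metrics your source supports. The threshold values and the `exceeded_rules` helper are hypothetical. The helper only handles the `GREATER_THAN` operator.

```python
# Hypothetical health rules for a streaming task (metric names assumed).
rules = [
    {"metric": "STREAMING_BACKLOG_SECONDS", "op": "GREATER_THAN", "value": 300},
    {"metric": "STREAMING_BACKLOG_RECORDS", "op": "GREATER_THAN", "value": 100_000},
]


def exceeded_rules(rules: list[dict], observed: dict[str, float]) -> list[str]:
    """Hypothetical helper: return metrics whose observed value exceeds the threshold."""
    breached = []
    for rule in rules:
        if rule["op"] != "GREATER_THAN":
            raise ValueError(f"unsupported op: {rule['op']}")
        value = observed.get(rule["metric"])
        # A metric with no observation is treated as not exceeding its threshold.
        if value is not None and value > rule["value"]:
            breached.append(rule["metric"])
    return breached


print(exceeded_rules(rules, {"STREAMING_BACKLOG_SECONDS": 420, "STREAMING_BACKLOG_RECORDS": 5_000}))
# ['STREAMING_BACKLOG_SECONDS']
```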
If an event is triggered because a threshold is exceeded, you can use the event to send a notification. See Configure notifications for slow jobs.
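Each threshold event can be routed to its own notification list. The sketch below assumes the Jobs API `email_notifications` keys `on_duration_warning_threshold_exceeded` and `on_streaming_backlog_exceeded`; confirm them in Configure notifications for slow jobs or the Jobs API reference before relying on them. The recipient addresses, the `EVENT_KEYS` mapping, and the `recipients_for` helper are hypothetical.

```python
# Hypothetical notification settings (key names assumed from the Jobs API).
settings = {
    "email_notifications": {
        "on_duration_warning_threshold_exceeded": ["oncall@example.com"],
        "on_streaming_backlog_exceeded": ["streaming-team@example.com"],
    }
}

# Hypothetical mapping from a threshold metric to the notification list it triggers.
EVENT_KEYS = {
    "RUN_DURATION_SECONDS": "on_duration_warning_threshold_exceeded",
    "STREAMING_BACKLOG_BYTES": "on_streaming_backlog_exceeded",
    "STREAMING_BACKLOG_RECORDS": "on_streaming_backlog_exceeded",
    "STREAMING_BACKLOG_SECONDS": "on_streaming_backlog_exceeded",
    "STREAMING_BACKLOG_FILES": "on_streaming_backlog_exceeded",
}


def recipients_for(metric: str, settings: dict) -> list[str]:
    """Hypothetical helper: email recipients notified when metric's threshold is exceeded."""
    key = EVENT_KEYS.get(metric)
    if key is None:
        return []
    return settings.get("email_notifications", {}).get(key, [])


print(recipients_for("RUN_DURATION_SECONDS", settings))  # ['oncall@example.com']
print(recipients_for("STREAMING_BACKLOG_RECORDS", settings))  # ['streaming-team@example.com']
```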