Configure and edit Databricks tasks
This article describes how to create, configure, and edit tasks using the Workflows workspace UI.
Databricks manages tasks as components of Databricks Jobs. A job has one or more tasks. You create a new job in the workspace UI by configuring the first task. To configure a new job, see Configure and edit Databricks Jobs.
Each task has an associated compute resource that runs the task logic. If you are using serverless, Databricks configures your compute resources. If you are not using serverless, see Configure compute for jobs.
Databricks has other entry points and tools for task configuration in addition to the workspace UI.
Create or configure a task
To edit an existing task or add a new task with the workspace UI, select an existing job using the following steps:
Click Workflows in the sidebar.
In the Name column, click the job name.
Click the Tasks tab. The task graph appears.
To edit a task, click the task name. The task configuration appears below the task graph.
To add a task, click the Add task button.
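The settings you enter for a task in the UI can also be thought of as plain data. The following is a minimal sketch, assuming the Jobs API 2.1 field names `task_key`, `notebook_task`, and `depends_on`. The helper `make_notebook_task`, the task names, and the notebook paths are hypothetical and used only for illustration.

```python
from __future__ import annotations

from typing import Any


def make_notebook_task(
    task_key: str,
    notebook_path: str,
    upstream: list[str] | None = None,
) -> dict[str, Any]:
    """Build one entry for the `tasks` list of a job definition (illustrative)."""
    task: dict[str, Any] = {
        "task_key": task_key,
        "notebook_task": {"notebook_path": notebook_path},
    }
    # Upstream dependencies are expressed as references to other task keys.
    if upstream:
        task["depends_on"] = [{"task_key": key} for key in upstream]
    return task


# Hypothetical two-task job: "transform" runs after "ingest".
tasks = [
    make_notebook_task("ingest", "/Workspace/Shared/example/ingest"),
    make_notebook_task("transform", "/Workspace/Shared/example/transform", upstream=["ingest"]),
]
```

Each dictionary corresponds to one node in the task graph, and `depends_on` corresponds to the edges between nodes.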
Types of tasks
Configuration options and instructions vary by task type.
Clone a task
Cloning a task copies all of the configuration of an existing task, including its upstream dependencies.
To clone a task, do the following:
Select the task in the task graph.
Click the clone task icon.
Specify a Cloned task name and click Clone.
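Conceptually, cloning produces an independent copy of the task settings under a new name. The following is a minimal sketch of that idea, assuming a task is represented as a dictionary with `task_key` and `depends_on` fields as in the Jobs API. The function `clone_task` is hypothetical and is not how the UI is implemented.

```python
import copy
from typing import Any


def clone_task(task: dict[str, Any], cloned_task_name: str) -> dict[str, Any]:
    """Return a deep copy of `task` under a new task key (illustrative)."""
    cloned = copy.deepcopy(task)  # copies nested settings, including depends_on
    cloned["task_key"] = cloned_task_name
    return cloned


original = {
    "task_key": "transform",
    "notebook_task": {"notebook_path": "/Workspace/Shared/example/transform"},
    "depends_on": [{"task_key": "ingest"}],
}
cloned = clone_task(original, "transform_copy")
```

Because the copy is deep, later changes to the clone do not affect the original task.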
Delete a task
To delete a task, do the following:
Select the task in the task graph.
Click the kebab menu and select Delete task.
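When tasks are managed as data, removing a task also raises the question of what happens to tasks that depend on it. The following is a minimal sketch, assuming tasks are dictionaries with `task_key` and `depends_on` fields. The function `remove_task` is hypothetical, and refusing to delete a task that has downstream tasks is a design choice of this sketch, not a description of how the UI behaves.

```python
from typing import Any


def remove_task(tasks: list[dict[str, Any]], task_key: str) -> list[dict[str, Any]]:
    """Return a new task list without `task_key` (illustrative)."""
    if not any(t["task_key"] == task_key for t in tasks):
        raise KeyError(task_key)
    # Find tasks that list the target as an upstream dependency.
    dependents = [
        t["task_key"]
        for t in tasks
        if any(d["task_key"] == task_key for d in t.get("depends_on", []))
    ]
    if dependents:
        raise ValueError(f"{task_key!r} is still required by {dependents}")
    return [t for t in tasks if t["task_key"] != task_key]


job_tasks = [
    {"task_key": "ingest"},
    {"task_key": "transform", "depends_on": [{"task_key": "ingest"}]},
]
remaining = remove_task(job_tasks, "transform")
```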
Copy a task path
Certain task types, for example, notebook tasks, allow you to copy the path to the task source code:
Click the Tasks tab.
Select the task containing the path to copy.
Click the copy icon next to the task path to copy the path to the clipboard.
Advanced task settings
The following advanced settings control retries for failed tasks and timeout policies for unresponsive tasks.
Note
You can set notifications at the task or job level. See Add email and system notifications for job events.
Set a retry policy
The default setting for task retries depends on the job configuration. For most configurations, the default setting does not retry any tasks on task failure.
Serverless jobs auto-optimize retries by default. See Configure serverless compute auto-optimization to disallow retries.
Continuous jobs use an exponential backoff retry policy. See How are failures handled for continuous jobs?.
To configure a policy that determines when and how many times failed task runs are retried, click + Add next to Retries.
The retry interval is specified in milliseconds and is measured from the start of the failed run to the start of the subsequent retry run.
Note
If you configure both Timeout and Retries, the timeout applies to each retry.
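The arithmetic behind these two settings can be sketched as follows. This is a minimal illustration, assuming the retry interval is counted from the start of the failed run and that every attempt, including one that times out, is retried. The function names are hypothetical, and the second function ignores any waiting time between attempts.

```python
from datetime import datetime, timedelta


def earliest_retry_start(failed_run_start: datetime, retry_interval_ms: int) -> datetime:
    """Earliest start of the retry, counted from the start of the failed run."""
    return failed_run_start + timedelta(milliseconds=retry_interval_ms)


def max_total_run_seconds(timeout_seconds: int, retries: int) -> int:
    """Upper bound on time spent running when the timeout applies to each attempt."""
    attempts = retries + 1  # the first run plus each retry
    return timeout_seconds * attempts


start = datetime(2024, 1, 1, 12, 0, 0)
retry_at = earliest_retry_start(start, 90_000)  # 90 seconds after the failed run started
budget = max_total_run_seconds(timeout_seconds=600, retries=2)  # 3 attempts of up to 600 s each
```

Because the timeout applies to each retry, a task with a 10-minute timeout and two retries can spend up to 30 minutes running in the worst case, which is worth considering when you size downstream schedules.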
Configure an expected completion time or a timeout for a task
You can configure optional duration thresholds for a task, including an expected and maximum completion time. To configure duration thresholds, click Duration threshold.
Enter a duration in the Warning field to configure the task’s expected completion time. If the task exceeds this threshold, an event is triggered. You can use this event to send a notification when a task is running slowly. See Configure notifications for slow running or late jobs.
To configure a maximum completion time for a task, enter the maximum duration in the Timeout field. If the task does not complete in this time, Databricks sets its status to “Timed Out”.
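The relationship between the two thresholds can be sketched as a simple classification of a task's elapsed time. This is a minimal illustration under the assumptions stated in the comments. The function `classify_duration` and the labels "OK" and "Warning" are hypothetical, and only "Timed Out" corresponds to a status named above.

```python
from __future__ import annotations


def classify_duration(
    elapsed_seconds: float,
    warning_seconds: float | None = None,
    timeout_seconds: float | None = None,
) -> str:
    """Classify elapsed time against optional Warning and Timeout thresholds."""
    # Assumption: reaching the timeout takes precedence over the warning.
    if timeout_seconds is not None and elapsed_seconds >= timeout_seconds:
        return "Timed Out"
    # Exceeding the expected completion time triggers a slow-run event.
    if warning_seconds is not None and elapsed_seconds > warning_seconds:
        return "Warning"
    return "OK"


states = [
    classify_duration(300, warning_seconds=600, timeout_seconds=1800),
    classify_duration(900, warning_seconds=600, timeout_seconds=1800),
    classify_duration(1800, warning_seconds=600, timeout_seconds=1800),
]
```

Both thresholds are optional, so a task with neither configured is never classified as slow or timed out in this sketch.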