Run tasks conditionally in a Databricks job
By default, a job task runs when its dependencies have run and have all succeeded, but you can also configure tasks in a Databricks job to run only when specific conditions are met. Databricks Jobs supports the following methods to run tasks conditionally:
You can specify Run if dependencies to run a task based on the run status of the task’s dependencies. For example, you can use Run if to run a task even when some or all of its dependencies have failed, allowing your job to recover from failures and continue running.
The If/else condition task is used to run a part of a job DAG based on the results of a boolean expression. The If/else condition task allows you to add branching logic to your job. For example, run transformation tasks only if the upstream ingestion task adds new data. Otherwise, run data processing tasks.
Add the Run if condition of a task
You can configure a Run if condition when you edit a task with one or more dependencies. To add the condition to the task, select the condition from the Run if dependencies drop-down menu in the task configuration. The Run if condition is evaluated after all of the task's dependencies have completed. You can also add a Run if condition when you add a new task with one or more dependencies.
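The same setting can also be expressed when a job is defined programmatically rather than in the UI. The following is a minimal sketch of a job settings payload written as a Python dictionary. It assumes that the Jobs API represents the condition as a run_if field on the task, with values such as ALL_DONE. The job name, task keys, and notebook paths are hypothetical.

```python
import json

# Hypothetical job settings: "cleanup" runs once "ingest" has finished,
# whether or not "ingest" succeeded.
# Assumption: the Jobs API exposes the Run if condition as a `run_if`
# field on the task. Check the API reference before relying on it.
run_if_job_settings = {
    "name": "example-run-if-job",
    "tasks": [
        {
            "task_key": "ingest",
            "notebook_task": {"notebook_path": "/Workspace/example/ingest"},
        },
        {
            "task_key": "cleanup",
            "depends_on": [{"task_key": "ingest"}],
            "run_if": "ALL_DONE",
            "notebook_task": {"notebook_path": "/Workspace/example/cleanup"},
        },
    ],
}

# Print the payload as JSON, the form a REST request body would take.
print(json.dumps(run_if_job_settings, indent=2))
```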
Run if condition options
You can add the following Run if conditions to a task:
All succeeded: All dependencies have run and succeeded. This is the default condition to run a task. The task is marked as Upstream failed if the condition is unmet.
At least one succeeded: At least one dependency has succeeded. The task is marked as Upstream failed if the condition is unmet.
None failed: None of the dependencies failed, and at least one dependency was run. The task is marked as Upstream failed if the condition is unmet.
All done: The task is run after all its dependencies have run, regardless of the status of the dependent runs. This condition allows you to define a task that is run without depending on the outcome of its dependent tasks.
At least one failed: At least one dependency failed. The task is marked as Excluded if the condition is unmet.
All failed: All dependencies have failed. The task is marked as Excluded if the condition is unmet.
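To make the options concrete, the following Python sketch models how each condition could be evaluated. It is an illustration of the rules listed above, not the Databricks implementation. It assumes every dependency has finished and either succeeded or failed, and the function name evaluate_run_if is hypothetical. In this simplified model, All succeeded and None failed give the same result. They presumably differ only for dependency states that the sketch leaves out, such as excluded tasks.

```python
# Status assigned to a task whose condition is unmet, per the list above.
# "All done" has no entry because its condition is always met.
UNMET_STATUS = {
    "All succeeded": "Upstream failed",
    "At least one succeeded": "Upstream failed",
    "None failed": "Upstream failed",
    "At least one failed": "Excluded",
    "All failed": "Excluded",
}


def evaluate_run_if(condition, dependency_succeeded):
    """Return "Run" or the status the task would be marked with.

    dependency_succeeded holds one boolean per dependency:
    True if it succeeded, False if it failed.
    """
    if not dependency_succeeded:
        raise ValueError("Run if applies only to tasks with dependencies")
    succeeded = sum(dependency_succeeded)
    failed = len(dependency_succeeded) - succeeded
    met = {
        "All succeeded": failed == 0,
        "At least one succeeded": succeeded >= 1,
        "None failed": failed == 0,
        "All done": True,
        "At least one failed": failed >= 1,
        "All failed": succeeded == 0,
    }[condition]
    return "Run" if met else UNMET_STATUS[condition]


print(evaluate_run_if("All succeeded", [True, False]))       # Upstream failed
print(evaluate_run_if("At least one failed", [True, True]))  # Excluded
print(evaluate_run_if("All done", [False, False]))           # Run
```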
Note
Tasks configured to handle failures are marked as Excluded if their Run if condition is unmet. Excluded tasks are skipped and are treated as successful.
If all task dependencies are excluded, the task is also excluded, regardless of its Run if condition.
If you cancel a task run, the cancellation propagates through downstream tasks, and tasks with a Run if condition that handles failure are run. This ensures, for example, that a cleanup task runs when a task run is canceled.
How does Databricks Jobs determine job run status?
Databricks Jobs determines whether a job run was successful based on the outcome of the job’s leaf tasks. A leaf task is a task that has no downstream dependencies. A job run can have one of three outcomes:
Succeeded: All tasks were successful.
Succeeded with failures: Some tasks failed, but all leaf tasks were successful.
Failed: One or more leaf tasks failed.
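The three outcomes can be sketched in a few lines of Python. This is only a model of the rule described above, assuming each task has a simple succeeded-or-failed result. The function name job_run_outcome and the task names are hypothetical.

```python
def job_run_outcome(depends_on, succeeded):
    """Derive the job run outcome from task results.

    depends_on maps each task name to the list of tasks it depends on.
    succeeded maps each task name to True (succeeded) or False (failed).
    """
    # A leaf task is one that no other task depends on.
    upstream = {dep for deps in depends_on.values() for dep in deps}
    leaves = [task for task in depends_on if task not in upstream]

    if not all(succeeded[task] for task in leaves):
        return "Failed"
    if all(succeeded.values()):
        return "Succeeded"
    return "Succeeded with failures"


# "ingest" fails, but the leaf task "recover" (for example, a task with
# an All done condition) succeeds.
graph = {"ingest": [], "recover": ["ingest"]}
print(job_run_outcome(graph, {"ingest": False, "recover": True}))
# Succeeded with failures
```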
Add branching logic to your job with the If/else condition task
Use the If/else condition task to run a part of a job DAG based on a boolean expression. The expression consists of a boolean operator and a pair of operands, where the operands might reference job or task state using job and task parameter variables or use task values.
Note
Numeric and non-numeric values are handled differently depending on the boolean operator:
The == and != operators perform string comparison of their operands. For example, 12.0 == 12 evaluates to false.
The >, >=, and <= operators perform numeric comparisons of their operands. For example, 12.0 >= 12 evaluates to true, and 10.0 >= 12 evaluates to false.
Only numeric, string, and boolean values are allowed when referencing task values in an operand. Any other types will cause the condition expression to fail. Non-numeric value types are serialized to strings and are treated as strings in If/else condition expressions. For example, if a task value is set to a boolean value, it is serialized to "true" or "false".
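The following Python sketch mimics these comparison rules so that the examples in this note can be checked locally. It is a model under stated assumptions, not the Databricks evaluator. The function names are hypothetical, and only the operators named in this note are handled.

```python
import operator

# Numeric operators named in the note above.
NUMERIC_OPS = {">": operator.gt, ">=": operator.ge, "<=": operator.le}


def to_operand(value):
    """Serialize a value to the string form assumed for comparison."""
    if isinstance(value, bool):
        return "true" if value else "false"
    return str(value)


def evaluate_condition(left, op, right):
    """Evaluate `left op right` using the rules described in the note."""
    left, right = to_operand(left), to_operand(right)
    if op == "==":
        return left == right          # string comparison
    if op == "!=":
        return left != right          # string comparison
    if op in NUMERIC_OPS:
        # float() raises ValueError for a non-numeric operand, which
        # stands in for the condition expression failing.
        return NUMERIC_OPS[op](float(left), float(right))
    raise ValueError(f"Operator not covered by this sketch: {op}")


print(evaluate_condition("12.0", "==", "12"))   # False
print(evaluate_condition("12.0", ">=", "12"))   # True
print(evaluate_condition("10.0", ">=", "12"))   # False
print(evaluate_condition(True, "==", "true"))   # True
```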
You can add an If/else condition task when you create a job or edit a task in an existing job. To configure an If/else condition task:
In the Type drop-down menu, select If/else condition.
In the first Condition text box, enter the operand to be evaluated. The operand can reference a job or task parameter variable or a task value.
Select a boolean operator from the drop-down menu.
In the second Condition text box, enter the value for evaluating the condition.
To configure dependencies on an If/else condition task:
Select the If/else condition task in the DAG view and click + Add task.
After entering details for the task, click Depends on and select <task-name> (true) where <task-name> is the name of the If/else condition task.
Repeat for the condition evaluating to false.
For example, suppose you have a task named process_records that maintains a count of records that are not valid in a value named bad_records, and you want to branch processing based on whether records that are not valid are found. To add this logic to your workflow, you can create an If/else condition task with an expression like {{tasks.process_records.values.bad_records}} > 0. You can then add dependent tasks based on the results of the condition.
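As a rough sketch, the process_records example might look like the following when the job is defined as a settings payload instead of through the UI. It assumes that the Jobs API models the condition as a condition_task with op, left, and right fields, and that a dependent task selects a branch through an outcome field in depends_on. The task keys check_bad_records, quarantine_bad_records, and publish_records, along with the notebook paths, are hypothetical.

```python
import json

# Assumption: field names follow the Jobs API as understood here.
# Check the API reference before using this payload.
branching_job_settings = {
    "name": "example-branching-job",
    "tasks": [
        {
            # Expected to set the task value "bad_records".
            "task_key": "process_records",
            "notebook_task": {"notebook_path": "/Workspace/example/process"},
        },
        {
            # If/else condition task: bad_records > 0
            "task_key": "check_bad_records",
            "depends_on": [{"task_key": "process_records"}],
            "condition_task": {
                "op": "GREATER_THAN",
                "left": "{{tasks.process_records.values.bad_records}}",
                "right": "0",
            },
        },
        {
            # Runs only when the condition evaluates to true.
            "task_key": "quarantine_bad_records",
            "depends_on": [{"task_key": "check_bad_records", "outcome": "true"}],
            "notebook_task": {"notebook_path": "/Workspace/example/quarantine"},
        },
        {
            # Runs only when the condition evaluates to false.
            "task_key": "publish_records",
            "depends_on": [{"task_key": "check_bad_records", "outcome": "false"}],
            "notebook_task": {"notebook_path": "/Workspace/example/publish"},
        },
    ],
}

print(json.dumps(branching_job_settings, indent=2))
```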
After the run of a job containing an If/else condition task completes, you can view the result of the expression and details of the expression evaluation when you view the job run details in the UI.