Run a parameterized Databricks job task in a loop
This article discusses using the For each task with your Databricks jobs, including details on adding and configuring the task in the Jobs UI. Use the For each task to run a task in a loop, passing a different set of parameters to each iteration of the task.
Adding the For each task to a job requires defining two tasks: the For each task and a nested task. The nested task is the task to run for each iteration of the For each task and is one of the standard Databricks Jobs task types. You cannot add another For each task as the nested task.
For example, you could use the For each task to perform a common set of transformations on multiple tables, passing a table name from a list of table names to each iteration of the task.
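As a rough illustration of the table example above, the following Python sketch builds a JSON-formatted array of table names that could be pasted into the Inputs text box, and prints the value each iteration would receive. The table names are hypothetical placeholders, not names from this article.

```python
import json

# Hypothetical table names; replace with tables that exist in your workspace.
table_names = ["sales.orders", "sales.customers", "sales.returns"]

# JSON-formatted array that could be pasted into the Inputs text box.
inputs_json = json.dumps(table_names)
print(inputs_json)

# Each iteration of the nested task receives one element of the array.
for iteration, table_name in enumerate(json.loads(inputs_json)):
    print(f"iteration {iteration}: {table_name}")
```

The loop at the end only simulates the iteration order locally; in a job, the For each task performs the iteration.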
What parameter types can I use with the For each task?
To pass parameters from a For each task, you can:
Define a JSON-formatted collection when you create or edit a task.
Use task values passed from a preceding task. To learn more about task values, see Use task values to pass information between tasks.
Use job parameters. To learn more about job parameters, see Configure job parameters.
To learn how to use these different parameter types when you add or edit a For each task, see the next section, Add the For each task to a job.
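To make the task values option more concrete, here is a minimal sketch of a preceding task publishing a list that a For each task could later iterate over. It uses the `dbutils.jobs.taskValues.set(key=..., value=...)` call shown in this article. The `FakeTaskValues` class and the `publish_countries` helper are hypothetical, introduced only so the sketch runs outside Databricks, and the JSON check assumes that task values need to be JSON-serializable.

```python
import json


class FakeTaskValues:
    """Hypothetical local stand-in for dbutils.jobs.taskValues (testing only)."""

    def __init__(self):
        self.store = {}

    def set(self, key, value):
        self.store[key] = value


def publish_countries(task_values, countries):
    """Publish a list under the key "countries" for a later For each task."""
    # Assumption: the value must be JSON-serializable, so fail early if it is not.
    json.dumps(countries)
    task_values.set(key="countries", value=countries)


# In a Databricks notebook, pass dbutils.jobs.taskValues instead of the fake.
fake = FakeTaskValues()
publish_countries(fake, ["US", "FR", "JP"])
print(fake.store)
```

Passing the task values object as an argument keeps the helper testable locally; inside a notebook you could call `dbutils.jobs.taskValues.set` directly.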
Add the For each task to a job
You can add a For each task when you create a job or edit a task in an existing job. To configure a For each task:
In the Type drop-down menu, select For each.
Enter a name for the task in the Task name field.
In the Inputs text box, define the values for the For each task to iterate on. This can be one of the following:
A JSON-formatted array of values. The array can contain the following data types:
Key-value pairs
Strings, numbers, or Boolean types
Arbitrarily complex JSON objects
Task value references. To reference task values passed from a preceding task, use the {{tasks.<task_name>.values.<task_value_name>}} syntax to set the value in the Inputs text box. For example, if a task named generate_countries_list that precedes the For each task sets the following task value:
dbutils.jobs.taskValues.set(key = "countries", value = countries_array)
Then the For each task references the task value in the Inputs text box using the following syntax: {{tasks.generate_countries_list.values.countries}}.
Job parameters. To reference a job parameter, use the following syntax in the Inputs text box: {{job.parameters.<name>}}. For example, {{job.parameters.countries}}.
To optionally set the number of iterations that can run in parallel, enter a Concurrency value for the task. The default value is 1.
To optionally receive notifications for task start, success, or failure, click + Add. See Add notifications on a job.
To complete the configuration of the For each task and add a nested task to run for each iteration, click Add a task to loop over.
Select a task type and configuration options for the nested task. Nested tasks are standard task types and have the same configuration options. See Configure and edit Databricks tasks.
To reference parameters passed from the For each task, click Parameters. Use the {{input}} reference to set the value to the array value of each iteration, or {{input.<key>}} to reference individual object fields when you iterate over a list of objects.
Click Create task.
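The parameter references in the steps above can be sketched from the nested task's side. This example assumes the nested task is a Python script task, that the Inputs array is a list of objects with hypothetical `country` and `year` keys, and that the task's parameters are configured as `["--country", "{{input.country}}", "--year", "{{input.year}}"]`. The `parse_args` and `process` functions are illustrative names, not part of any Databricks API.

```python
import argparse


def parse_args(argv):
    """Parse the parameters that one For each iteration passes to the script."""
    parser = argparse.ArgumentParser(
        description="Nested task run once per For each iteration"
    )
    parser.add_argument("--country", required=True)
    parser.add_argument("--year", type=int, required=True)
    return parser.parse_args(argv)


def process(country, year):
    # Placeholder for the real per-iteration work (hypothetical).
    return f"processing {country} for {year}"


# Simulate one iteration's resolved parameters; in a job, use sys.argv[1:].
args = parse_args(["--country", "US", "--year", "2024"])
message = process(args.country, args.year)
print(message)
```

Because each iteration passes its own resolved values, the script itself contains no loop; the For each task supplies a different `country` and `year` on every run.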
Switch between the For each task and the nested task
The For each task appears in the Jobs UI as a node with the nested task node inside the For each node. To switch between the For each task and the nested task, click the respective nodes.
Reference a For each task in downstream tasks
The For each task is the top-level task, and downstream tasks can specify it as a dependency. Downstream tasks cannot depend on or reference the nested task.
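The dependency rule can be illustrated with a job definition expressed as a Python dictionary. This is a sketch only: the field names (`for_each_task`, `inputs`, `concurrency`, `task`, `depends_on`) reflect my understanding of the Jobs API job settings and should be verified against the REST API reference, and all task keys and notebook paths are hypothetical. The `upstream_keys` helper is likewise illustrative.

```python
import json

# Hypothetical job settings; verify field names against the Jobs API reference.
job_settings = {
    "name": "process_countries",
    "tasks": [
        {
            "task_key": "generate_countries_list",
            "notebook_task": {"notebook_path": "/Workspace/example/generate_countries_list"},
        },
        {
            "task_key": "process_each_country",
            "depends_on": [{"task_key": "generate_countries_list"}],
            "for_each_task": {
                "inputs": "{{tasks.generate_countries_list.values.countries}}",
                "concurrency": 2,
                "task": {
                    "task_key": "process_country_iteration",
                    "notebook_task": {
                        "notebook_path": "/Workspace/example/process_country",
                        "base_parameters": {"country": "{{input}}"},
                    },
                },
            },
        },
        {
            # The downstream task depends on the For each task, not the nested task.
            "task_key": "summarize",
            "depends_on": [{"task_key": "process_each_country"}],
            "notebook_task": {"notebook_path": "/Workspace/example/summarize"},
        },
    ],
}


def upstream_keys(settings, task_key):
    """Return the task keys that the given top-level task depends on."""
    for task in settings["tasks"]:
        if task["task_key"] == task_key:
            return [dep["task_key"] for dep in task.get("depends_on", [])]
    raise KeyError(task_key)


print(json.dumps(job_settings, indent=2))
print(upstream_keys(job_settings, "summarize"))
```

Note that the nested task key `process_country_iteration` appears only inside the For each task, so it is not available as a dependency target for `summarize`.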
Run and monitor a job with a For each task
Running a job with a For each task is identical to running any other job.
Viewing and managing job runs is also identical to any other job, except for the task run history of a For each task, which is presented as a table of task iterations. See View task run history for a For each task.