Add tasks to jobs in Databricks Asset Bundles
This page provides information about how to define job tasks in Databricks Asset Bundles. For information about job tasks, see Configure and edit tasks in Lakeflow Jobs.
Using the job `git_source` field or setting the task `source` field to `GIT` is not recommended for bundles, because local relative paths may not point to the same content in the Git repository. Bundles expect that a deployed job has the same files as the local copy from where it was deployed.
Instead, clone the repository locally and set up your bundle project within this repository, so that the source for tasks is the workspace.
Configure tasks
Define tasks for a job in a bundle in the `tasks` key of the job definition. Examples of task configuration for the available task types are in the Task settings section. For information about defining a job in a bundle, see job.
To quickly generate resource configuration for an existing job using the Databricks CLI, you can use the `bundle generate job` command. See bundle commands.
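For example, the following command generates configuration for an existing job; the job ID shown is a placeholder to replace with your own:

databricks bundle generate job --existing-job-id 6565621249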
Most job task types have task-specific parameters for setting task values, but you can also define job parameters that are passed to tasks. Dynamic value references are supported for job parameters, which enables passing values that are specific to the job run between tasks. For complete information about how to pass task values by task type, see Details by task type.
You can also override general job task settings with settings for a target workspace. See Override with target settings.
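For example, the following sketch overrides the notebook path of a task for a hypothetical `prod` target only; the job and task names are placeholders, and tasks in a target override are matched to the base job definition by `task_key`:

targets:
  prod:
    resources:
      jobs:
        my_job:
          tasks:
            - task_key: my_task
              notebook_task:
                notebook_path: ./prod-notebook.ipynb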
The following example configuration defines a job with two notebook tasks, and passes a task value from the first task to the second task.
resources:
jobs:
pass_task_values_job:
name: pass_task_values_job
tasks:
# Output task
- task_key: output_value
notebook_task:
notebook_path: ../src/output_notebook.ipynb
# Input task
- task_key: input_value
depends_on:
- task_key: output_value
notebook_task:
notebook_path: ../src/input_notebook.ipynb
base_parameters:
received_message: '{{tasks.output_value.values.message}}'
The `output_notebook.ipynb` contains the following code, which sets a task value for the `message` key:
# Databricks notebook source
# This first task sets a simple output value.
message = "Hello from the first task"
# Set the message to be used by other tasks
dbutils.jobs.taskValues.set(key="message", value=message)
print(f"Produced message: {message}")
The `input_notebook.ipynb` retrieves the value of the parameter `received_message`, which was set in the configuration for the task:
# This notebook receives the message as a parameter.
dbutils.widgets.text("received_message", "")
received_message = dbutils.widgets.get("received_message")
print(f"Received message: {received_message}")
Task settings
This section contains settings and examples for each job task type.
Clean room notebook task
The clean room notebook task runs a clean rooms notebook when the clean_rooms_notebook_task
field is present. For information about clean rooms, see What is Databricks Clean Rooms?.
The following keys are available for a clean rooms notebook task. For the corresponding REST API object definition, see clean_rooms_notebook_task.
| Key | Type | Description |
|---|---|---|
| `clean_room_name` | String | Required. The clean room that the notebook belongs to. |
| `etag` | String | Checksum to validate the freshness of the notebook resource. It can be fetched by calling the clean room assets get operation. |
| `notebook_base_parameters` | Map | Base parameters to be used for the clean room notebook job. |
| `notebook_name` | String | Required. Name of the notebook being run. |
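The following example is a minimal sketch of a clean room notebook task; the clean room name, notebook name, and base parameter are hypothetical placeholders for assets that must already exist:

resources:
  jobs:
    my-clean-room-job:
      name: my-clean-room-job
      tasks:
        - task_key: my-clean-room-notebook-task
          clean_rooms_notebook_task:
            clean_room_name: my_clean_room
            notebook_name: my_notebook
            notebook_base_parameters:
              season: 'winter'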
Condition task
The `condition_task` enables you to add a task with if/else conditional logic to your job. The task evaluates a condition that can be used to control the execution of other tasks. The condition task does not require a cluster to execute and does not support retries or notifications. For more information about the if/else condition task, see Add branching logic to a job with the If/else task.
The following keys are available for a condition task. For the corresponding REST API object definition, see condition_task.
| Key | Type | Description |
|---|---|---|
| `left` | String | Required. The left operand of the condition. Can be a string value, a job state, or a dynamic value reference such as `{{job.repair_count}}`. |
| `op` | String | Required. The operator to use for comparison. Valid values are `EQUAL_TO`, `NOT_EQUAL`, `GREATER_THAN`, `GREATER_THAN_OR_EQUAL`, `LESS_THAN`, and `LESS_THAN_OR_EQUAL`. |
| `right` | String | Required. The right operand of the condition. Can be a string value, a job state, or a dynamic value reference. |
Examples
The following example contains a condition task and a notebook task, where the notebook task only executes if the number of job repairs is less than 5.
resources:
jobs:
my-job:
name: my-job
tasks:
- task_key: condition_task
condition_task:
op: LESS_THAN
left: '{{job.repair_count}}'
right: '5'
- task_key: notebook_task
depends_on:
- task_key: condition_task
outcome: 'true'
notebook_task:
notebook_path: ../src/notebook.ipynb
Dashboard task
You use this task to refresh a dashboard and send a snapshot to subscribers. For more information about dashboards in bundles, see dashboard.
The following keys are available for a dashboard task. For the corresponding REST API object definition, see dashboard_task.
| Key | Type | Description |
|---|---|---|
| `dashboard_id` | String | Required. The identifier of the dashboard to be refreshed. The dashboard must already exist. |
| `subscription` | Map | The subscription configuration for sending the dashboard snapshot. Each subscription object can specify destination settings for where to send snapshots after the dashboard refresh completes. See subscription. |
| `warehouse_id` | String | The warehouse ID to execute the dashboard with for the schedule. If not specified, the default warehouse of the dashboard is used. |
Examples
The following example adds a dashboard task to a job. When the job is run, the dashboard with the specified ID is refreshed.
resources:
jobs:
my-dashboard-job:
name: my-dashboard-job
tasks:
- task_key: my-dashboard-task
dashboard_task:
dashboard_id: 11111111-1111-1111-1111-111111111111
dbt task
You use this task to run one or more dbt commands. For more information about dbt, see Connect to dbt Cloud.
The following keys are available for a dbt task. For the corresponding REST API object definition, see dbt_task.
| Key | Type | Description |
|---|---|---|
| `catalog` | String | The name of the catalog to use. The catalog value can only be specified if a `warehouse_id` is specified. |
| `commands` | Sequence | Required. A list of dbt commands to execute in sequence. Each command must be a complete dbt command, for example `dbt run`. |
| `profiles_directory` | String | The path to the directory containing the dbt `profiles.yml` file. Can only be specified if no `warehouse_id` is specified. |
| `project_directory` | String | The path to the directory containing the dbt project. If not specified, defaults to the root of the repository or workspace directory. For projects stored in the Databricks workspace, the path must be absolute and begin with a slash. For projects in a remote repository, the path must be relative. |
| `schema` | String | The schema to write to. This parameter is only used when a `warehouse_id` is also specified. |
| `source` | String | The location type of the dbt project. Valid values are `WORKSPACE` and `GIT`. |
| `warehouse_id` | String | The ID of the SQL warehouse to use for running dbt commands. If not specified, the default warehouse is used. |
Examples
The following example adds a dbt task to a job. This dbt task uses the specified SQL warehouse to run the specified dbt commands.
To get a SQL warehouse's ID, open the SQL warehouse's settings page, then copy the ID found in parentheses after the name of the warehouse in the Name field on the Overview tab.
Databricks Asset Bundles also include a `dbt-sql` project template that defines a job with a dbt task, as well as dbt profiles for deployed dbt jobs. For information about Databricks Asset Bundles templates, see Default bundle templates.
resources:
jobs:
my-dbt-job:
name: my-dbt-job
tasks:
- task_key: my-dbt-task
dbt_task:
commands:
- 'dbt deps'
- 'dbt seed'
- 'dbt run'
project_directory: /Users/someone@example.com/Testing
warehouse_id: 1a111111a1111aa1
libraries:
- pypi:
package: 'dbt-databricks>=1.0.0,<2.0.0'
For each task
The `for_each_task` enables you to add a task with a for each loop to your job. The task executes a nested task for every input provided. For more information about the `for_each_task`, see Use a For each task to run another task in a loop.
The following keys are available for a `for_each_task`. For the corresponding REST API object definition, see for_each_task.

| Key | Type | Description |
|---|---|---|
| `concurrency` | Integer | The maximum number of task iterations that can run concurrently. If not specified, all iterations may run in parallel, subject to cluster and workspace limits. |
| `inputs` | String | Required. The input data for the loop. This can be a JSON string or a reference to an array parameter. Each element in the array is passed to one iteration of the nested task. |
| `task` | Map | Required. The nested task definition to execute for each input. This object contains the complete task specification, including `task_key` and the task type configuration. |
Examples
The following example adds a `for_each_task` to a job, where it loops over the values of another task and processes them.
resources:
jobs:
my_job:
name: my_job
tasks:
- task_key: generate_countries_list
notebook_task:
            notebook_path: ../src/generate_countries_list.ipynb
- task_key: process_countries
depends_on:
- task_key: generate_countries_list
for_each_task:
inputs: '{{tasks.generate_countries_list.values.countries}}'
task:
task_key: process_countries_iteration
notebook_task:
                notebook_path: ../src/process_countries_notebook.ipynb
JAR task
You use this task to run a JAR. You can reference local JAR libraries or those in a workspace, a Unity Catalog volume, or an external cloud storage location. See JAR file (Java or Scala).
For details on how to compile and deploy Scala JAR files on a Unity Catalog-enabled cluster in standard access mode, see Deploy Scala JARs on Unity Catalog clusters.
The following keys are available for a JAR task. For the corresponding REST API object definition, see jar_task.
| Key | Type | Description |
|---|---|---|
| `jar_uri` | String | Deprecated. The URI of the JAR to be executed. DBFS and cloud storage paths are supported. This field is deprecated and should not be used. Instead, use the `libraries` field to provide the JAR. |
| `main_class_name` | String | Required. The full name of the class containing the main method to be executed. This class must be contained in a JAR provided as a library. The code must use `SparkContext.getOrCreate` to obtain a Spark context; otherwise, runs of the job fail. |
| `parameters` | Sequence | The parameters passed to the main method. Use task parameter variables to set parameters containing information about job runs. |
Examples
The following example adds a JAR task to a job. The path for the JAR is to a volume location.
resources:
jobs:
my-jar-job:
name: my-jar-job
tasks:
- task_key: my-jar-task
spark_jar_task:
main_class_name: org.example.com.Main
libraries:
- jar: /Volumes/main/default/my-volume/my-project-0.1.0-SNAPSHOT.jar
Notebook task
You use this task to run a notebook. See Notebook task for jobs.
The following keys are available for a notebook task. For the corresponding REST API object definition, see notebook_task.
| Key | Type | Description |
|---|---|---|
| `base_parameters` | Map | The base parameters to use for each run of this task. Retrieve these parameters in the notebook using `dbutils.widgets.get`. |
| `notebook_path` | String | Required. The path of the notebook in the Databricks workspace or remote repository. For notebooks stored in the Databricks workspace, the path must be absolute and begin with a slash. For notebooks in a remote repository, the path must be relative. |
| `source` | String | The location type of the notebook. Valid values are `WORKSPACE` and `GIT`. |
| `warehouse_id` | String | The ID of the warehouse to run the notebook on. Classic SQL warehouses are not supported; use serverless or pro SQL warehouses instead. Note that SQL warehouses only support SQL cells. If the notebook contains non-SQL cells, the run fails, so if you need to run Python (or other non-SQL) cells, use serverless compute instead. |
Examples
The following example adds a notebook task to a job and sets a job parameter named `my_job_run_id`. The path for the notebook to deploy is relative to the configuration file in which this task is declared. The task gets the notebook from its deployed location in the Databricks workspace.
resources:
jobs:
my-notebook-job:
name: my-notebook-job
tasks:
- task_key: my-notebook-task
notebook_task:
notebook_path: ./my-notebook.ipynb
parameters:
- name: my_job_run_id
default: '{{job.run_id}}'
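To run a SQL-only notebook on a SQL warehouse instead of a cluster, set `warehouse_id`. The following is a minimal sketch with a placeholder warehouse ID and a hypothetical notebook path:

resources:
  jobs:
    my-sql-notebook-job:
      name: my-sql-notebook-job
      tasks:
        - task_key: my-sql-notebook-task
          notebook_task:
            notebook_path: ./my-sql-notebook.ipynb
            warehouse_id: 1a111111a1111aa1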
Pipeline task
You use this task to run a pipeline. See Lakeflow Declarative Pipelines.
The following keys are available for a pipeline task. For the corresponding REST API object definition, see pipeline_task.
| Key | Type | Description |
|---|---|---|
| `full_refresh` | Boolean | If true, a full refresh of the pipeline is triggered, which completely recomputes all datasets in the pipeline. If false or omitted, only incremental data is processed. For details, see Pipeline refresh semantics. |
| `pipeline_id` | String | Required. The ID of the pipeline to run. The pipeline must already exist. |
Examples
The following example adds a pipeline task to a job. This task runs the specified pipeline.
You can get a pipeline's ID by opening the pipeline in the workspace and copying the Pipeline ID value on the Pipeline details tab of the pipeline's settings page.
resources:
jobs:
my-pipeline-job:
name: my-pipeline-job
tasks:
- task_key: my-pipeline-task
pipeline_task:
pipeline_id: 11111111-1111-1111-1111-111111111111
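To recompute all datasets in the pipeline when the task runs, set `full_refresh` to true. The following sketch reuses the placeholder pipeline ID from the previous example:

resources:
  jobs:
    my-pipeline-full-refresh-job:
      name: my-pipeline-full-refresh-job
      tasks:
        - task_key: my-pipeline-task
          pipeline_task:
            pipeline_id: 11111111-1111-1111-1111-111111111111
            full_refresh: true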
Power BI task
The Power BI task type is in Public Preview.
Use this task to trigger a refresh of a Power BI semantic model (formerly known as a dataset).
The following keys are available for a Power BI task. For the corresponding REST API object definition, see power_bi_task.
| Key | Type | Description |
|---|---|---|
| `connection_resource_name` | String | Required. The name of the Unity Catalog connection to authenticate from Databricks to Power BI. |
| `power_bi_model` | Map | Required. The Power BI semantic model (dataset) to update, including the workspace and model name. |
| `refresh_after_update` | Boolean | Whether to refresh the Power BI semantic model after the update completes. Defaults to false. |
| `tables` | Sequence | A list of tables (each as a Map) to be exported to Power BI. See tables. |
| `warehouse_id` | String | The ID of the SQL warehouse to use as the Power BI datasource. |
Examples
The following example defines a Power BI task, which specifies a connection, the Power BI model to update, and the Databricks table to export.
resources:
jobs:
my_job:
name: my_job
tasks:
- task_key: power_bi_task
power_bi_task:
connection_resource_name: 'connection_name'
power_bi_model:
workspace_name: 'workspace_name'
model_name: 'model_name'
storage_mode: 'DIRECT_QUERY'
authentication_method: 'OAUTH'
overwrite_existing: false
refresh_after_update: false
tables:
- catalog: 'main'
schema: 'tpch'
name: 'customers'
storage_mode: 'DIRECT_QUERY'
warehouse_id: '1a111111a1111aa1'
Python script task
You use this task to run a Python file.
The following keys are available for a Python script task. For the corresponding REST API object definition, see python_task.
| Key | Type | Description |
|---|---|---|
| `parameters` | Sequence | The parameters to pass to the Python file. Use task parameter variables to set parameters containing information about job runs. |
| `python_file` | String | Required. The URI of the Python file to be executed, for example a workspace path, a volume path, or a cloud storage path. |
| `source` | String | The location type of the Python file. Valid values are `WORKSPACE` and `GIT`. |
Examples
The following example adds a Python script task to a job. The path for the Python file to deploy is relative to the configuration file in which this task is declared. The task gets the Python file from its deployed location in the Databricks workspace.
resources:
jobs:
my-python-script-job:
name: my-python-script-job
tasks:
- task_key: my-python-script-task
spark_python_task:
python_file: ./my-script.py
Python wheel task
You use this task to run a Python wheel. See Build a Python wheel file using Databricks Asset Bundles.
The following keys are available for a Python wheel task. For the corresponding REST API object definition, see python_wheel_task.
| Key | Type | Description |
|---|---|---|
| `entry_point` | String | Required. The named entry point to execute: a function or class. If it does not exist in the metadata of the package, the function is executed from the package directly using `$packageName.$entryPoint()`. |
| `named_parameters` | Map | The named parameters to pass to the Python wheel task, also known as keyword arguments. A named parameter is a key-value pair with a string key and a string value. `named_parameters` and `parameters` cannot both be specified. |
| `package_name` | String | Required. The name of the Python package to execute. All dependencies must be installed in the environment. This does not check for or install any package dependencies. |
| `parameters` | Sequence | The parameters to pass to the Python wheel task, also known as positional arguments. Each parameter is a string. If specified, `named_parameters` cannot also be specified. |
Examples
The following example adds a Python wheel task to a job. The path for the Python wheel file to deploy is relative to the configuration file in which this task is declared. See Databricks Asset Bundles library dependencies.
resources:
jobs:
my-python-wheel-job:
name: my-python-wheel-job
tasks:
- task_key: my-python-wheel-task
python_wheel_task:
entry_point: run
package_name: my_package
libraries:
- whl: ./my_package/dist/my_package-*.whl
Run job task
You use this task to run another job.
The following keys are available for a run job task. For the corresponding REST API object definition, see run_job_task.
| Key | Type | Description |
|---|---|---|
| `job_id` | Integer | Required. The ID of the job to run. The job must already exist in the workspace. |
| `job_parameters` | Map | Job-level parameters to pass to the job being run. These parameters are accessible within the job's tasks. |
| `pipeline_params` | Map | Parameters for the pipeline task. Used only if the job being run contains a pipeline task. Can include `full_refresh` to trigger a full refresh of the pipeline. |
Examples
The following example contains a run job task in the second job that runs the first job.
This example uses a substitution to retrieve the ID of the job to run. To get a job's ID from the UI, open the job in the workspace and copy the ID from the Job ID value in the Job details tab of the job's settings page.
resources:
jobs:
my-first-job:
name: my-first-job
tasks:
- task_key: my-first-job-task
new_cluster:
spark_version: '13.3.x-scala2.12'
node_type_id: 'i3.xlarge'
num_workers: 2
notebook_task:
notebook_path: ./src/test.py
my_second_job:
name: my-second-job
tasks:
- task_key: my-second-job-task
run_job_task:
job_id: ${resources.jobs.my-first-job.id}
SQL task
You use this task to run a SQL file, query, or alert.
The following keys are available for a SQL task. For the corresponding REST API object definition, see sql_task.
| Key | Type | Description |
|---|---|---|
| `alert` | Map | Configuration for running a SQL alert. Contains `alert_id` (required), and optionally `pause_subscriptions` and `subscriptions`. |
| `dashboard` | Map | Configuration for refreshing a SQL dashboard. Contains `dashboard_id` (required), and optionally `custom_subject`, `pause_subscriptions`, and `subscriptions`. |
| `file` | Map | Configuration for running a SQL file. Contains `path` (required) and `source` (`WORKSPACE` or `GIT`). |
| `parameters` | Map | Parameters to be used for each run of this task. SQL queries and files can use these parameters by referencing them with the syntax `{{parameter_key}}`. |
| `query` | Map | Configuration for running a SQL query. Contains `query_id` (required). |
| `warehouse_id` | String | Required. The ID of the SQL warehouse to use to run the SQL task. The SQL warehouse must already exist. |
Examples
To get a SQL warehouse's ID, open the SQL warehouse's settings page, then copy the ID found in parentheses after the name of the warehouse in the Name field on the Overview tab.
The following example adds a SQL file task to a job. This SQL file task uses the specified SQL warehouse to run the specified SQL file.
resources:
jobs:
my-sql-file-job:
name: my-sql-file-job
tasks:
- task_key: my-sql-file-task
sql_task:
file:
path: /Users/someone@example.com/hello-world.sql
source: WORKSPACE
warehouse_id: 1a111111a1111aa1
The following example adds a SQL alert task to a job. This SQL alert task uses the specified SQL warehouse to refresh the specified SQL alert.
resources:
jobs:
    my-sql-alert-job:
name: my-sql-alert-job
tasks:
- task_key: my-sql-alert-task
sql_task:
warehouse_id: 1a111111a1111aa1
alert:
alert_id: 11111111-1111-1111-1111-111111111111
The following example adds a SQL query task to a job. This SQL query task uses the specified SQL warehouse to run the specified SQL query.
resources:
jobs:
my-sql-query-job:
name: my-sql-query-job
tasks:
- task_key: my-sql-query-task
sql_task:
warehouse_id: 1a111111a1111aa1
query:
query_id: 11111111-1111-1111-1111-111111111111
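Queries can reference task parameters with the `{{parameter_key}}` syntax. The following is a minimal sketch that passes a parameter named `region` to the query; the parameter name and value are hypothetical:

resources:
  jobs:
    my-sql-query-params-job:
      name: my-sql-query-params-job
      tasks:
        - task_key: my-sql-query-params-task
          sql_task:
            warehouse_id: 1a111111a1111aa1
            parameters:
              region: 'US'
            query:
              query_id: 11111111-1111-1111-1111-111111111111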
Other task settings
The following task settings allow you to configure behaviors for all tasks. For the corresponding REST API object definitions, see tasks.
| Key | Type | Description |
|---|---|---|
| `compute_key` | String | The key of the compute resource to use for this task. |
| `depends_on` | Sequence | An optional list of task dependencies. Each item contains `task_key` and, optionally, `outcome`, which specifies the required result (`'true'` or `'false'`) of the condition task that this task depends on. |
| `description` | String | An optional description for the task. |
| `disable_auto_optimization` | Boolean | Whether to disable automatic optimization for this task. If true, automatic optimizations such as adaptive query execution are disabled. |
| `email_notifications` | Map | An optional set of email addresses to notify when a run begins, completes, or fails. Contains lists of email addresses keyed by event, such as `on_start`, `on_success`, and `on_failure`. |
| `environment_key` | String | The key of an environment defined in the job's `environments` mapping. |
| `existing_cluster_id` | String | The ID of an existing cluster that is used for all runs of this task. |
| `health` | Map | An optional specification for health monitoring of this task that includes a `rules` list, where each rule specifies a metric, an operator, and a threshold value. |
| `job_cluster_key` | String | The key of a job cluster defined in the job's `job_clusters` mapping. |
| `libraries` | Sequence | An optional list of libraries to be installed on the cluster that will execute the task. Each library is specified as a map with a key such as `jar`, `whl`, `pypi`, `maven`, or `requirements`. |
| `max_retries` | Integer | An optional maximum number of times to retry the task if it fails. If not specified, the task is not retried. |
| `min_retry_interval_millis` | Integer | An optional minimal interval in milliseconds between the start of the failed run and the subsequent retry run. If not specified, the default is 0 (immediate retry). |
| `new_cluster` | Map | A specification for a new cluster to be created for each run of this task. See cluster. |
| `notification_settings` | Map | Optional notification settings for this task, such as `no_alert_for_skipped_runs`, `no_alert_for_canceled_runs`, and `alert_on_last_attempt`. |
| `retry_on_timeout` | Boolean | An optional policy to specify whether to retry the task when it times out. If not specified, defaults to false. |
| `run_if` | String | An optional value indicating the condition under which the task should run. Valid values are `ALL_SUCCESS`, `ALL_DONE`, `AT_LEAST_ONE_SUCCESS`, `NONE_FAILED`, `ALL_FAILED`, and `AT_LEAST_ONE_FAILED`. |
| `task_key` | String | Required. A unique name for the task. This field is used to refer to this task from other tasks using the `depends_on` field. |
| `timeout_seconds` | Integer | An optional timeout applied to each run of this task. A value of 0 means no timeout. If not set, no timeout is applied. |
| `webhook_notifications` | Map | An optional set of system destinations to notify when a run begins, completes, or fails. Contains lists of destination IDs keyed by event, such as `on_start`, `on_success`, and `on_failure`. |
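The following example is a minimal sketch (with hypothetical notebook paths and email address) that combines several of these settings: the second task depends on the first, runs only if it succeeds, retries on failure, and sends an email notification if it ultimately fails.

resources:
  jobs:
    my-task-settings-job:
      name: my-task-settings-job
      tasks:
        - task_key: first_task
          notebook_task:
            notebook_path: ./src/first_notebook.ipynb
        - task_key: second_task
          depends_on:
            - task_key: first_task
          run_if: ALL_SUCCESS
          max_retries: 2
          min_retry_interval_millis: 60000
          timeout_seconds: 3600
          email_notifications:
            on_failure:
              - someone@example.com
          notebook_task:
            notebook_path: ./src/second_notebook.ipynb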