Bundle configuration examples

This article provides example configuration for Databricks Asset Bundles features and common bundle use cases.

Complete bundle examples, listed below, are available in the bundle-examples GitHub repository:

dashboard_nyc_taxi: A bundle with an AI/BI dashboard and a job that captures a snapshot of the dashboard and emails it to a subscriber
databricks_app: A bundle that defines a Databricks App
development_cluster: A bundle that defines and uses a development (all-purpose) cluster
job_read_secret: A bundle that defines a secret scope and a job with a task that reads from it
job_with_multiple_wheels: A bundle that defines and uses a job with multiple wheel dependencies
job_with_run_job_tasks: A bundle with multiple jobs with run job tasks
job_with_sql_notebook: A bundle with a job that uses a SQL notebook task
pipeline_with_schema: A bundle that defines a Unity Catalog schema and a pipeline that uses it
private_wheel_packages: A bundle that uses a private wheel package from a job
python_wheel_poetry: A bundle that builds a whl with Poetry
serverless_job: A bundle that uses serverless compute to run a job
share_files_across_bundles: A bundle that includes files located outside the bundle root directory
spark_jar_task: A bundle that defines and uses a Spark JAR task
write_from_job_to_volume: A bundle that writes a file to a Unity Catalog volume

Bundle scenarios

This section contains configuration examples that demonstrate using top-level bundle mappings. See Configuration reference.
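
For orientation, the following is a minimal sketch of how the top-level mappings fit together in a databricks.yml file. The bundle name, workspace host, and notebook path shown here are placeholders, not part of the examples in this article.

YAML
# Minimal databricks.yml sketch (placeholder values)
bundle:
  name: my_bundle

include:
  - resources/*.yml

workspace:
  host: https://myworkspace.cloud.databricks.com

resources:
  jobs:
    hello_job:
      name: hello_job
      tasks:
        - task_key: hello_task
          notebook_task:
            notebook_path: ./src/hello.ipynb

targets:
  dev:
    default: true
    mode: development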

Bundle that uploads a JAR file to Unity Catalog

You can specify a Unity Catalog volume as the artifact path so that all artifacts, such as JAR files and wheel files, are uploaded to the volume. The following example bundle builds a JAR file and uploads it to Unity Catalog. For information on the artifact_path mapping, see artifact_path. For information on artifacts, see artifacts.

YAML
bundle:
  name: jar-bundle

workspace:
  host: https://myworkspace.cloud.databricks.com
  artifact_path: /Volumes/main/default/my_volume

artifacts:
  my_java_code:
    path: ./sample-java
    build: 'javac PrintArgs.java && jar cvfm PrintArgs.jar META-INF/MANIFEST.MF PrintArgs.class'
    files:
      - source: ./sample-java/PrintArgs.jar

resources:
  jobs:
    jar_job:
      name: 'Spark Jar Job'
      tasks:
        - task_key: SparkJarTask
          new_cluster:
            num_workers: 1
            spark_version: '14.3.x-scala2.12'
            node_type_id: 'i3.xlarge'
          spark_jar_task:
            main_class_name: PrintArgs
          libraries:
            - jar: ./sample-java/PrintArgs.jar

Job configuration

This section contains job configuration examples. For job configuration details, see job.

Job that uses serverless compute

Databricks Asset Bundles support jobs that run on serverless compute. See Run your Lakeflow Jobs with serverless compute for workflows. To configure this, you can either omit the clusters setting for a job with a notebook task, or you can specify an environment as shown in the examples below. For Python script, Python wheel, and dbt tasks, environment_key is required for serverless compute. See environment_key.

YAML
# A serverless job (no cluster definition)
resources:
  jobs:
    serverless_job_no_cluster:
      name: serverless_job_no_cluster

      email_notifications:
        on_failure:
          - someone@example.com

      tasks:
        - task_key: notebook_task
          notebook_task:
            notebook_path: ../src/notebook.ipynb
YAML
# A serverless job (environment spec)
resources:
  jobs:
    serverless_job_environment:
      name: serverless_job_environment

      tasks:
        - task_key: task
          spark_python_task:
            python_file: ../src/main.py

          # The key that references an environment spec in a job.
          # https://docs.databricks.com/api/workspace/jobs/create#tasks-environment_key
          environment_key: default

      # A list of task execution environment specifications that can be referenced by tasks of this job.
      environments:
        - environment_key: default

          # Full documentation of this spec can be found at:
          # https://docs.databricks.com/api/workspace/jobs/create#environments-spec
          spec:
            client: '1'
            dependencies:
              - my-library

Job with multiple wheel files

The following example configuration defines a bundle that contains a job with multiple *.whl files.

YAML
# job.yml
resources:
  jobs:
    example_job:
      name: 'Example with multiple wheels'
      tasks:
        - task_key: task

          spark_python_task:
            python_file: ../src/call_wheel.py

          libraries:
            - whl: ../my_custom_wheel1/dist/*.whl
            - whl: ../my_custom_wheel2/dist/*.whl

          new_cluster:
            node_type_id: i3.xlarge
            num_workers: 0
            spark_version: 14.3.x-scala2.12
            spark_conf:
              'spark.databricks.cluster.profile': 'singleNode'
              'spark.master': 'local[*, 4]'
            custom_tags:
              'ResourceClass': 'SingleNode'
YAML
# databricks.yml
bundle:
  name: job_with_multiple_wheels

include:
  - ./resources/job.yml

workspace:
  host: https://myworkspace.cloud.databricks.com

artifacts:
  my_custom_wheel1:
    type: whl
    build: poetry build
    path: ./my_custom_wheel1

  my_custom_wheel2:
    type: whl
    build: poetry build
    path: ./my_custom_wheel2

targets:
  dev:
    default: true
    mode: development

Job that uses a requirements.txt file

The following example configuration defines a job that uses a requirements.txt file.

YAML
resources:
  jobs:
    job_with_requirements_txt:
      name: 'Example job that uses a requirements.txt file'
      tasks:
        - task_key: task
          job_cluster_key: default
          spark_python_task:
            python_file: ../src/main.py
          libraries:
            - requirements: /Workspace/${workspace.file_path}/requirements.txt
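
The task above references job_cluster_key: default. If the bundle does not define that cluster elsewhere, a minimal job_clusters entry merged into the same job definition could look like the following sketch; the Spark version, node type, and worker count are placeholder values, not part of the original example.

YAML
# Sketch: a job cluster definition for the job above (placeholder values)
resources:
  jobs:
    job_with_requirements_txt:
      job_clusters:
        - job_cluster_key: default
          new_cluster:
            spark_version: 14.3.x-scala2.12
            node_type_id: i3.xlarge
            num_workers: 1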

Job on a schedule

The following examples show configuration for jobs that run on a schedule. For information about job schedules and triggers, see Automating jobs with schedules and triggers.

This configuration defines a job that runs daily at a specified time:

YAML
resources:
  jobs:
    my-notebook-job:
      name: my-notebook-job
      tasks:
        - task_key: my-notebook-task
          notebook_task:
            notebook_path: ./my-notebook.ipynb
      schedule:
        quartz_cron_expression: '0 0 8 * * ?' # daily at 8am
        timezone_id: UTC
        pause_status: UNPAUSED

In this configuration, the job runs one week after the job was last run:

YAML
resources:
  jobs:
    my-notebook-job:
      name: my-notebook-job
      tasks:
        - task_key: my-notebook-task
          notebook_task:
            notebook_path: ./my-notebook.ipynb
      trigger:
        pause_status: UNPAUSED
        periodic:
          interval: 1
          unit: WEEKS

Pipeline configuration

This section contains pipeline configuration examples. For pipeline configuration information, see pipeline.

Pipeline that uses serverless compute

Databricks Asset Bundles support pipelines that run on serverless compute. To configure this, set the pipeline serverless setting to true. The following example configuration defines a pipeline that runs on serverless compute and a job that triggers a refresh of the pipeline every hour.

YAML
# A pipeline that runs on serverless compute
resources:
  pipelines:
    my_pipeline:
      name: my_pipeline
      target: ${bundle.environment}
      serverless: true
      catalog: users
      libraries:
        - notebook:
            path: ../src/my_pipeline.ipynb

      configuration:
        bundle.sourcePath: /Workspace/${workspace.file_path}/src
YAML
# This defines a job to refresh a pipeline that is triggered every hour
resources:
  jobs:
    my_job:
      name: my_job

      # Run this job once an hour.
      trigger:
        periodic:
          interval: 1
          unit: HOURS

      email_notifications:
        on_failure:
          - someone@example.com

      tasks:
        - task_key: refresh_pipeline
          pipeline_task:
            pipeline_id: ${resources.pipelines.my_pipeline.id}