Databricks Asset Bundles resources

Databricks Asset Bundles allows you to specify information about the Databricks resources used by the bundle in the resources mapping in the bundle configuration. See resources mapping and resources key reference.

This page provides configuration reference for all supported resource types for bundles and provides details and an example for each supported type. For additional examples, see Bundle configuration examples.

The JSON schema for bundles that is used to validate YAML configuration is in the Databricks CLI GitHub repository.

tip

To generate YAML for any existing resource, use the databricks bundle generate command. See databricks bundle generate.

Supported resources

The following table lists supported resource types for bundles (YAML and Python, where applicable). Some resources can be created by defining them in a bundle and deploying the bundle, and some resources can only be created by referencing an existing asset to include in the bundle.

Resource configuration defines a Databricks object that corresponds to a Databricks REST API object. The REST API object's supported create request fields, expressed as YAML, are the resource's supported keys. Links to documentation for each resource's corresponding object are in the table below.

tip

The databricks bundle validate command returns warnings if unknown resource properties are found in bundle configuration files.

app

Type: Map

The app resource defines a Databricks app. For information about Databricks Apps, see Databricks Apps.

To add an app, specify the settings to define the app, including the required source_code_path.

tip

You can initialize a bundle with a Streamlit Databricks app using the following command:

databricks bundle init https://github.com/databricks/bundle-examples --template-dir contrib/templates/streamlit-app
YAML
apps:
  <app-name>:
    <app-field-name>: <app-field-value>

Key

Type

Description

budget_policy_id

String

The budget policy ID for the app.

compute_size

String

The compute size for the app. Valid values include MEDIUM and LARGE, but depend on workspace configuration.

config

Map

Deprecated. Define your app configuration commands and environment variables in the app.yaml file instead. See Configure a Databricks app.

description

String

The description of the app.

lifecycle

Map

The behavior of the resource when it is deployed or destroyed. See lifecycle.

name

String

The name of the app. The name must contain only lowercase alphanumeric characters and hyphens. It must be unique within the workspace.

permissions

Sequence

The app's permissions. See permissions.

resources

Sequence

The app compute resources. See app.resources.

source_code_path

String

The local path of the Databricks app source code, for example ./app. This field is required.

user_api_scopes

Sequence

The user API scopes.

app.resources

Type: Sequence

The compute resources for the app.

Key

Type

Description

description

String

The description of the app resource.

database

Map

The settings that identify the Lakebase database to use. See app.resources.database.

genie_space

Map

The settings that identify the Genie space to use. See app.resources.genie_space.

job

Map

The settings that identify the job resource to use. See app.resources.job.

name

String

The name of the app resource.

secret

Map

The settings that identify the Databricks secret resource to use. See app.resources.secret.

serving_endpoint

Map

The settings that identify the model serving endpoint resource to use. See app.resources.serving_endpoint.

sql_warehouse

Map

The settings that identify the SQL warehouse resource to use. See app.resources.sql_warehouse.

uc_securable

Map

The settings that identify the Unity Catalog volume to use. See app.resources.uc_securable.
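
For example, the following sketch shows an app that is granted access to a SQL warehouse and a secret; the app name, warehouse ID, and secret scope and key are placeholders:

YAML
resources:
  apps:
    my_app:
      name: 'my-app'
      source_code_path: ../src/app
      resources:
        - name: 'app-warehouse'
          sql_warehouse:
            id: 'a1b2c3d4e5f67890' # Placeholder SQL warehouse ID
            permission: 'CAN_USE'
        - name: 'app-secret'
          secret:
            scope: 'my-scope' # Placeholder secret scope
            key: 'my-key' # Placeholder secret key
            permission: 'READ'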

app.resources.database

Type: Map

The settings that identify the Lakebase database to use.

Key

Type

Description

id

String

The ID of the database instance.

permission

String

The permission level for the database. Valid values include CAN_USE, CAN_MANAGE.

app.resources.genie_space

Type: Map

The settings that identify the Genie space to use.

Key

Type

Description

name

String

The name of the Genie space.

permission

String

The permission level for the space. Valid values include CAN_VIEW, CAN_EDIT, CAN_MANAGE, CAN_RUN.

space_id

String

The ID of the Genie space, for example 550e8400-e29b-41d4-a716-999955440000.

app.resources.job

Type: Map

The settings that identify the job resource to use.

Key

Type

Description

id

String

The ID of the job.

permission

String

The permission level for the job. Valid values include CAN_VIEW, CAN_MANAGE_RUN, CAN_MANAGE.

app.resources.secret

Type: Map

The settings that identify the Databricks secret resource to use.

Key

Type

Description

scope

String

The name of the secret scope.

key

String

The key within the secret scope.

permission

String

The permission level for the secret. Valid values include READ, WRITE, MANAGE.

app.resources.serving_endpoint

Type: Map

The settings that identify the model serving endpoint resource to use.

Key

Type

Description

name

String

The name of the serving endpoint.

permission

String

The permission level for the serving endpoint. Valid values include CAN_QUERY, CAN_MANAGE.

app.resources.sql_warehouse

Type: Map

The settings that identify the SQL warehouse to use.

Key

Type

Description

id

String

The ID of the SQL warehouse.

permission

String

The permission level for the SQL warehouse. Valid values include CAN_USE, CAN_MANAGE.

app.resources.uc_securable

Type: Map

The settings that identify the Unity Catalog volume to use.

Key

Type

Description

full_name

String

The full name of the Unity Catalog securable in the format catalog.schema.name.

permission

String

The permission level for the UC securable. Valid values include READ_FILES, WRITE_FILES, ALL_PRIVILEGES.

Example

The following example creates an app named my_app that manages a job created by the bundle:

YAML
resources:
  jobs:
    # Define a job in the bundle
    hello_world:
      name: hello_world
      tasks:
        - task_key: task
          spark_python_task:
            python_file: ../src/main.py
          environment_key: default

      environments:
        - environment_key: default
          spec:
            environment_version: '2'

  # Define an app that manages the job in the bundle
  apps:
    job_manager:
      name: 'job_manager_app'
      description: 'An app which manages a job created by this bundle'

      # The location of the source code for the app
      source_code_path: ../src/app

      # The resources in the bundle which this app has access to. This binds the resource in the app with the bundle resource.
      resources:
        - name: 'app-job'
          job:
            id: ${resources.jobs.hello_world.id}
            permission: 'CAN_MANAGE_RUN'

The corresponding app.yaml defines the configuration for running the app:

YAML
command:
  - flask
  - --app
  - app
  - run
  - --debug
env:
  - name: JOB_ID
    valueFrom: 'app-job'

For the complete Databricks app example bundle, see the bundle-examples GitHub repository.

cluster

Type: Map

The cluster resource defines a cluster.

YAML
clusters:
  <cluster-name>:
    <cluster-field-name>: <cluster-field-value>

Key

Type

Description

apply_policy_default_values

Boolean

When set to true, fixed and default values from the policy will be used for fields that are omitted. When set to false, only fixed values from the policy will be applied.

autoscale

Map

Parameters needed in order to automatically scale clusters up and down based on load. See autoscale.

autotermination_minutes

Integer

Automatically terminates the cluster after it is inactive for this time in minutes. If not set, this cluster will not be automatically terminated. If specified, the threshold must be between 10 and 10000 minutes. Users can also set this value to 0 to explicitly disable automatic termination.

aws_attributes

Map

Attributes related to clusters running on Amazon Web Services. If not specified at cluster creation, a set of default values will be used. See aws_attributes.

azure_attributes

Map

Attributes related to clusters running on Microsoft Azure. If not specified at cluster creation, a set of default values will be used. See azure_attributes.

cluster_log_conf

Map

The configuration for delivering spark logs to a long-term storage destination. See cluster_log_conf.

cluster_name

String

Cluster name requested by the user. This doesn't have to be unique. If not specified at creation, the cluster name will be an empty string.

custom_tags

Map

Additional tags for cluster resources. Databricks will tag all cluster resources (e.g., AWS instances and EBS volumes) with these tags in addition to default_tags.

data_security_mode

String

The data governance model to use when accessing data from a cluster. Valid values include NONE, SINGLE_USER, USER_ISOLATION, LEGACY_SINGLE_USER, LEGACY_TABLE_ACL, LEGACY_PASSTHROUGH.

docker_image

Map

The custom docker image. See docker_image.

driver_instance_pool_id

String

The optional ID of the instance pool to use for the cluster's driver node. If the driver pool is not assigned, the cluster uses the instance pool with ID instance_pool_id for the driver.

driver_node_type_id

String

The node type of the Spark driver. This field is optional; if unset, the driver node type is set to the same value as node_type_id defined above. This field, along with node_type_id, should not be set if virtual_cluster_size is set. If driver_node_type_id, node_type_id, and virtual_cluster_size are all specified, driver_node_type_id and node_type_id take precedence.

enable_elastic_disk

Boolean

Autoscaling Local Storage: when enabled, this cluster will dynamically acquire additional disk space when its Spark workers are running low on disk space. This feature requires specific AWS permissions to function correctly - refer to the User Guide for more details.

enable_local_disk_encryption

Boolean

Whether to enable LUKS on the local disks of cluster VMs.

gcp_attributes

Map

Attributes related to clusters running on Google Cloud Platform. If not specified at cluster creation, a set of default values will be used. See gcp_attributes.

init_scripts

Sequence

The configuration for storing init scripts. Any number of destinations can be specified. The scripts are executed sequentially in the order provided. See init_scripts.

instance_pool_id

String

The optional ID of the instance pool to which the cluster belongs.

is_single_node

Boolean

This field can only be used when kind = CLASSIC_PREVIEW. When set to true, Databricks automatically sets the single-node-related custom_tags, spark_conf, and num_workers.

kind

String

The kind of compute described by this compute specification.

node_type_id

String

This field encodes, through a single value, the resources available to each of the Spark nodes in this cluster. For example, the Spark nodes can be provisioned and optimized for memory or compute intensive workloads. A list of available node types can be retrieved by using the clusters/listNodeTypes API call.

num_workers

Integer

Number of worker nodes that this cluster should have. A cluster has one Spark Driver and num_workers Executors for a total of num_workers + 1 Spark nodes.

permissions

Sequence

The cluster permissions. See permissions.

policy_id

String

The ID of the cluster policy used to create the cluster if applicable.

runtime_engine

String

Determines the cluster's runtime engine, either STANDARD or PHOTON.

single_user_name

String

The single user name if data_security_mode is SINGLE_USER.

spark_conf

Map

An object containing a set of optional, user-specified Spark configuration key-value pairs. Users can also pass in a string of extra JVM options to the driver and the executors via spark.driver.extraJavaOptions and spark.executor.extraJavaOptions respectively.

spark_env_vars

Map

An object containing a set of optional, user-specified environment variable key-value pairs.

spark_version

String

The Spark version of the cluster, for example 3.3.x-scala2.11. A list of available Spark versions can be retrieved by using the clusters/sparkVersions API call.

ssh_public_keys

Sequence

SSH public key contents that will be added to each Spark node in this cluster. The corresponding private keys can be used to log in with the user name ubuntu on port 2200. Up to 10 keys can be specified.

use_ml_runtime

Boolean

This field can only be used when kind = CLASSIC_PREVIEW. effective_spark_version is determined by spark_version (the DBR release), this use_ml_runtime field, and whether node_type_id is a GPU node.

workload_type

Map

Cluster attributes that show the cluster's workload types. See workload_type.

cluster.autoscale

Type: Map

Parameters for automatically scaling clusters up and down based on load.

Key

Type

Description

min_workers

Integer

The minimum number of workers to which the cluster can scale down when underutilized. It is also the initial number of workers the cluster will have after creation.

max_workers

Integer

The maximum number of workers to which the cluster can scale up when overloaded. max_workers must be strictly greater than min_workers.
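
For example, the following sketch configures a cluster that scales between 1 and 4 workers; the node type and Spark version are illustrative values:

YAML
resources:
  clusters:
    autoscaling_cluster:
      spark_version: '15.4.x-scala2.12'
      node_type_id: 'i3.xlarge' # Illustrative node type
      autoscale:
        min_workers: 1
        max_workers: 4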

cluster.aws_attributes

Type: Map

Attributes related to clusters running on Amazon Web Services.

Key

Type

Description

zone_id

String

Identifier for the availability zone/datacenter in which the cluster resides. This string will be of a form like us-west-2a.

availability

String

Availability type used for all subsequent nodes past the first_on_demand ones. Valid values are SPOT, ON_DEMAND, SPOT_WITH_FALLBACK.

spot_bid_price_percent

Integer

The max price for AWS spot instances, as a percentage of the corresponding instance type's on-demand price.

instance_profile_arn

String

Nodes for this cluster will only be placed on AWS instances with this instance profile.

first_on_demand

Integer

The first first_on_demand nodes of the cluster will be placed on on-demand instances. This value should be greater than 0, to make sure the cluster driver node is placed on an on-demand instance.

ebs_volume_type

String

The type of EBS volumes that will be launched with this cluster. Valid values are GENERAL_PURPOSE_SSD or THROUGHPUT_OPTIMIZED_HDD.

ebs_volume_count

Integer

The number of volumes launched for each instance.

ebs_volume_size

Integer

The size of each EBS volume (in GiB) launched for each instance.

ebs_volume_iops

Integer

The number of IOPS per EBS gp3 volume.

ebs_volume_throughput

Integer

The throughput per EBS gp3 volume, in MiB per second.
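
For example, the following sketch places the driver on an on-demand instance and the workers on spot instances with fallback in a specific availability zone; the node type, zone, and bid percentage are illustrative values:

YAML
resources:
  clusters:
    spot_cluster:
      spark_version: '15.4.x-scala2.12'
      node_type_id: 'i3.xlarge'
      num_workers: 2
      aws_attributes:
        first_on_demand: 1 # Keep the driver on an on-demand instance
        availability: SPOT_WITH_FALLBACK
        zone_id: us-west-2a # Illustrative availability zone
        spot_bid_price_percent: 100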

cluster.azure_attributes

Type: Map

Attributes related to clusters running on Microsoft Azure.

Key

Type

Description

first_on_demand

Integer

The first first_on_demand nodes of the cluster will be placed on on-demand instances.

availability

String

Availability type used for all subsequent nodes past the first_on_demand ones. Valid values are SPOT_AZURE, ON_DEMAND_AZURE, SPOT_WITH_FALLBACK_AZURE.

spot_bid_max_price

Number

The max price for Azure spot instances. Use -1 to specify lowest price.

cluster.gcp_attributes

Type: Map

Attributes related to clusters running on Google Cloud Platform.

Key

Type

Description

use_preemptible_executors

Boolean

Whether to use preemptible executors. Preemptible executors are preemptible GCE instances that may be reclaimed by GCE at any time.

google_service_account

String

The Google service account to be used by the Databricks cluster VM instances.

local_ssd_count

Integer

The number of local SSDs to attach to each node in the cluster. The default value is 0.

zone_id

String

Identifier for the availability zone/datacenter in which the cluster resides.

availability

String

Availability type used for all nodes. Valid values are PREEMPTIBLE_GCP, ON_DEMAND_GCP, PREEMPTIBLE_WITH_FALLBACK_GCP.

boot_disk_size

Integer

The size of the boot disk in GB. Values typically range from 100 to 1000.

cluster.cluster_log_conf

The configuration for delivering Spark logs to a long-term storage destination.

Key

Type

Description

dbfs

Map

DBFS location for cluster log delivery. See dbfs.

s3

Map

S3 location for cluster log delivery. See s3.

volumes

Map

Volumes location for cluster log delivery. See volumes.

cluster.cluster_log_conf.dbfs

Type: Map

DBFS location for cluster log delivery.

Key

Type

Description

destination

String

The DBFS path for cluster log delivery (for example, dbfs:/cluster-logs).

cluster.cluster_log_conf.s3

Type: Map

S3 location for cluster log delivery.

Key

Type

Description

destination

String

The S3 URI for cluster log delivery (for example, s3://my-bucket/cluster-logs).

region

String

The AWS region of the S3 bucket.

endpoint

String

The S3 endpoint URL (optional).

enable_encryption

Boolean

Whether to enable encryption for cluster logs.

encryption_type

String

The encryption type. Valid values include SSE_S3, SSE_KMS.

kms_key

String

The KMS key ARN for encryption (when using SSE_KMS).

canned_acl

String

The canned ACL to apply to cluster logs.

cluster.cluster_log_conf.volumes

Type: Map

Volumes location for cluster log delivery.

Key

Type

Description

destination

String

The volume path for cluster log delivery (for example, /Volumes/catalog/schema/volume/cluster_log).
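
For example, the following sketch delivers cluster logs to a Unity Catalog volume; the catalog, schema, and volume names in the path are placeholders:

YAML
resources:
  clusters:
    my_cluster:
      spark_version: '15.4.x-scala2.12'
      node_type_id: 'i3.xlarge'
      num_workers: 2
      cluster_log_conf:
        volumes:
          destination: /Volumes/my_catalog/my_schema/my_volume/cluster_logs # Placeholder volume path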

cluster.docker_image

Type: Map

The custom Docker image configuration.

Key

Type

Description

url

String

URL of the Docker image.

basic_auth

Map

Basic authentication for Docker repository. See basic_auth.

cluster.docker_image.basic_auth

Type: Map

Basic authentication for Docker repository.

Key

Type

Description

username

String

The username for Docker registry authentication.

password

String

The password for Docker registry authentication.

cluster.init_scripts

Type: Map

The configuration for storing init scripts. At least one location type must be specified.

Key

Type

Description

dbfs

Map

DBFS location of init script. See dbfs.

workspace

Map

Workspace location of init script. See workspace.

s3

Map

S3 location of init script. See s3.

abfss

Map

ABFSS location of init script. See abfss.

gcs

Map

GCS location of init script. See gcs.

volumes

Map

UC Volumes location of init script. See volumes.
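
For example, the following sketch runs a Unity Catalog volume script and then a workspace script when the cluster starts; both destination paths are placeholders:

YAML
resources:
  clusters:
    my_cluster:
      spark_version: '15.4.x-scala2.12'
      node_type_id: 'i3.xlarge'
      num_workers: 2
      init_scripts:
        # Scripts run sequentially in the order listed
        - volumes:
            destination: /Volumes/my_catalog/my_schema/my_volume/install_deps.sh
        - workspace:
            destination: /Workspace/Users/someone@example.com/setup.sh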

cluster.init_scripts.dbfs

Type: Map

DBFS location of init script.

Key

Type

Description

destination

String

The DBFS path of the init script.

cluster.init_scripts.workspace

Type: Map

Workspace location of init script.

Key

Type

Description

destination

String

The workspace path of the init script.

cluster.init_scripts.s3

Type: Map

S3 location of init script.

Key

Type

Description

destination

String

The S3 URI of the init script.

region

String

The AWS region of the S3 bucket.

endpoint

String

The S3 endpoint URL (optional).

cluster.init_scripts.abfss

Type: Map

ABFSS location of init script.

Key

Type

Description

destination

String

The ABFSS path of the init script.

cluster.init_scripts.gcs

Type: Map

GCS location of init script.

Key

Type

Description

destination

String

The GCS path of the init script.

cluster.init_scripts.volumes

Type: Map

Volumes location of init script.

Key

Type

Description

destination

String

The UC Volumes path of the init script.

cluster.workload_type

Type: Map

Cluster attributes showing cluster workload types.

Key

Type

Description

clients

Map

Defines what type of clients can use the cluster. See clients.

cluster.workload_type.clients

Type: Map

The type of clients for this compute workload.

Key

Type

Description

jobs

Boolean

Whether the cluster can run jobs.

notebooks

Boolean

Whether the cluster can run notebooks.

Examples

The following example creates a dedicated (single-user) cluster for the current user with Databricks Runtime 15.4 LTS and a cluster policy:

YAML
resources:
  clusters:
    my_cluster:
      num_workers: 0
      node_type_id: 'i3.xlarge'
      driver_node_type_id: 'i3.xlarge'
      spark_version: '15.4.x-scala2.12'
      spark_conf:
        'spark.executor.memory': '2g'
      autotermination_minutes: 60
      enable_elastic_disk: true
      single_user_name: ${workspace.current_user.userName}
      policy_id: '000128DB309672CA'
      enable_local_disk_encryption: false
      data_security_mode: SINGLE_USER
      runtime_engine: STANDARD

This example creates a simple cluster my_cluster and sets it as the cluster to use to run the notebook in my_job:

YAML
bundle:
  name: clusters

resources:
  clusters:
    my_cluster:
      num_workers: 2
      node_type_id: 'i3.xlarge'
      autoscale:
        min_workers: 2
        max_workers: 7
      spark_version: '13.3.x-scala2.12'
      spark_conf:
        'spark.executor.memory': '2g'

  jobs:
    my_job:
      tasks:
        - task_key: test_task
          notebook_task:
            notebook_path: './src/my_notebook.py'
          existing_cluster_id: ${resources.clusters.my_cluster.id}

dashboard

Type: Map

The dashboard resource allows you to manage AI/BI dashboards in a bundle. For information about AI/BI dashboards, see Dashboards.

If you deploy a bundle that contains a dashboard from your local environment and then use the UI to modify that dashboard, the modifications made through the UI are not applied to the dashboard JSON file in the local bundle unless you explicitly update it using bundle generate. You can use the --watch option to continuously poll and retrieve changes to the dashboard. See databricks bundle generate.

In addition, if you attempt to deploy a bundle from your local environment that contains a dashboard JSON file that is different from the one in the remote workspace, an error occurs. To force the deployment and overwrite the dashboard in the remote workspace with the local one, use the --force option. See databricks bundle deploy.

note

When using Databricks Asset Bundles with dashboard Git support, prevent duplicate dashboards from being generated by adding the sync mapping to exclude the dashboards from synchronizing as files:

YAML
sync:
  exclude:
    - src/*.lvdash.json
YAML
dashboards:
  <dashboard-name>:
    <dashboard-field-name>: <dashboard-field-value>

Key

Type

Description

display_name

String

The display name of the dashboard.

embed_credentials

Boolean

Whether the bundle deployment identity credentials are used to execute queries for all dashboard viewers. If it is set to false, a viewer's credentials are used. The default value is false.

etag

String

The etag for the dashboard. Can be optionally provided on updates to ensure that the dashboard has not been modified since the last read.

file_path

String

The local path of the dashboard asset, including the file name. Exported dashboards always have the file extension .lvdash.json.

permissions

Sequence

The dashboard permissions. See permissions.

serialized_dashboard

Any

The contents of the dashboard in serialized string form.

warehouse_id

String

The warehouse ID used to run the dashboard.

Example

The following example includes and deploys the sample NYC Taxi Trip Analysis dashboard to the Databricks workspace.

YAML
resources:
  dashboards:
    nyc_taxi_trip_analysis:
      display_name: 'NYC Taxi Trip Analysis'
      file_path: ../src/nyc_taxi_trip_analysis.lvdash.json
      warehouse_id: ${var.warehouse_id}

database_catalog

Type: Map

The database_catalog resource allows you to define database catalogs that correspond to database instances in a bundle. A database catalog is a Lakebase database that is registered as a Unity Catalog catalog.

For information about database catalogs, see Create a catalog.

YAML
database_catalogs:
  <database_catalog-name>:
    <database_catalog-field-name>: <database_catalog-field-value>

Key

Type

Description

create_database_if_not_exists

Boolean

Whether to create the database if it does not exist.

database_instance_name

String

The name of the instance housing the database.

database_name

String

The name of the database (in an instance) associated with the catalog.

lifecycle

Map

Contains the lifecycle settings for a resource, including the behavior of the resource when it is deployed or destroyed. See lifecycle.

name

String

The name of the catalog in Unity Catalog.

Example

The following example defines a database instance with a corresponding database catalog:

YAML
resources:
  database_instances:
    my_instance:
      name: my-instance
      capacity: CU_1
  database_catalogs:
    my_catalog:
      database_instance_name: ${resources.database_instances.my_instance.name}
      name: example_catalog
      database_name: my_database
      create_database_if_not_exists: true

database_instance

Type: Map

The database_instance resource allows you to define database instances in a bundle. A Lakebase database instance manages storage and compute resources and provides the endpoints that users connect to.

important

When you deploy a bundle with a database instance, the instance immediately starts running and is subject to pricing. See Lakebase pricing.

For information about database instances, see What is a database instance?.

YAML
database_instances:
  <database_instance-name>:
    <database_instance-field-name>: <database_instance-field-value>

Key

Type

Description

capacity

String

The SKU of the instance. Valid values are CU_1, CU_2, CU_4, CU_8.

custom_tags

Sequence

A list of key-value pairs that specify custom tags associated with the instance.

enable_pg_native_login

Boolean

Whether the instance has PG native password login enabled. Defaults to true.

enable_readable_secondaries

Boolean

Whether to enable secondaries to serve read-only traffic. Defaults to false.

lifecycle

Map

Contains the lifecycle settings for a resource. It controls the behavior of the resource when it is deployed or destroyed. See lifecycle.

name

String

The name of the instance. This is the unique identifier for the instance.

node_count

Integer

The number of nodes in the instance, composed of 1 primary and 0 or more secondaries. Defaults to 1 primary and 0 secondaries.

parent_instance_ref

Map

The reference to the parent instance. This is only available if the instance is a child instance. See parent_instance_ref.

permissions

Sequence

The database instance's permissions. See permissions.

retention_window_in_days

Integer

The retention window for the instance. This is the time window in days for which the historical data is retained. The default value is 7 days. Valid values are 2 to 35 days.

stopped

Boolean

Whether the instance is stopped.

usage_policy_id

String

The desired usage policy to associate with the instance.

database_instance.parent_instance_ref

Type: Map

The reference to the parent instance. This is only available if the instance is a child instance.

Key

Type

Description

branch_time

String

Branch time of the ref database instance. For a parent ref instance, this is the point in time on the parent instance from which the instance was created. For a child ref instance, this is the point in time on the instance from which the child instance was created.

lsn

String

User-specified WAL LSN of the ref database instance.

name

String

Name of the ref database instance.

Example

The following example defines a database instance with a corresponding database catalog:

YAML
resources:
  database_instances:
    my_instance:
      name: my-instance
      capacity: CU_1
  database_catalogs:
    my_catalog:
      database_instance_name: ${resources.database_instances.my_instance.name}
      name: example_catalog
      database_name: my_database
      create_database_if_not_exists: true

For an example bundle that demonstrates how to define a database instance and corresponding database catalog, see the bundle-examples GitHub repository.

experiment

Type: Map

The experiment resource allows you to define MLflow experiments in a bundle. For information about MLflow experiments, see Organize training runs with MLflow experiments.

YAML
experiments:
  <experiment-name>:
    <experiment-field-name>: <experiment-field-value>

Key

Type

Description

artifact_location

String

The location where artifacts for the experiment are stored.

lifecycle

Map

Contains the lifecycle settings for a resource. It controls the behavior of the resource when it is deployed or destroyed. See lifecycle.

name

String

The friendly name that identifies the experiment. An experiment name must be an absolute path in the Databricks workspace, for example /Workspace/Users/someone@example.com/my_experiment.

permissions

Sequence

The experiment's permissions. See permissions.

tags

Sequence

Additional metadata key-value pairs. See tags.

Example

The following example defines an experiment that all users can view:

YAML
resources:
  experiments:
    experiment:
      name: /Workspace/Users/someone@example.com/my_experiment
      permissions:
        - level: CAN_READ
          group_name: users
      description: MLflow experiment used to track runs

job

Type: Map

Jobs are supported in Python for Databricks Asset Bundles. See databricks.bundles.jobs.

The job resource allows you to define jobs and their corresponding tasks in your bundle.

For information about jobs, see Lakeflow Jobs. For a tutorial that uses a Databricks Asset Bundles template to create a job, see Develop a job with Databricks Asset Bundles.

YAML
jobs:
  <job-name>:
    <job-field-name>: <job-field-value>

Key

Type

Description

budget_policy_id

String

The id of the user-specified budget policy to use for this job. If not specified, a default budget policy may be applied when creating or modifying the job. See effective_budget_policy_id for the budget policy used by this workload.

continuous

Map

An optional continuous property for this job. The continuous property will ensure that there is always one run executing. Only one of schedule and continuous can be used. See continuous.

deployment

Map

Deployment information for jobs managed by external sources. See deployment.

description

String

An optional description for the job. The maximum length is 27700 characters in UTF-8 encoding.

edit_mode

String

Edit mode of the job, either UI_LOCKED or EDITABLE.

email_notifications

Map

An optional set of email addresses that are notified when runs of this job begin or complete, as well as when this job is deleted. See email_notifications.

environments

Sequence

A list of task execution environment specifications that can be referenced by serverless tasks of this job. An environment is required to be present for serverless tasks. For serverless notebook tasks, the environment is accessible in the notebook environment panel. For other serverless tasks, the task environment is required to be specified using environment_key in the task settings.

format

String

The format of the job.

git_source

Map

An optional specification for a remote Git repository containing the source code used by tasks.

Important: The git_source field and task source field set to GIT are not recommended for bundles, because local relative paths may not point to the same content in the Git repository, and bundles expect that a deployed job has the same content as the local copy from where it was deployed.

Instead, clone the repository locally and set up your bundle project within this repository, so that the source for tasks is the workspace.

health

Map

An optional set of health rules that can be defined for this job. See health.

job_clusters

Sequence

A list of job cluster specifications that can be shared and reused by tasks of this job. See clusters.

max_concurrent_runs

Integer

An optional maximum allowed number of concurrent runs of the job. Set this value if you want to be able to execute multiple runs of the same job concurrently.

name

String

An optional name for the job. The maximum length is 4096 bytes in UTF-8 encoding.

notification_settings

Map

Optional notification settings that are used when sending notifications to each of the email_notifications and webhook_notifications for this job. See notification_settings.

parameters

Sequence

Job-level parameter definitions.

performance_target

String

Defines how performant or cost efficient the execution of the run on serverless should be.

permissions

Sequence

The job's permissions. See permissions.

queue

Map

The queue settings of the job. See queue.

run_as

Map

Write-only setting. Specifies the user or service principal that the job runs as. If not specified, the job runs as the user who created the job. Either user_name or service_principal_name should be specified. If not, an error is thrown. See run_as.

schedule

Map

An optional periodic schedule for this job. The default behavior is that the job only runs when triggered by clicking “Run Now” in the Jobs UI or sending an API request to runNow. See schedule.

tags

Map

A map of tags associated with the job. These are forwarded to the cluster as cluster tags for jobs clusters, and are subject to the same limitations as cluster tags. A maximum of 25 tags can be added to the job.

tasks

Sequence

A list of task specifications to be executed by this job. See Add tasks to jobs in Databricks Asset Bundles.

timeout_seconds

Integer

An optional timeout applied to each run of this job. A value of 0 means no timeout.

trigger

Map

A configuration to trigger a run when certain conditions are met. See trigger.

webhook_notifications

Map

A collection of system notification IDs to notify when runs of this job begin or complete. See webhook_notifications.

job.continuous

Type: Map

Configuration for continuous job execution.

Key

Type

Description

pause_status

String

Whether the continuous job is paused or not. Valid values: PAUSED, UNPAUSED.
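
For example, the following sketch defines a job that always has one active run; the job name and notebook path are placeholders:

YAML
resources:
  jobs:
    streaming_job:
      name: streaming_job
      continuous:
        pause_status: UNPAUSED
      tasks:
        - task_key: stream
          notebook_task:
            notebook_path: ./stream_notebook.py # Placeholder notebook path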

job.deployment

Type: Map

Deployment information for jobs managed by external sources.

Key

Type

Description

kind

String

The kind of deployment. For example, BUNDLE.

metadata_file_path

String

The path to the metadata file for the deployment.

job.email_notifications

Type: Map

Email notification settings for job runs.

Key

Type

Description

on_start

Sequence

A list of email addresses to notify when a run starts.

on_success

Sequence

A list of email addresses to notify when a run succeeds.

on_failure

Sequence

A list of email addresses to notify when a run fails.

on_duration_warning_threshold_exceeded

Sequence

A list of email addresses to notify when a run duration exceeds the warning threshold.

no_alert_for_skipped_runs

Boolean

Whether to skip sending alerts for skipped runs.
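
For example, the following sketch sends an email when a run fails and suppresses alerts for skipped runs; the email address and notebook path are placeholders:

YAML
resources:
  jobs:
    notified_job:
      name: notified_job
      email_notifications:
        on_failure:
          - someone@example.com # Placeholder email address
        no_alert_for_skipped_runs: true
      tasks:
        - task_key: main
          notebook_task:
            notebook_path: ./main_notebook.py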

job.git_source

Type: Map

Git repository configuration for job source code.

Key

Type

Description

git_url

String

The URL of the Git repository.

git_provider

String

The Git provider. Valid values: gitHub, bitbucketCloud, gitLab, azureDevOpsServices, gitHubEnterprise, bitbucketServer, gitLabEnterpriseEdition.

git_branch

String

The name of the Git branch to use.

git_tag

String

The name of the Git tag to use.

git_commit

String

The Git commit hash to use.

git_snapshot

Map

Used commit information. This is a read-only field. See git_snapshot.

job.git_source.git_snapshot

Type: Map

Read-only commit information snapshot.

Key

Type

Description

used_commit

String

The commit hash that was used.

job.health

Type: Map

Health monitoring configuration for the job.

Key

Type

Description

rules

Sequence

A list of job health rules. Each rule contains a metric and op (operator) and value. See JobsHealthRule.

JobsHealthRule

Type: Map

Key

Type

Description

metric

String

Specifies the health metric that is being evaluated for a particular health rule.

  • RUN_DURATION_SECONDS: Expected total time for a run in seconds.
  • STREAMING_BACKLOG_BYTES: An estimate of the maximum bytes of data waiting to be consumed across all streams. This metric is in Public Preview.
  • STREAMING_BACKLOG_RECORDS: An estimate of the maximum offset lag across all streams. This metric is in Public Preview.
  • STREAMING_BACKLOG_SECONDS: An estimate of the maximum consumer delay across all streams. This metric is in Public Preview.
  • STREAMING_BACKLOG_FILES: An estimate of the maximum number of outstanding files across all streams. This metric is in Public Preview.

op

String

Specifies the operator used to compare the health metric value with the specified threshold.

value

Integer

Specifies the threshold value that the health metric should obey to satisfy the health rule.

job.notification_settings

Type: Map

Notification settings that apply to all notifications for the job.

Key

Type

Description

no_alert_for_skipped_runs

Boolean

Whether to skip sending alerts for skipped runs.

no_alert_for_canceled_runs

Boolean

Whether to skip sending alerts for canceled runs.

job.queue

Type: Map

Queue settings for the job.

Key

Type

Description

enabled

Boolean

Whether to enable queueing for the job.

job.schedule

Type: Map

Schedule configuration for periodic job execution.

Key

Type

Description

quartz_cron_expression

String

A Cron expression using Quartz syntax that specifies when the job runs. For example, 0 0 9 * * ? runs the job every day at 9:00 AM in the schedule's timezone.

timezone_id

String

The timezone for the schedule. For example, America/Los_Angeles or UTC.

pause_status

String

Whether the schedule is paused or not. Valid values: PAUSED, UNPAUSED.
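
For example, the following sketch runs a job every day at 9:00 AM UTC; the job name and notebook path are placeholders:

YAML
resources:
  jobs:
    scheduled_job:
      name: scheduled_job
      schedule:
        quartz_cron_expression: '0 0 9 * * ?' # Every day at 9:00 AM
        timezone_id: UTC
        pause_status: UNPAUSED
      tasks:
        - task_key: main
          notebook_task:
            notebook_path: ./main_notebook.py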

job.trigger

Type: Map

Trigger configuration for event-driven job execution.

Key

Type

Description

file_arrival

Map

Trigger based on file arrival. See file_arrival.

table

Map

Trigger based on a table. See table.

table_update

Map

Trigger based on table updates. See table_update.

periodic

Map

Periodic trigger. See periodic.

job.trigger.file_arrival

Type: Map

Trigger configuration based on file arrival.

Key

Type

Description

url

String

The file path to monitor for new files.

min_time_between_triggers_seconds

Integer

Minimum time in seconds between trigger events.

wait_after_last_change_seconds

Integer

Wait time in seconds after the last file change before triggering.
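
For example, the following sketch triggers a run when new files land in a monitored location, waiting at least 60 seconds between triggers; the volume path and notebook path are placeholders:

YAML
resources:
  jobs:
    file_triggered_job:
      name: file_triggered_job
      trigger:
        file_arrival:
          url: /Volumes/my_catalog/my_schema/landing/ # Placeholder path to monitor
          min_time_between_triggers_seconds: 60
      tasks:
        - task_key: ingest
          notebook_task:
            notebook_path: ./ingest_notebook.py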

job.trigger.table

Type: Map

Trigger configuration based on a table.

Key

Type

Description

table_names

Sequence

A list of table names to monitor.

condition

String

The SQL condition that must be met to trigger the job.

job.trigger.table_update

Type: Map

Trigger configuration based on table updates.

Key

Type

Description

table_names

Sequence

A list of table names to monitor for updates.

condition

String

The SQL condition that must be met to trigger the job.

wait_after_last_change_seconds

Integer

Wait time in seconds after the last table update before triggering.

job.trigger.periodic

Type: Map

Periodic trigger configuration.

Key

Type

Description

interval

Integer

The interval value for the periodic trigger.

unit

String

The unit of time for the interval. Valid values: SECONDS, MINUTES, HOURS, DAYS, WEEKS.

job.webhook_notifications

Type: Map

Webhook notification settings for job runs.

Key

Type

Description

on_start

Sequence

A list of webhook notification IDs to notify when a run starts.

on_success

Sequence

A list of webhook notification IDs to notify when a run succeeds.

on_failure

Sequence

A list of webhook notification IDs to notify when a run fails.

on_duration_warning_threshold_exceeded

Sequence

A list of webhook notification IDs to notify when a run duration exceeds the warning threshold.

Examples

The following example defines a job with the resource key hello-job with one notebook task:

YAML
resources:
  jobs:
    hello-job:
      name: hello-job
      tasks:
        - task_key: hello-task
          notebook_task:
            notebook_path: ./hello.py

The following example defines a job with a SQL notebook:

YAML
resources:
  jobs:
    job_with_sql_notebook:
      name: 'Job to demonstrate using a SQL notebook with a SQL warehouse'
      tasks:
        - task_key: notebook
          notebook_task:
            notebook_path: ./select.sql
            warehouse_id: 799f096837fzzzz4

For additional job configuration examples, see Job configuration.

For information about defining job tasks and overriding job settings, see Add tasks to jobs in Databricks Asset Bundles.

model (legacy)

Type: Map

The model resource allows you to define legacy models in bundles. Databricks recommends you use Unity Catalog registered models instead.

model_serving_endpoint

Type: Map

The model_serving_endpoint resource allows you to define model serving endpoints. See Manage model serving endpoints.

YAML
model_serving_endpoints:
  <model_serving_endpoint-name>:
    <model_serving_endpoint-field-name>: <model_serving_endpoint-field-value>

Key

Type

Description

ai_gateway

Map

The AI Gateway configuration for the serving endpoint. NOTE: Only external model and provisioned throughput endpoints are currently supported. See ai_gateway.

config

Map

The core config of the serving endpoint. See config.

name

String

The name of the serving endpoint. This field is required and must be unique across a Databricks workspace. An endpoint name can consist of alphanumeric characters, dashes, and underscores.

permissions

Sequence

The model serving endpoint's permissions. See permissions.

rate_limits

Sequence

Deprecated. Rate limits to be applied to the serving endpoint. Use AI Gateway to manage rate limits.

route_optimized

Boolean

Enable route optimization for the serving endpoint.

tags

Sequence

Tags to be attached to the serving endpoint and automatically propagated to billing logs.

model_serving_endpoint.ai_gateway

Type: Map

AI Gateway configuration for the serving endpoint.

Key

Type

Description

guardrails

Map

Guardrail configuration. See guardrails.

inference_table_config

Map

Configuration for inference logging to Unity Catalog tables. See inference_table_config.

rate_limits

Sequence

Rate limit configurations.

usage_tracking_config

Map

Configuration for tracking usage. See usage_tracking_config.

model_serving_endpoint.ai_gateway.guardrails

Type: Map

The AI gateway guardrails configuration.

Key

Type

Description

input

Map

Input guardrails configuration with fields like safety, pii.

output

Map

Output guardrails configuration with fields like safety, pii.

invalid_keywords

Sequence

A list of keywords to block.

model_serving_endpoint.ai_gateway.inference_table_config

Type: Map

Configuration for inference logging to Unity Catalog tables.

Key

Type

Description

catalog_name

String

The name of the catalog in Unity Catalog.

schema_name

String

The name of the schema in Unity Catalog.

table_name_prefix

String

The prefix for inference table names.

enabled

Boolean

Whether inference table logging is enabled.

model_serving_endpoint.ai_gateway.usage_tracking_config

Type: Map

The AI gateway configuration for tracking usage.

Key

Type

Description

enabled

Boolean

Whether usage tracking is enabled.
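
For example, the following sketch enables usage tracking and inference logging through AI Gateway for a serving endpoint; the endpoint, catalog, and schema names are placeholders, and the endpoint's config block is omitted for brevity:

YAML
resources:
  model_serving_endpoints:
    my_endpoint:
      name: 'my-endpoint'
      # The endpoint's config (served_entities and so on) is omitted for brevity
      ai_gateway:
        usage_tracking_config:
          enabled: true
        inference_table_config:
          enabled: true
          catalog_name: my_catalog # Placeholder catalog
          schema_name: my_schema # Placeholder schema
          table_name_prefix: my_endpoint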

model_serving_endpoint.config

Type: Map

The core configuration of the serving endpoint.

Key

Type

Description

served_entities

Sequence

A list of served entities for the endpoint to serve. Each served entity contains fields like entity_name, entity_version, workload_size, scale_to_zero_enabled, workload_type, environment_vars.

served_models

Sequence

(Deprecated: use served_entities instead) A list of served models for the endpoint to serve.

traffic_config

Map

The traffic config defining how invocations to the serving endpoint should be routed. See traffic_config.

auto_capture_config

Map

Configuration for Inference Tables which automatically logs requests and responses to Unity Catalog. See auto_capture_config.

model_serving_endpoint.config.traffic_config

Type: Map

The traffic config defining how invocations to the serving endpoint should be routed.

Key

Type

Description

routes

Sequence

A list of routes for traffic distribution. Each route contains served_model_name and traffic_percentage.

model_serving_endpoint.config.auto_capture_config

Type: Map

Configuration for Inference Tables which automatically logs requests and responses to Unity Catalog.

Key

Type

Description

catalog_name

String

The name of the catalog in Unity Catalog.

schema_name

String

The name of the schema in Unity Catalog.

table_name_prefix

String

The prefix for inference table names.

enabled

Boolean

Whether inference table logging is enabled.

Example

The following example defines a Unity Catalog model serving endpoint:

YAML
resources:
  model_serving_endpoints:
    uc_model_serving_endpoint:
      name: 'uc-model-endpoint'
      config:
        served_entities:
          - entity_name: 'myCatalog.mySchema.my-ads-model'
            entity_version: '10'
            workload_size: 'Small'
            scale_to_zero_enabled: 'true'
        traffic_config:
          routes:
            - served_model_name: 'my-ads-model-10'
              traffic_percentage: '100'
      tags:
        - key: 'team'
          value: 'data science'

pipeline

Type: Map

Pipelines are supported in Python for Databricks Asset Bundles. See databricks.bundles.pipelines.

The pipeline resource allows you to create pipelines in Lakeflow Spark Declarative Pipelines. For information about pipelines, see Lakeflow Spark Declarative Pipelines. For a tutorial that uses the Databricks Asset Bundles template to create a pipeline, see Develop Lakeflow Spark Declarative Pipelines with Databricks Asset Bundles.

YAML
pipelines:
  <pipeline-name>:
    <pipeline-field-name>: <pipeline-field-value>

Key

Type

Description

allow_duplicate_names

Boolean

If false, deployment will fail if name conflicts with that of another pipeline.

budget_policy_id

String

Budget policy of this pipeline.

catalog

String

A catalog in Unity Catalog to publish data from this pipeline to. If target is specified, tables in this pipeline are published to a target schema inside catalog (for example, catalog.target.table). If target is not specified, no data is published to Unity Catalog.

channel

String

The Lakeflow Spark Declarative Pipelines Release Channel that specifies which version of Lakeflow Spark Declarative Pipelines to use.

clusters

Sequence

The cluster settings for this pipeline deployment. See cluster.

configuration

Map

The configuration for this pipeline execution.

continuous

Boolean

Whether the pipeline is continuous or triggered. This replaces trigger.

deployment

Map

Deployment type of this pipeline. See deployment.

development

Boolean

Whether the pipeline is in development mode. Defaults to false.

dry_run

Boolean

Whether the pipeline is a dry run pipeline.

edition

String

The pipeline product edition.

environment

Map

The environment specification for this pipeline used to install dependencies on serverless compute. This key is only supported in Databricks CLI version 0.258 and above.

event_log

Map

The event log configuration for this pipeline. See event_log.

filters

Map

The filters that determine which pipeline packages to include in the deployed graph. See filters.

id

String

Unique identifier for this pipeline.

ingestion_definition

Map

The configuration for a managed ingestion pipeline. These settings cannot be used with the libraries, schema, target, or catalog settings. See ingestion_definition.

libraries

Sequence

A list of libraries or code needed by this deployment. See PipelineLibrary.

lifecycle

Map

Contains the lifecycle settings for a resource. It controls the behavior of the resource when it is deployed or destroyed. See lifecycle.

name

String

A friendly name for this pipeline.

notifications

Sequence

The notification settings for this pipeline.

permissions

Sequence

The pipeline's permissions. See permissions.

photon

Boolean

Whether Photon is enabled for this pipeline.

root_path

String

The root path for this pipeline. This is used as the root directory when editing the pipeline in the Databricks user interface and it is added to sys.path when executing Python sources during pipeline execution.

run_as

Map

The identity that the pipeline runs as. If not specified, the pipeline runs as the user who created the pipeline. Only user_name or service_principal_name can be specified. If both are specified, an error is thrown. See run_as.

schema

String

The default schema (database) where tables are read from or published to.

serverless

Boolean

Whether serverless compute is enabled for this pipeline.

storage

String

The DBFS root directory for storing checkpoints and tables.

tags

Map

A map of tags associated with the pipeline. These are forwarded to the cluster as cluster tags, and are therefore subject to the same limitations. A maximum of 25 tags can be added to the pipeline.

target

String

Target schema (database) to add tables in this pipeline to. Exactly one of schema or target must be specified. To publish to Unity Catalog, also specify catalog. This legacy field is deprecated for pipeline creation in favor of the schema field.

pipeline.deployment

Type: Map

Deployment type configuration for the pipeline.

Key

Type

Description

kind

String

The kind of deployment. For example, BUNDLE.

metadata_file_path

String

The path to the metadata file for the deployment.

pipeline.environment

Type: Map

Environment specification for installing dependencies on serverless compute.

Key

Type

Description

spec

Map

The specification for the environment. See spec.

pipeline.environment.spec

Type: Map

The specification for the environment.

Key

Type

Description

client

String

The client version (for example, 1 or 2).

dependencies

Sequence

A list of dependencies to install (for example, numpy, pandas==1.5.0).

pipeline.event_log

Type: Map

Event log configuration for the pipeline.

Key

Type

Description

enabled

Boolean

Whether event logging is enabled.

storage_location

String

The storage location for event logs.

pipeline.filters

Type: Map

Filters that determine which pipeline packages to include in the deployed graph.

Key

Type

Description

include

Sequence

A list of package names to include.

exclude

Sequence

A list of package names to exclude.

pipeline.ingestion_definition

Type: Map

Configuration for a managed ingestion pipeline.

Key

Type

Description

connection_name

String

The name of the connection to use for ingestion.

ingestion_gateway_id

String

The ID of the ingestion gateway.

objects

Sequence

A list of objects to ingest. Each object can be a SchemaSpec, TableSpec, or ReportSpec. See SchemaSpec, TableSpec, and ReportSpec.

table_configuration

Map

Configuration for the ingestion tables. See table_configuration.

SchemaSpec

Type: Map

Schema object specification for ingesting all tables from a schema.

Key

Type

Description

source_schema

String

The name of the source schema to ingest.

destination_catalog

String

The name of the destination catalog in Unity Catalog.

destination_schema

String

The name of the destination schema in Unity Catalog.

table_configuration

Map

Configuration to apply to all tables in this schema. See pipeline.ingestion_definition.table_configuration.

TableSpec

Type: Map

Table object specification for ingesting a specific table.

Key

Type

Description

source_schema

String

The name of the source schema containing the table.

source_table

String

The name of the source table to ingest.

destination_catalog

String

The name of the destination catalog in Unity Catalog.

destination_schema

String

The name of the destination schema in Unity Catalog.

destination_table

String

The name of the destination table in Unity Catalog.

table_configuration

Map

Configuration for this specific table. See pipeline.ingestion_definition.table_configuration.

ReportSpec

Type: Map

Report object specification for ingesting analytics reports.

Key

Type

Description

source_url

String

The URL of the source report.

source_report

String

The name or identifier of the source report.

destination_catalog

String

The name of the destination catalog in Unity Catalog.

destination_schema

String

The name of the destination schema in Unity Catalog.

destination_table

String

The name of the destination table for the report data.

table_configuration

Map

Configuration for the report table. See pipeline.ingestion_definition.table_configuration.

pipeline.ingestion_definition.table_configuration

Type: Map

Configuration options for ingestion tables.

Key

Type

Description

primary_keys

Sequence

A list of column names to use as primary keys for the table.

salesforce_include_formula_fields

Boolean

Whether to include Salesforce formula fields in the ingestion.

scd_type

String

The type of slowly changing dimension (SCD) to apply. Valid values: SCD_TYPE_1, SCD_TYPE_2.
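
For example, the following sketch ingests one Salesforce table into Unity Catalog with SCD type 2 history tracking; the connection, catalog, and schema names are placeholders, and the object entry follows the TableSpec keys above:

YAML
resources:
  pipelines:
    salesforce_ingestion:
      name: salesforce_ingestion
      ingestion_definition:
        connection_name: my_salesforce_connection # Placeholder connection
        objects:
          - table:
              source_schema: objects
              source_table: Account
              destination_catalog: my_catalog # Placeholder catalog
              destination_schema: my_schema # Placeholder schema
        table_configuration:
          scd_type: SCD_TYPE_2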

PipelineLibrary

Type: Map

Defines a library or code needed by this pipeline.

Key

Type

Description

file

Map

The path to a file that defines a pipeline and is stored in Databricks Repos. See pipeline.libraries.file.

glob

Map

The unified field to include source code. Each entry can be a notebook path, a file path, or a folder path that ends with /**. This field cannot be used together with notebook or file. See pipeline.libraries.glob.

notebook

Map

The path to a notebook that defines a pipeline and is stored in the Databricks workspace. See pipeline.libraries.notebook.

whl

String

This field is deprecated.

pipeline.libraries.file

Type: Map

The path to a file that defines a pipeline and is stored in Databricks Repos.

Key

Type

Description

path

String

The absolute path of the source code.

pipeline.libraries.glob

Type: Map

The unified field to include source code. Each entry can be a notebook path, a file path, or a folder path that ends with /**. This field cannot be used together with notebook or file.

Key

Type

Description

include

String

The source code to include for the pipeline.
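
For example, the following sketch includes all source files under a folder by using a glob entry; the pipeline name, catalog, schema, and source path are placeholders:

YAML
resources:
  pipelines:
    my_pipeline:
      name: my_pipeline
      catalog: my_catalog # Placeholder catalog
      schema: my_schema # Placeholder schema
      serverless: true
      libraries:
        - glob:
            include: ../src/transformations/** # Placeholder folder path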

pipeline.libraries.notebook

Type: Map

The path to a notebook that defines a pipeline and is stored in the Databricks workspace.

Key

Type

Description

path

String

The absolute path of the source code.

Example

The following example defines a pipeline with the resource key hello-pipeline:

YAML
resources:
  pipelines:
    hello-pipeline:
      name: hello-pipeline
      clusters:
        - label: default
          num_workers: 1
      development: true
      continuous: false
      channel: CURRENT
      edition: CORE
      photon: false
      libraries:
        - notebook:
            path: ./pipeline.py

For additional pipeline configuration examples, see Pipeline configuration.

quality_monitor (Unity Catalog)

Type: Map

The quality_monitor resource allows you to define a Unity Catalog table monitor. For information about monitors, see Data profiling.

YAML
quality_monitors:
  <quality_monitor-name>:
    <quality_monitor-field-name>: <quality_monitor-field-value>

Key

Type

Description

assets_dir

String

The directory to store monitoring assets (e.g. dashboard, metric tables).

baseline_table_name

String

Name of the baseline table from which drift metrics are computed. Columns in the monitored table should also be present in the baseline table.

custom_metrics

Sequence

Custom metrics to compute on the monitored table. These can be aggregate metrics, derived metrics (from already computed aggregate metrics), or drift metrics (comparing metrics across time windows). See custom_metrics.

inference_log

Map

Configuration for monitoring inference logs. See inference_log.

lifecycle

Map

Contains the lifecycle settings for a resource. It controls the behavior of the resource when it is deployed or destroyed. See lifecycle.

notifications

Map

The notification settings for the monitor. See notifications.

output_schema_name

String

Schema where output metric tables are created.

schedule

Map

The schedule for automatically updating and refreshing metric tables. See schedule.

skip_builtin_dashboard

Boolean

Whether to skip creating a default dashboard summarizing data quality metrics.

slicing_exprs

Sequence

List of column expressions to slice data with for targeted analysis. The data is grouped by each expression independently, resulting in a separate slice for each predicate and its complements. For high-cardinality columns, only the top 100 unique values by frequency will generate slices.

snapshot

Map

Configuration for monitoring snapshot tables. See snapshot.

table_name

String

The full name of the table.

time_series

Map

Configuration for monitoring time series tables. See time_series.

warehouse_id

String

Optional argument to specify the warehouse for dashboard creation. If not specified, the first running warehouse will be used.

quality_monitor.custom_metrics

Type: Sequence

Key

Type

Description

definition

String

Jinja template for a SQL expression that specifies how to compute the metric. See create metric definition.

input_columns

Sequence

A list of column names in the input table the metric should be computed for. Can use :table to indicate that the metric needs information from multiple columns.

name

String

Name of the metric in the output tables.

output_data_type

String

The output type of the custom metric.

type

String

Can only be one of CUSTOM_METRIC_TYPE_AGGREGATE, CUSTOM_METRIC_TYPE_DERIVED, or CUSTOM_METRIC_TYPE_DRIFT. The CUSTOM_METRIC_TYPE_AGGREGATE and CUSTOM_METRIC_TYPE_DERIVED metrics are computed on a single table, whereas CUSTOM_METRIC_TYPE_DRIFT compares metrics across the baseline and input tables, or across two consecutive time windows.

  • CUSTOM_METRIC_TYPE_AGGREGATE: only depend on the existing columns in your table
  • CUSTOM_METRIC_TYPE_DERIVED: depend on previously computed aggregate metrics
  • CUSTOM_METRIC_TYPE_DRIFT: depend on previously computed aggregate or derived metrics
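
For illustration, the following sketch shows how a custom aggregate metric might be configured within a quality monitor. The metric name, input column, SQL expression, and output type are hypothetical placeholders; consult the create metric definition documentation for the exact format of each field.

YAML
# Hypothetical custom aggregate metric (illustrative values only)
custom_metrics:
  - name: avg_absolute_price
    type: CUSTOM_METRIC_TYPE_AGGREGATE
    input_columns:
      - price
    definition: "avg(abs({{input_column}}))"
    output_data_type: double # assumed format; see the metric definition docs for the expected type encoding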

quality_monitor.data_classification_config

Type: Map

Configuration for data classification.

Key

Type

Description

enabled

Boolean

Whether data classification is enabled.
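
As a minimal sketch, data classification could be enabled by adding this mapping to a quality monitor definition. The surrounding monitor keys are placeholders based on the examples later on this page:

YAML
quality_monitors:
  my_quality_monitor:
    table_name: dev.mlops_schema.predictions
    output_schema_name: ${bundle.target}.mlops_schema
    assets_dir: /Workspace/Users/${workspace.current_user.userName}/databricks_lakehouse_monitoring
    snapshot: {}
    data_classification_config:
      enabled: true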

quality_monitor.inference_log

Type: Map

Configuration for monitoring inference logs.

Key

Type

Description

granularities

Sequence

The time granularities for aggregating inference logs (for example, ["1 day"]).

model_id_col

String

The name of the column containing the model ID.

prediction_col

String

The name of the column containing the prediction.

timestamp_col

String

The name of the column containing the timestamp.

problem_type

String

The type of ML problem. Valid values include PROBLEM_TYPE_CLASSIFICATION, PROBLEM_TYPE_REGRESSION.

label_col

String

The name of the column containing the label (ground truth).

quality_monitor.notifications

Type: Map

Notification settings for the monitor.

Key

Type

Description

on_failure

Map

Notification settings when the monitor fails. See on_failure.

on_new_classification_tag_detected

Map

Notification settings when new classification tags are detected. See on_new_classification_tag_detected.

quality_monitor.notifications.on_failure

Type: Map

Notification settings when the monitor fails.

Key

Type

Description

email_addresses

Sequence

A list of email addresses to notify on monitor failure.

quality_monitor.notifications.on_new_classification_tag_detected

Type: Map

Notification settings when new classification tags are detected.

Key

Type

Description

email_addresses

Sequence

A list of email addresses to notify when new classification tags are detected.
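
Taken together, the notification settings might be configured as in the following sketch; the email addresses are placeholders:

YAML
notifications:
  on_failure:
    email_addresses:
      - ml-alerts@example.com
  on_new_classification_tag_detected:
    email_addresses:
      - governance-team@example.com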

quality_monitor.schedule

Type: Map

Schedule for automatically updating and refreshing metric tables.

Key

Type

Description

quartz_cron_expression

String

A Cron expression using Quartz syntax. For example, 0 0 8 * * ? runs every day at 8:00 AM.

timezone_id

String

The timezone for the schedule (for example, UTC, America/Los_Angeles).

pause_status

String

Whether the schedule is paused. Valid values: PAUSED, UNPAUSED.

quality_monitor.snapshot

Type: Map

Configuration for monitoring snapshot tables.

quality_monitor.time_series

Type: Map

Configuration for monitoring time series tables.

Key

Type

Description

granularities

Sequence

The time granularities for aggregating time series data (for example, ["30 minutes"]).

timestamp_col

String

The name of the column containing the timestamp.

Examples

For a complete example bundle that defines a quality_monitor, see the mlops_demo bundle.

The following examples define quality monitors for InferenceLog, TimeSeries, and Snapshot profile types.

YAML
# InferenceLog profile type
resources:
  quality_monitors:
    my_quality_monitor:
      table_name: dev.mlops_schema.predictions
      output_schema_name: ${bundle.target}.mlops_schema
      assets_dir: /Workspace/Users/${workspace.current_user.userName}/databricks_lakehouse_monitoring
      inference_log:
        granularities: [1 day]
        model_id_col: model_id
        prediction_col: prediction
        label_col: price
        problem_type: PROBLEM_TYPE_REGRESSION
        timestamp_col: timestamp
      schedule:
        quartz_cron_expression: 0 0 8 * * ? # Run every day at 8 AM
        timezone_id: UTC
YAML
# TimeSeries profile type
resources:
  quality_monitors:
    my_quality_monitor:
      table_name: dev.mlops_schema.predictions
      output_schema_name: ${bundle.target}.mlops_schema
      assets_dir: /Workspace/Users/${workspace.current_user.userName}/databricks_lakehouse_monitoring
      time_series:
        granularities: [30 minutes]
        timestamp_col: timestamp
      schedule:
        quartz_cron_expression: 0 0 8 * * ? # Run every day at 8 AM
        timezone_id: UTC
YAML
# Snapshot profile type
resources:
  quality_monitors:
    my_quality_monitor:
      table_name: dev.mlops_schema.predictions
      output_schema_name: ${bundle.target}.mlops_schema
      assets_dir: /Workspace/Users/${workspace.current_user.userName}/databricks_lakehouse_monitoring
      snapshot: {}
      schedule:
        quartz_cron_expression: 0 0 8 * * ? # Run every day at 8 AM
        timezone_id: UTC

registered_model (Unity Catalog)

Type: Map

The registered model resource allows you to define models in Unity Catalog. For information about Unity Catalog registered models, see Manage model lifecycle in Unity Catalog.

YAML
registered_models:
  <registered_model-name>:
    <registered_model-field-name>: <registered_model-field-value>

Key

Type

Description

aliases

Sequence

List of aliases associated with the registered model. See registered_model.aliases.

browse_only

Boolean

Indicates whether the principal is limited to retrieving metadata for the associated object through the BROWSE privilege when include_browse is enabled in the request.

catalog_name

String

The name of the catalog where the schema and the registered model reside.

comment

String

The comment attached to the registered model.

full_name

String

The three-level (fully qualified) name of the registered model.

grants

Sequence

The grants associated with the registered model. See grant.

lifecycle

Map

Contains the lifecycle settings for a resource. It controls the behavior of the resource when it is deployed or destroyed. See lifecycle.

name

String

The name of the registered model.

schema_name

String

The name of the schema where the registered model resides.

storage_location

String

The storage location on the cloud under which model version data files are stored.

registered_model.aliases

Type: Sequence

List of aliases associated with the registered model.

Key

Type

Description

alias_name

String

Name of the alias, e.g. 'champion' or 'latest_stable'.

catalog_name

String

The name of the catalog containing the model version.

id

String

The unique identifier of the alias.

model_name

String

The name of the parent registered model of the model version, relative to parent schema.

schema_name

String

The name of the schema containing the model version, relative to parent catalog.

version_num

Integer

Integer version number of the model version to which this alias points.

Example

The following example defines a registered model in Unity Catalog:

YAML
resources:
  registered_models:
    model:
      name: my_model
      catalog_name: ${bundle.target}
      schema_name: mlops_schema
      comment: Registered model in Unity Catalog for ${bundle.target} deployment target
      grants:
        - privileges:
            - EXECUTE
          principal: account users
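
To point an alias at a specific model version, an aliases entry could be added to the registered model definition. The following is a minimal sketch; the alias name and version number are illustrative:

YAML
resources:
  registered_models:
    model:
      name: my_model
      catalog_name: ${bundle.target}
      schema_name: mlops_schema
      aliases:
        - alias_name: champion
          version_num: 1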

schema (Unity Catalog)

Type: Map

Schemas are supported in Python for Databricks Asset Bundles. See databricks.bundles.schemas.

The schema resource type allows you to define Unity Catalog schemas for tables and other assets in your workflows and pipelines created as part of a bundle. A schema, different from other resource types, has the following limitations:

  • The owner of a schema resource is always the deployment user, and cannot be changed. If run_as is specified in the bundle, it will be ignored by operations on the schema.
  • Only fields supported by the corresponding Schemas object create API are available for the schema resource. For example, enable_predictive_optimization is not supported as it is only available on the update API.
YAML
schemas:
  <schema-name>:
    <schema-field-name>: <schema-field-value>

Key

Type

Description

catalog_name

String

The name of the parent catalog.

comment

String

A user-provided free-form text description.

grants

Sequence

The grants associated with the schema. See grant.

lifecycle

Map

Contains the lifecycle settings for a resource. It controls the behavior of the resource when it is deployed or destroyed. See lifecycle.

name

String

The name of schema, relative to the parent catalog.

properties

Map

A map of key-value properties attached to the schema.

storage_root

String

The storage root URL for managed tables within the schema.

Examples

The following example defines a pipeline with the resource key my_pipeline that creates a Unity Catalog schema with the key my_schema as the target:

YAML
resources:
  pipelines:
    my_pipeline:
      name: test-pipeline-{{.unique_id}}
      libraries:
        - notebook:
            path: ../src/nb.ipynb
        - file:
            path: ../src/range.sql
      development: true
      catalog: ${resources.schemas.my_schema.catalog_name}
      target: ${resources.schemas.my_schema.id}

  schemas:
    my_schema:
      name: test-schema-{{.unique_id}}
      catalog_name: main
      comment: This schema was created by Databricks Asset Bundles.

A top-level grants mapping is not supported by Databricks Asset Bundles, so if you want to set grants for a schema, define the grants for the schema within the schemas mapping. For more information about grants, see Show, grant, and revoke privileges.

The following example defines a Unity Catalog schema with grants:

YAML
resources:
  schemas:
    my_schema:
      name: test-schema
      grants:
        - principal: users
          privileges:
            - SELECT
        - principal: my_team
          privileges:
            - CAN_MANAGE
      catalog_name: main

secret_scope

Type: Map

The secret_scope resource allows you to define secret scopes in a bundle. For information about secret scopes, see Secret management.

YAML
secret_scopes:
  <secret_scope-name>:
    <secret_scope-field-name>: <secret_scope-field-value>

Key

Type

Description

backend_type

String

The backend type the scope will be created with. If not specified, this defaults to DATABRICKS.

keyvault_metadata

Map

The metadata for the secret scope if the backend_type is AZURE_KEYVAULT. See keyvault_metadata.

lifecycle

Map

Contains the lifecycle settings for a resource. It controls the behavior of the resource when it is deployed or destroyed. See lifecycle.

name

String

Scope name requested by the user. Scope names are unique.

permissions

Sequence

The permissions to apply to the secret scope. Permissions are managed via secret scope ACLs. See permissions.

secret_scope.keyvault_metadata

Type: Map

The metadata for Azure Key Vault-backed secret scopes.

Key

Type

Description

resource_id

String

The Azure resource ID of the Key Vault.

dns_name

String

The DNS name of the Azure Key Vault.

Examples

The following example defines a secret scope that uses a key vault backend:

YAML
resources:
  secret_scopes:
    secret_scope_azure:
      name: test-secrets-azure-backend
      backend_type: 'AZURE_KEYVAULT'
      keyvault_metadata:
        resource_id: my_azure_keyvault_id
        dns_name: my_azure_keyvault_dns_name

The following example sets a custom ACL using secret scopes and permissions:

YAML
resources:
  secret_scopes:
    my_secret_scope:
      name: my_secret_scope
      permissions:
        - user_name: admins
          level: WRITE
        - user_name: users
          level: READ

For an example bundle that defines a secret scope and a job with a task that reads from it, see the bundle-examples GitHub repository.

sql_warehouse

Type: Map

The SQL warehouse resource allows you to define a SQL warehouse in a bundle. For information about SQL warehouses, see Data warehousing on Databricks.

YAML
sql_warehouses:
  <sql-warehouse-name>:
    <sql-warehouse-field-name>: <sql-warehouse-field-value>

Key

Type

Description

auto_stop_mins

Integer

The amount of time in minutes that a SQL warehouse must be idle (for example, with no RUNNING queries) before it is automatically stopped. Valid values are 0, which indicates no autostop, or greater than or equal to 10. The default is 120.

channel

Map

The channel details. See channel.

cluster_size

String

The size of the clusters allocated for this warehouse. Increasing the size of a Spark cluster allows you to run larger queries on it. If you want to increase the number of concurrent queries, tune max_num_clusters. For supported values, see cluster_size.

creator_name

String

The name of the user that created the warehouse.

enable_photon

Boolean

Whether the warehouse should use Photon optimized clusters. Defaults to false.

enable_serverless_compute

Boolean

Whether the warehouse should use serverless compute.

instance_profile_arn

String

Deprecated. The instance profile used to pass an IAM role to the cluster.

lifecycle

Map

Contains the lifecycle settings for a resource. It controls the behavior of the resource when it is deployed or destroyed. See lifecycle.

max_num_clusters

Integer

The maximum number of clusters that the autoscaler will create to handle concurrent queries. Values must be less than or equal to 30 and greater than or equal to min_num_clusters. Defaults to min_num_clusters if unset.

min_num_clusters

Integer

The minimum number of available clusters that will be maintained for this SQL warehouse. Increasing this will ensure that a larger number of clusters are always running and therefore may reduce the cold start time for new queries. This is similar to reserved vs. revocable cores in a resource manager. Values must be greater than 0 and less than or equal to min(max_num_clusters, 30). Defaults to 1.

name

String

The logical name for the cluster. The name must be unique within an org and less than 100 characters.

permissions

Sequence

The permissions to apply to the warehouse. See permissions.

spot_instance_policy

String

The spot instance policy to use. Valid values are POLICY_UNSPECIFIED, COST_OPTIMIZED, and RELIABILITY_OPTIMIZED. The default is COST_OPTIMIZED.

tags

Map

A set of key-value pairs that will be tagged on all resources (e.g., AWS instances and EBS volumes) associated with this SQL warehouse. The number of tags must be less than 45.

warehouse_type

String

The warehouse type, PRO or CLASSIC. If you want to use serverless compute, set this field to PRO and also set the field enable_serverless_compute to true.

sql_warehouse.channel

Type: Map

The channel configuration for the SQL warehouse.

Key

Type

Description

name

String

The name of the channel. Valid values include CHANNEL_NAME_CURRENT, CHANNEL_NAME_PREVIEW, CHANNEL_NAME_CUSTOM.

dbsql_version

String

The DBSQL version for custom channels.

Example

The following example defines a SQL warehouse:

YAML
resources:
  sql_warehouses:
    my_sql_warehouse:
      name: my_sql_warehouse
      cluster_size: X-Large
      enable_serverless_compute: true
      max_num_clusters: 3
      min_num_clusters: 1
      auto_stop_mins: 60
      warehouse_type: PRO
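
To pin a warehouse to a specific channel, a channel mapping could be added to the warehouse definition. The following is a minimal sketch; the warehouse name and size are placeholders:

YAML
resources:
  sql_warehouses:
    my_preview_warehouse:
      name: my_preview_warehouse
      cluster_size: Small
      warehouse_type: PRO
      enable_serverless_compute: true
      channel:
        name: CHANNEL_NAME_PREVIEW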

synced_database_table

Type: Map

The synced database table resource allows you to define Lakebase database tables in a bundle.

For information about synced database tables, see What is a database instance?.

YAML
synced_database_tables:
  <synced_database_table-name>:
    <synced_database_table-field-name>: <synced_database_table-field-value>

Key

Type

Description

database_instance_name

String

The name of the target database instance. This is required when creating synced database tables in standard catalogs. This is optional when creating synced database tables in registered catalogs.

lifecycle

Map

Contains the lifecycle settings for a resource. It controls the behavior of the resource when it is deployed or destroyed. See lifecycle.

logical_database_name

String

The name of the target Postgres database object (logical database) for this table.

name

String

The full name of the table, in the form catalog.schema.table.

spec

Map

The database table specification. See synced database table specification.

synced_database_table.spec

Type: Map

The database table specification.

Key

Type

Description

create_database_objects_if_missing

Boolean

Whether to create the synced table's logical database and schema resources if they do not already exist.

existing_pipeline_id

String

The ID for an existing pipeline. If this is set, the synced table will be bin packed into the existing pipeline referenced. This avoids creating a new pipeline and allows sharing existing compute. In this case, the scheduling_policy of this synced table must match the scheduling policy of the existing pipeline. At most one of existing_pipeline_id and new_pipeline_spec should be defined.

new_pipeline_spec

Map

The specification for a new pipeline. See new_pipeline_spec. At most one of existing_pipeline_id and new_pipeline_spec should be defined.

primary_key_columns

Sequence

The list of column names that form the primary key.

scheduling_policy

String

The scheduling policy for syncing. Valid values include SNAPSHOT, CONTINUOUS.

source_table_full_name

String

The full name of the source table in the format catalog.schema.table.

timeseries_key

String

Time series key to de-duplicate rows with the same primary key.

synced_database_table.spec.new_pipeline_spec

Type: Map

The specification for a new pipeline used by the synced database table.

Key

Type

Description

storage_catalog

String

The catalog for the pipeline to store intermediate files, such as checkpoints and event logs. This needs to be a standard catalog where the user has permissions to create Delta tables.

storage_schema

String

The schema for the pipeline to store intermediate files, such as checkpoints and event logs. This needs to be in the standard catalog where the user has permissions to create Delta tables.

Examples

The following example defines a synced database table within a corresponding database catalog:

YAML
resources:
  database_instances:
    my_instance:
      name: my-instance
      capacity: CU_1
  database_catalogs:
    my_catalog:
      database_instance_name: my-instance
      database_name: 'my_database'
      name: my_catalog
      create_database_if_not_exists: true
  synced_database_tables:
    my_synced_table:
      name: ${resources.database_catalogs.my_catalog.name}.${resources.database_catalogs.my_catalog.database_name}.my_destination_table
      database_instance_name: ${resources.database_catalogs.my_catalog.database_instance_name}
      logical_database_name: ${resources.database_catalogs.my_catalog.database_name}
      spec:
        source_table_full_name: 'my_source_table'
        scheduling_policy: SNAPSHOT
        primary_key_columns:
          - my_pk_column
        new_pipeline_spec:
          storage_catalog: 'my_delta_catalog'
          storage_schema: 'my_delta_schema'

The following example defines a synced database table inside a standard catalog:

YAML
resources:
  synced_database_tables:
    my_synced_table:
      name: 'my_standard_catalog.public.synced_table'
      # database_instance_name is required for synced tables created in standard catalogs.
      database_instance_name: 'my-database-instance'
      # logical_database_name is required for synced tables created in standard catalogs.
      logical_database_name: ${resources.database_catalogs.my_catalog.database_name}
      spec:
        source_table_full_name: 'source_catalog.schema.table'
        scheduling_policy: SNAPSHOT
        primary_key_columns:
          - my_pk_column
        create_database_objects_if_missing: true
        new_pipeline_spec:
          storage_catalog: 'my_delta_catalog'
          storage_schema: 'my_delta_schema'

This example creates a synced database table and customizes the pipeline schedule for it. It assumes you already have:

  • A database instance named my-database-instance
  • A standard catalog named my_standard_catalog
  • A schema in the standard catalog named default
  • A source Delta table named source_delta.schema.customer with the primary key c_custkey
YAML
resources:
  synced_database_tables:
    my_synced_table:
      name: 'my_standard_catalog.default.my_synced_table'
      database_instance_name: 'my-database-instance'
      logical_database_name: 'test_db'
      spec:
        source_table_full_name: 'source_delta.schema.customer'
        scheduling_policy: SNAPSHOT
        primary_key_columns:
          - c_custkey
        create_database_objects_if_missing: true
        new_pipeline_spec:
          storage_catalog: 'source_delta'
          storage_schema: 'schema'

  jobs:
    sync_pipeline_schedule_job:
      name: sync_pipeline_schedule_job
      description: 'Job to schedule synced database table pipeline.'
      tasks:
        - task_key: synced-table-pipeline
          pipeline_task:
            pipeline_id: ${resources.synced_database_tables.my_synced_table.data_synchronization_status.pipeline_id}
      schedule:
        quartz_cron_expression: '0 0 0 * * ?'

volume (Unity Catalog)

Type: Map

Volumes are supported in Python for Databricks Asset Bundles. See databricks.bundles.volumes.

The volume resource type allows you to define and create Unity Catalog volumes as part of a bundle. When deploying a bundle with a volume defined, note that:

  • A volume cannot be referenced in the artifact_path for the bundle until it exists in the workspace. Hence, if you want to use Databricks Asset Bundles to create the volume, you must first define the volume in the bundle, deploy it to create the volume, then reference it in the artifact_path in subsequent deployments.
  • Volumes in the bundle are not prepended with the dev_${workspace.current_user.short_name} prefix when the deployment target has mode: development configured. However, you can manually configure this prefix. See Custom presets.
YAML
volumes:
  <volume-name>:
    <volume-field-name>: <volume-field-value>

Key

Type

Description

catalog_name

String

The name of the catalog of the schema and volume.

comment

String

The comment attached to the volume.

grants

Sequence

The grants associated with the volume. See grant.

lifecycle

Map

Contains the lifecycle settings for a resource. It controls the behavior of the resource when it is deployed or destroyed. See lifecycle.

name

String

The name of the volume.

schema_name

String

The name of the schema where the volume is.

storage_location

String

The storage location on the cloud.

volume_type

String

The volume type, either EXTERNAL or MANAGED. An external volume is located in the specified external location. A managed volume is located in the default location which is specified by the parent schema, or the parent catalog, or the metastore. See Managed versus external volumes.

Example

The following example creates a Unity Catalog volume with the key my_volume_id:

YAML
resources:
  volumes:
    my_volume_id:
      catalog_name: main
      name: my_volume
      schema_name: my_schema
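
An external volume with grants might be defined as in the following sketch. The storage location, volume name, and privilege are assumptions for illustration; valid privileges depend on the resource type and must map to an existing external location in your workspace:

YAML
resources:
  volumes:
    my_external_volume:
      catalog_name: main
      schema_name: my_schema
      name: my_external_volume
      volume_type: EXTERNAL
      # Placeholder cloud path; this must resolve to an existing external location.
      storage_location: s3://my-bucket/path/to/volume
      grants:
        - principal: users
          privileges:
            - READ_VOLUME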

For an example bundle that runs a job that writes to a file in a Unity Catalog volume, see the bundle-examples GitHub repository.

Common objects

grant

Type: Map

Defines the principal and privileges to grant to that principal. For more information about grants, see Show, grant, and revoke privileges.

Key

Type

Description

principal

String

The name of the principal that will be granted privileges. This can be a user, group, or service principal.

privileges

Sequence

The privileges to grant to the specified entity. Valid values depend on the resource type (for example, SELECT, MODIFY, CREATE, USAGE, READ_FILES, WRITE_FILES, EXECUTE, ALL_PRIVILEGES).

Example

The following example defines a Unity Catalog schema with grants:

YAML
resources:
  schemas:
    my_schema:
      name: test-schema
      grants:
        - principal: users
          privileges:
            - SELECT
        - principal: my_team
          privileges:
            - CAN_MANAGE
      catalog_name: main

lifecycle

Type: Map

Contains the lifecycle settings for a resource. It controls the behavior of the resource when it is deployed or destroyed.

Key

Type

Description

prevent_destroy

Boolean

Lifecycle setting to prevent the resource from being destroyed.
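
For example, to guard a resource against accidental deletion when the bundle is destroyed, prevent_destroy could be set in the resource's lifecycle mapping. The following is a minimal sketch; the schema name and catalog are placeholders:

YAML
resources:
  schemas:
    my_schema:
      name: prod-schema
      catalog_name: main
      lifecycle:
        prevent_destroy: true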