Databricks Asset Bundles resources
Databricks Asset Bundles lets you specify information about the Databricks resources used by the bundle in the resources mapping of the bundle configuration. See resources mapping and resources key reference.
This article outlines supported resource types for bundles and provides details and an example for each supported type. For additional examples, see Bundle configuration examples.
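In general, a resource definition in the bundle configuration has the following shape, where the resource type, resource name, and field names are placeholders:

resources:
  <resource-type>:
    <resource-name>:
      <resource-field-name>: <resource-field-value>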
To generate YAML for any existing resource, use the databricks bundle generate
command. See Generate a bundle configuration file.
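For example, to generate configuration for a job that already exists in the workspace, you can run a command like the following, where the job ID is a placeholder to replace with an ID from your workspace:

# Replace the ID with the ID of an existing job in your workspace
databricks bundle generate job --existing-job-id 6565621249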
Supported resources
The following table lists supported resource types for bundles. Some resources can be created by defining them in a bundle and deploying the bundle, and some resources can only be created by referencing an existing asset to include in the bundle.
Resources are defined using the corresponding Databricks REST API object’s create operation request payload, where the object’s supported fields, expressed as YAML, are the resource’s supported properties. Links to documentation for each resource’s corresponding payloads are listed in the table.
The databricks bundle validate
command returns warnings if unknown resource properties are found in bundle configuration files.
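For example, run the following from the bundle root to check your resource definitions before deploying:

databricks bundle validate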
Resource | Corresponding REST API object |
---|---|
app | Apps API |
cluster | Clusters API |
dashboard | Dashboards API |
experiment | Experiments API |
job | Jobs API |
model (legacy) | Model Registry (legacy) API |
model_serving_endpoint | Serving endpoints API |
pipeline | Pipelines API |
quality_monitor (Unity Catalog) | Quality monitors API |
registered_model (Unity Catalog) | Registered models API |
schema (Unity Catalog) | Schemas API |
volume (Unity Catalog) | Volumes API |
app
Type: Map
The app resource defines a Databricks app. For information about Databricks Apps, see What is Databricks Apps?.
To add an app, specify the settings to define the app, including the required source_code_path.
You can initialize a bundle with a Streamlit Databricks app using the following command:
databricks bundle init https://github.com/databricks/bundle-examples --template-dir contrib/templates/streamlit-app
apps:
  <app-name>:
    <app-field-name>: <app-field-value>
Key | Type | Description |
---|---|---|
budget_policy_id | String | The budget policy ID for the app. |
config | Map | Deprecated. Define your app configuration commands and environment variables in the app.yaml file instead. |
description | String | The description of the app. |
name | String | The name of the app. The name must contain only lowercase alphanumeric characters and hyphens. It must be unique within the workspace. |
permissions | Sequence | The app's permissions. See permissions. |
resources | Sequence | The app compute resources. See apps.name.resources. |
source_code_path | String | The local path of the app source code. This field is required. |
user_api_scopes | Sequence | The user API scopes. |
apps.name.resources
Type: Sequence
The compute resources for the app.
Key | Type | Description |
---|---|---|
description | String | The description of the app resource. |
job | Map | The settings that identify the job resource to use. See resources.job. |
name | String | The name of the app resource. |
secret | Map | The secret settings. See resources.secret. |
serving_endpoint | Map | The settings that identify the serving endpoint resource to use. See resources.serving_endpoint. |
sql_warehouse | Map | The settings that identify the warehouse resource to use. See resources.sql_warehouse. |
Example
The following example creates an app named job_manager_app that manages a job created by the bundle:
resources:
  jobs:
    # Define a job in the bundle
    hello_world:
      name: hello_world
      tasks:
        - task_key: task
          spark_python_task:
            python_file: ../src/main.py
          environment_key: default
      environments:
        - environment_key: default
          spec:
            client: '1'

  # Define an app that manages the job in the bundle
  apps:
    job_manager:
      name: 'job_manager_app'
      description: 'An app which manages a job created by this bundle'

      # The location of the source code for the app
      source_code_path: ../src/app

      # The resources in the bundle which this app has access to. This binds the resource in the app with the bundle resource.
      resources:
        - name: 'app-job'
          job:
            id: ${resources.jobs.hello_world.id}
            permission: 'CAN_MANAGE_RUN'
The corresponding app.yaml
defines the configuration for running the app:
command:
  - flask
  - --app
  - app
  - run
  - --debug
env:
  - name: JOB_ID
    valueFrom: 'app-job'
For the complete Databricks app example bundle, see the bundle-examples GitHub repository.
cluster
Type: Map
The cluster resource defines an all-purpose cluster.
clusters:
  <cluster-name>:
    <cluster-field-name>: <cluster-field-value>
Key | Type | Description |
---|---|---|
apply_policy_default_values | Boolean | When set to true, fixed and default values from the policy will be used for fields that are omitted. When set to false, only fixed values from the policy will be applied. |
autoscale | Map | Parameters needed in order to automatically scale clusters up and down based on load. See autoscale. |
autotermination_minutes | Integer | Automatically terminates the cluster after it is inactive for this time in minutes. If not set, this cluster will not be automatically terminated. If specified, the threshold must be between 10 and 10000 minutes. Users can also set this value to 0 to explicitly disable automatic termination. |
aws_attributes | Map | Attributes related to clusters running on Amazon Web Services. If not specified at cluster creation, a set of default values will be used. See aws_attributes. |
azure_attributes | Map | Attributes related to clusters running on Microsoft Azure. If not specified at cluster creation, a set of default values will be used. See azure_attributes. |
cluster_log_conf | Map | The configuration for delivering Spark logs to a long-term storage destination. See cluster_log_conf. |
cluster_name | String | Cluster name requested by the user. This doesn't have to be unique. If not specified at creation, the cluster name will be an empty string. |
custom_tags | Map | Additional tags for cluster resources. Databricks will tag all cluster resources (e.g., AWS instances and EBS volumes) with these tags in addition to default_tags. |
data_security_mode | String | The data governance model to use when accessing data from a cluster. See data_security_mode. |
docker_image | Map | The custom Docker image. See docker_image. |
driver_instance_pool_id | String | The optional ID of the instance pool to use for the cluster driver. The cluster uses the instance pool with ID instance_pool_id if the driver pool is not assigned. |
driver_node_type_id | String | The node type of the Spark driver. Note that this field is optional; if unset, the driver node type will be set to the same value as node_type_id. |
enable_elastic_disk | Boolean | Autoscaling local storage: when enabled, this cluster will dynamically acquire additional disk space when its Spark workers are running low on disk space. This feature requires specific AWS permissions to function correctly - refer to the User Guide for more details. |
enable_local_disk_encryption | Boolean | Whether to enable LUKS on cluster VMs' local disks. |
gcp_attributes | Map | Attributes related to clusters running on Google Cloud Platform. If not specified at cluster creation, a set of default values will be used. See gcp_attributes. |
init_scripts | Sequence | The configuration for storing init scripts. Any number of destinations can be specified. The scripts are executed sequentially in the order provided. See init_scripts. |
instance_pool_id | String | The optional ID of the instance pool to which the cluster belongs. |
is_single_node | Boolean | This field can only be used when kind = CLASSIC_PREVIEW. When set to true, Databricks automatically sets the single-node related custom_tags, spark_conf, and num_workers. |
kind | String | The kind of compute described by this compute specification. |
node_type_id | String | This field encodes, through a single value, the resources available to each of the Spark nodes in this cluster. For example, the Spark nodes can be provisioned and optimized for memory or compute intensive workloads. A list of available node types can be retrieved by using the clusters/listNodeTypes API call. |
num_workers | Integer | Number of worker nodes that this cluster should have. A cluster has one Spark Driver and num_workers executors for a total of num_workers + 1 Spark nodes. |
permissions | Sequence | The cluster permissions. See permissions. |
policy_id | String | The ID of the cluster policy used to create the cluster if applicable. |
runtime_engine | String | Determines the cluster's runtime engine, either STANDARD or PHOTON. |
single_user_name | String | Single user name if data_security_mode is SINGLE_USER. |
spark_conf | Map | An object containing a set of optional, user-specified Spark configuration key-value pairs. Users can also pass in a string of extra JVM options to the driver and the executors via spark.driver.extraJavaOptions and spark.executor.extraJavaOptions respectively. |
spark_env_vars | Map | An object containing a set of optional, user-specified environment variable key-value pairs. |
spark_version | String | The Spark version of the cluster, e.g. 15.4.x-scala2.12. A list of available Spark versions can be retrieved by using the clusters/sparkVersions API call. |
ssh_public_keys | Sequence | SSH public key contents that will be added to each Spark node in this cluster. The corresponding private keys can be used to login with the user name ubuntu on port 2200. |
use_ml_runtime | Boolean | This field can only be used when kind = CLASSIC_PREVIEW. effective_spark_version is determined by spark_version (DBR release), this field, and whether node_type_id is a GPU node. |
workload_type | Map | Cluster attributes that define the types of workloads the cluster can run. See workload_type. |
Examples
The following example creates a dedicated (single-user) cluster for the current user with Databricks Runtime 15.4 LTS and a cluster policy:
resources:
  clusters:
    my_cluster:
      num_workers: 0
      node_type_id: 'i3.xlarge'
      driver_node_type_id: 'i3.xlarge'
      spark_version: '15.4.x-scala2.12'
      spark_conf:
        'spark.executor.memory': '2g'
      autotermination_minutes: 60
      enable_elastic_disk: true
      single_user_name: ${workspace.current_user.userName}
      policy_id: '000128DB309672CA'
      enable_local_disk_encryption: false
      data_security_mode: SINGLE_USER
      runtime_engine: STANDARD
This example creates a simple cluster my_cluster and sets it as the cluster to use to run the notebook in my_job:
bundle:
  name: clusters

resources:
  clusters:
    my_cluster:
      num_workers: 2
      node_type_id: 'i3.xlarge'
      autoscale:
        min_workers: 2
        max_workers: 7
      spark_version: '13.3.x-scala2.12'
      spark_conf:
        'spark.executor.memory': '2g'

  jobs:
    my_job:
      tasks:
        - task_key: test_task
          notebook_task:
            notebook_path: './src/my_notebook.py'
          existing_cluster_id: ${resources.clusters.my_cluster.id}
dashboard
Type: Map
The dashboard resource allows you to manage AI/BI dashboards in a bundle. For information about AI/BI dashboards, see Dashboards.
dashboards:
  <dashboard-name>:
    <dashboard-field-name>: <dashboard-field-value>
Key | Type | Description |
---|---|---|
display_name | String | The display name of the dashboard. |
etag | String | The etag for the dashboard. Can be optionally provided on updates to ensure that the dashboard has not been modified since the last read. |
file_path | String | The local path of the dashboard asset, including the file name. Exported dashboards always have the file extension .lvdash.json. |
permissions | Sequence | The dashboard permissions. See permissions. |
serialized_dashboard | Any | The contents of the dashboard in serialized string form. |
warehouse_id | String | The warehouse ID used to run the dashboard. |
Example
The following example includes and deploys the sample NYC Taxi Trip Analysis dashboard to the Databricks workspace.
resources:
  dashboards:
    nyc_taxi_trip_analysis:
      display_name: 'NYC Taxi Trip Analysis'
      file_path: ../src/nyc_taxi_trip_analysis.lvdash.json
      warehouse_id: ${var.warehouse_id}
Modifications made to the dashboard through the UI are not applied to the dashboard JSON file in the local bundle unless you explicitly update it using bundle generate. You can use the --watch option to continuously poll and retrieve changes to the dashboard. See Generate a bundle configuration file.
In addition, if you attempt to deploy a bundle that contains a dashboard JSON file that is different from the one in the remote workspace, an error will occur. To force the deploy and overwrite the dashboard in the remote workspace with the local one, use the --force option. See Deploy a bundle.
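For example, the following commands sketch this workflow for the dashboard defined above. The flags shown are assumptions based on the generate and deploy commands described in the linked articles; check databricks bundle generate dashboard --help for the flags available in your CLI version.

# Continuously poll the remote dashboard and update the local .lvdash.json file
databricks bundle generate dashboard --resource nyc_taxi_trip_analysis --watch

# Overwrite the remote dashboard with the local copy if the two have diverged
databricks bundle deploy --force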
experiment
Type: Map
The experiment resource allows you to define MLflow experiments in a bundle. For information about MLflow experiments, see Organize training runs with MLflow experiments.
experiments:
  <experiment-name>:
    <experiment-field-name>: <experiment-field-value>
Key | Type | Description |
---|---|---|
artifact_location | String | The location where artifacts for the experiment are stored. |
name | String | The friendly name that identifies the experiment. |
permissions | Sequence | The experiment's permissions. See permissions. |
tags | Sequence | Additional metadata key-value pairs. See tags. |
Example
The following example defines an experiment that all users can view:
resources:
  experiments:
    experiment:
      name: my_ml_experiment
      permissions:
        - level: CAN_READ
          group_name: users
      description: MLflow experiment used to track runs
job
Type: Map
The job resource allows you to define jobs and their corresponding tasks in your bundle. For information about jobs, see Orchestration using Databricks Jobs. For a tutorial that uses a Databricks Asset Bundles template to create a job, see Develop a job on Databricks using Databricks Asset Bundles.
jobs:
  <job-name>:
    <job-field-name>: <job-field-value>
Key | Type | Description |
---|---|---|
budget_policy_id | String | The ID of the user-specified budget policy to use for this job. If not specified, a default budget policy may be applied when creating or modifying the job. See effective_budget_policy_id for the budget policy used by this workload. |
continuous | Map | An optional continuous property for this job. The continuous property will ensure that there is always one run executing. Only one of schedule and continuous can be used. |
deployment | Map | Deployment information for jobs managed by external sources. See deployment. |
description | String | An optional description for the job. The maximum length is 27700 characters in UTF-8 encoding. |
edit_mode | String | Edit mode of the job, either UI_LOCKED or EDITABLE. |
email_notifications | Map | An optional set of email addresses that is notified when runs of this job begin or complete as well as when this job is deleted. See email_notifications. |
environments | Sequence | A list of task execution environment specifications that can be referenced by serverless tasks of this job. An environment is required to be present for serverless tasks. For serverless notebook tasks, the environment is accessible in the notebook environment panel. For other serverless tasks, the task environment is required to be specified using environment_key in the task settings. See environments. |
format | String | The format of the job. |
git_source | Map | An optional specification for a remote Git repository containing the source code used by tasks. The git_source field and task source field set to GIT are not recommended for bundles, because local relative paths may not point to the same content in the Git repository, and bundles expect that a deployed job has the same content as the local copy from where it was deployed. Instead, clone the repository locally and set up your bundle project within this repository, so that the source for tasks is the workspace. |
health | Map | An optional set of health rules that can be defined for this job. See health. |
job_clusters | Sequence | A list of job cluster specifications that can be shared and reused by tasks of this job. See clusters. |
max_concurrent_runs | Integer | An optional maximum allowed number of concurrent runs of the job. Set this value if you want to be able to execute multiple runs of the same job concurrently. See max_concurrent_runs. |
name | String | An optional name for the job. The maximum length is 4096 bytes in UTF-8 encoding. |
notification_settings | Map | Optional notification settings that are used when sending notifications to each of the email_notifications and webhook_notifications for this job. |
parameters | Sequence | Job-level parameter definitions. See parameters. |
performance_target | String | Defines how performant or cost-efficient the execution of a run on serverless compute should be. |
permissions | Sequence | The job's permissions. See permissions. |
queue | Map | The queue settings of the job. See queue. |
run_as | Map | Write-only setting. Specifies the user or service principal that the job runs as. If not specified, the job runs as the user who created the job. Either user_name or service_principal_name should be specified. |
schedule | Map | An optional periodic schedule for this job. The default behavior is that the job only runs when triggered by clicking “Run Now” in the Jobs UI or sending an API request to runNow. |
tags | Map | A map of tags associated with the job. These are forwarded to the cluster as cluster tags for jobs clusters, and are subject to the same limitations as cluster tags. A maximum of 25 tags can be added to the job. |
tasks | Sequence | A list of task specifications to be executed by this job. See Add tasks to jobs in Databricks Asset Bundles. |
timeout_seconds | Integer | An optional timeout applied to each run of this job. A value of 0 means no timeout. |
trigger | Map | A configuration to trigger a run when certain conditions are met. See trigger. |
webhook_notifications | Map | A collection of system notification IDs to notify when runs of this job begin or complete. See webhook_notifications. |
Example
The following example defines a job with the resource key hello-job
with one notebook task:
resources:
  jobs:
    hello-job:
      name: hello-job
      tasks:
        - task_key: hello-task
          notebook_task:
            notebook_path: ./hello.py
For information about defining job tasks and overriding job settings, see Add tasks to jobs in Databricks Asset Bundles, Override job tasks settings in Databricks Asset Bundles, and Override cluster settings in Databricks Asset Bundles.
The job git_source
field and task source
field set to GIT
are not recommended for bundles, because local relative paths may not point to the same content in the Git repository, and bundles expect that a deployed job has the same content as the local copy from where it was deployed.
Instead, clone the repository locally and set up your bundle project within this repository, so that the source for tasks is the workspace.
model (legacy)
Type: Map
The model resource allows you to define legacy models in bundles. Databricks recommends you use Unity Catalog registered models instead.
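A minimal sketch of a legacy model definition follows. It assumes the models resource key and uses illustrative field values; consult the legacy Model Registry create API for the supported fields.

resources:
  models:
    my_legacy_model:
      # Illustrative values; see the legacy Model Registry create API for supported fields
      name: my_legacy_model
      description: A model registered in the legacy Workspace Model Registry
      permissions:
        - level: CAN_READ
          group_name: users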
model_serving_endpoint
Type: Map
The model_serving_endpoint resource allows you to define model serving endpoints. See Manage model serving endpoints.
model_serving_endpoints:
  <model_serving_endpoint-name>:
    <model_serving_endpoint-field-name>: <model_serving_endpoint-field-value>
Key | Type | Description |
---|---|---|
ai_gateway | Map | The AI Gateway configuration for the serving endpoint. NOTE: Only external model and provisioned throughput endpoints are currently supported. See ai_gateway. |
config | Map | The core config of the serving endpoint. See config. |
name | String | The name of the serving endpoint. This field is required and must be unique across a Databricks workspace. An endpoint name can consist of alphanumeric characters, dashes, and underscores. |
permissions | Sequence | The model serving endpoint's permissions. See permissions. |
rate_limits | Sequence | Rate limits to be applied to the serving endpoint. NOTE: this field is deprecated, please use AI Gateway to manage rate limits. See rate_limits. |
route_optimized | Boolean | Enable route optimization for the serving endpoint. |
tags | Sequence | Tags to be attached to the serving endpoint and automatically propagated to billing logs. See tags. |
Example
The following example defines a Unity Catalog model serving endpoint:
resources:
  model_serving_endpoints:
    uc_model_serving_endpoint:
      name: 'uc-model-endpoint'
      config:
        served_entities:
          - entity_name: 'myCatalog.mySchema.my-ads-model'
            entity_version: '10'
            workload_size: 'Small'
            scale_to_zero_enabled: 'true'
        traffic_config:
          routes:
            - served_model_name: 'my-ads-model-10'
              traffic_percentage: '100'
      tags:
        - key: 'team'
          value: 'data science'
pipeline
Type: Map
The pipeline resource allows you to create DLT pipelines. For information about pipelines, see What is DLT?. For a tutorial that uses the Databricks Asset Bundles template to create a pipeline, see Develop DLT pipelines with Databricks Asset Bundles.
pipelines:
  <pipeline-name>:
    <pipeline-field-name>: <pipeline-field-value>
Key | Type | Description |
---|---|---|
allow_duplicate_names | Boolean | If false, deployment will fail if the name conflicts with that of another pipeline. |
catalog | String | A catalog in Unity Catalog to publish data from this pipeline to. If target is specified, tables in this pipeline are published to a target schema inside catalog (for example, catalog.target.table). If target is not specified, no data is published to Unity Catalog. |
channel | String | The DLT Release Channel that specifies which version of DLT to use. |
clusters | Sequence | The cluster settings for this pipeline deployment. See cluster. |
configuration | Map | The configuration for this pipeline execution. |
continuous | Boolean | Whether the pipeline is continuous or triggered. This replaces trigger. |
deployment | Map | Deployment type of this pipeline. See deployment. |
development | Boolean | Whether the pipeline is in development mode. Defaults to false. |
dry_run | Boolean | Whether the pipeline is a dry run pipeline. |
edition | String | The pipeline product edition. |
event_log | Map | The event log configuration for this pipeline. See event_log. |
filters | Map | The filters that determine which pipeline packages to include in the deployed graph. See filters. |
id | String | Unique identifier for this pipeline. |
ingestion_definition | Map | The configuration for a managed ingestion pipeline. These settings cannot be used with the libraries, schema, target, or catalog settings. |
libraries | Sequence | Libraries or code needed by this deployment. See libraries. |
name | String | A friendly name for this pipeline. |
notifications | Sequence | The notification settings for this pipeline. See notifications. |
permissions | Sequence | The pipeline's permissions. See permissions. |
photon | Boolean | Whether Photon is enabled for this pipeline. |
schema | String | The default schema (database) where tables are read from or published to. |
serverless | Boolean | Whether serverless compute is enabled for this pipeline. |
storage | String | The DBFS root directory for storing checkpoints and tables. |
target | String | Target schema (database) to add tables in this pipeline to. Exactly one of schema or target must be specified. To publish to Unity Catalog, also specify catalog. |
trigger | Map | Deprecated. Which pipeline trigger to use. Use continuous instead. |
Example
The following example defines a pipeline with the resource key hello-pipeline:
resources:
  pipelines:
    hello-pipeline:
      name: hello-pipeline
      clusters:
        - label: default
          num_workers: 1
      development: true
      continuous: false
      channel: CURRENT
      edition: CORE
      photon: false
      libraries:
        - notebook:
            path: ./pipeline.py
quality_monitor (Unity Catalog)
Type: Map
The quality_monitor resource allows you to define a Unity Catalog table monitor. For information about monitors, see Introduction to Databricks Lakehouse Monitoring.
quality_monitors:
  <quality_monitor-name>:
    <quality_monitor-field-name>: <quality_monitor-field-value>
Key | Type | Description |
---|---|---|
assets_dir | String | The directory to store monitoring assets (e.g. dashboard, metric tables). |
baseline_table_name | String | Name of the baseline table from which drift metrics are computed. Columns in the monitored table should also be present in the baseline table. |
custom_metrics | Sequence | Custom metrics to compute on the monitored table. These can be aggregate metrics, derived metrics (from already computed aggregate metrics), or drift metrics (comparing metrics across time windows). See custom_metrics. |
inference_log | Map | Configuration for monitoring inference logs. See inference_log. |
notifications | Map | The notification settings for the monitor. See notifications. |
output_schema_name | String | Schema where output metric tables are created. |
schedule | Map | The schedule for automatically updating and refreshing metric tables. See schedule. |
skip_builtin_dashboard | Boolean | Whether to skip creating a default dashboard summarizing data quality metrics. |
slicing_exprs | Sequence | List of column expressions to slice data with for targeted analysis. The data is grouped by each expression independently, resulting in a separate slice for each predicate and its complements. For high-cardinality columns, only the top 100 unique values by frequency will generate slices. |
snapshot | Map | Configuration for monitoring snapshot tables. |
table_name | String | The full name of the table. |
time_series | Map | Configuration for monitoring time series tables. See time_series. |
warehouse_id | String | Optional argument to specify the warehouse for dashboard creation. If not specified, the first running warehouse will be used. |
Examples
For a complete example bundle that defines a quality_monitor, see the mlops_demo bundle.
The following examples define quality monitors for InferenceLog, TimeSeries, and Snapshot profile types.
# InferenceLog profile type
resources:
  quality_monitors:
    my_quality_monitor:
      table_name: dev.mlops_schema.predictions
      output_schema_name: ${bundle.target}.mlops_schema
      assets_dir: /Workspace/Users/${workspace.current_user.userName}/databricks_lakehouse_monitoring
      inference_log:
        granularities: [1 day]
        model_id_col: model_id
        prediction_col: prediction
        label_col: price
        problem_type: PROBLEM_TYPE_REGRESSION
        timestamp_col: timestamp
      schedule:
        quartz_cron_expression: 0 0 8 * * ? # Run Every day at 8am
        timezone_id: UTC

# TimeSeries profile type
resources:
  quality_monitors:
    my_quality_monitor:
      table_name: dev.mlops_schema.predictions
      output_schema_name: ${bundle.target}.mlops_schema
      assets_dir: /Workspace/Users/${workspace.current_user.userName}/databricks_lakehouse_monitoring
      time_series:
        granularities: [30 minutes]
        timestamp_col: timestamp
      schedule:
        quartz_cron_expression: 0 0 8 * * ? # Run Every day at 8am
        timezone_id: UTC

# Snapshot profile type
resources:
  quality_monitors:
    my_quality_monitor:
      table_name: dev.mlops_schema.predictions
      output_schema_name: ${bundle.target}.mlops_schema
      assets_dir: /Workspace/Users/${workspace.current_user.userName}/databricks_lakehouse_monitoring
      snapshot: {}
      schedule:
        quartz_cron_expression: 0 0 8 * * ? # Run Every day at 8am
        timezone_id: UTC
registered_model (Unity Catalog)
Type: Map
The registered model resource allows you to define models in Unity Catalog. For information about Unity Catalog registered models, see Manage model lifecycle in Unity Catalog.
registered_models:
  <registered_model-name>:
    <registered_model-field-name>: <registered_model-field-value>
Key | Type | Description |
---|---|---|
catalog_name | String | The name of the catalog where the schema and the registered model reside. |
comment | String | The comment attached to the registered model. |
grants | Sequence | The grants associated with the registered model. See grants. |
name | String | The name of the registered model. |
schema_name | String | The name of the schema where the registered model resides. |
storage_location | String | The storage location on the cloud under which model version data files are stored. |
Example
The following example defines a registered model in Unity Catalog:
resources:
  registered_models:
    model:
      name: my_model
      catalog_name: ${bundle.target}
      schema_name: mlops_schema
      comment: Registered model in Unity Catalog for ${bundle.target} deployment target
      grants:
        - privileges:
            - EXECUTE
          principal: account users
schema (Unity Catalog)
Type: Map
The schema resource type allows you to define Unity Catalog schemas for tables and other assets in your workflows and pipelines created as part of a bundle. Unlike other resource types, a schema has the following limitations:
- The owner of a schema resource is always the deployment user, and cannot be changed. If run_as is specified in the bundle, it will be ignored by operations on the schema.
- Only fields supported by the corresponding Schemas object create API are available for the schema resource. For example, enable_predictive_optimization is not supported as it is only available on the update API.
schemas:
  <schema-name>:
    <schema-field-name>: <schema-field-value>
Key | Type | Description |
---|---|---|
catalog_name | String | The name of the parent catalog. |
comment | String | A user-provided free-form text description. |
grants | Sequence | The grants associated with the schema. See grants. |
name | String | The name of the schema, relative to the parent catalog. |
properties | Map | A map of key-value properties attached to the schema. |
storage_root | String | The storage root URL for managed tables within the schema. |
Examples
The following example defines a pipeline with the resource key my_pipeline that creates a Unity Catalog schema with the key my_schema as the target:
resources:
  pipelines:
    my_pipeline:
      name: test-pipeline-{{.unique_id}}
      libraries:
        - notebook:
            path: ./nb.sql
      development: true
      catalog: main
      target: ${resources.schemas.my_schema.id}

  schemas:
    my_schema:
      name: test-schema-{{.unique_id}}
      catalog_name: main
      comment: This schema was created by DABs.
A top-level grants mapping is not supported by Databricks Asset Bundles, so if you want to set grants for a schema, define the grants for the schema within the schemas
mapping. For more information about grants, see Show, grant, and revoke privileges.
The following example defines a Unity Catalog schema with grants:
resources:
  schemas:
    my_schema:
      name: test-schema
      grants:
        - principal: users
          privileges:
            - SELECT
        - principal: my_team
          privileges:
            - CAN_MANAGE
      catalog_name: main
volume (Unity Catalog)
Type: Map
The volume resource type allows you to define and create Unity Catalog volumes as part of a bundle. When deploying a bundle with a volume defined, note that:
- A volume cannot be referenced in the artifact_path for the bundle until it exists in the workspace. Hence, if you want to use Databricks Asset Bundles to create the volume, you must first define the volume in the bundle, deploy it to create the volume, and then reference it in the artifact_path in subsequent deployments, as sketched after the example below.
- Volumes in the bundle are not prepended with the dev_${workspace.current_user.short_name} prefix when the deployment target has mode: development configured. However, you can manually configure this prefix. See Custom presets.
volumes:
  <volume-name>:
    <volume-field-name>: <volume-field-value>
Key | Type | Description |
---|---|---|
catalog_name | String | The name of the catalog of the schema and volume. |
comment | String | The comment attached to the volume. |
grants | Sequence | The grants associated with the volume. See grants. |
name | String | The name of the volume. |
schema_name | String | The name of the schema where the volume is. |
storage_location | String | The storage location on the cloud. |
volume_type | String | The volume type, either MANAGED or EXTERNAL. |
Example
The following example creates a Unity Catalog volume with the key my_volume:
resources:
  volumes:
    my_volume:
      catalog_name: main
      name: my_volume
      schema_name: my_schema
For an example bundle that runs a job that writes to a file in a Unity Catalog volume, see the bundle-examples GitHub repository.
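As noted in the deployment notes above, once the volume exists in the workspace, a subsequent deployment can reference it in the bundle's artifact_path. A minimal sketch, assuming the my_volume example above has already been deployed and using an illustrative target name:

targets:
  prod:
    workspace:
      # Unity Catalog volume paths use the format /Volumes/<catalog>/<schema>/<volume>
      artifact_path: /Volumes/main/my_schema/my_volume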
Common objects
grants
Type: Sequence
Key | Type | Description |
---|---|---|
principal | String | The name of the principal that will be granted privileges. |
privileges | Sequence | The privileges to grant to the specified entity. |