Databricks Asset Bundles resources
Databricks Asset Bundles allows you to specify information about the Databricks resources used by the bundle in the resources mapping in the bundle configuration. See resources mapping and resources key reference.
This page provides configuration reference for all supported resource types for bundles and provides details and an example for each supported type. For additional examples, see Bundle configuration examples.
The JSON schema for bundles that is used to validate YAML configuration is in the Databricks CLI GitHub repository.
To generate YAML for any existing resource, use the databricks bundle generate command. See databricks bundle generate.
Supported resources
The following table lists supported resource types for bundles (YAML and Python, where applicable). Some resources can be created by defining them in a bundle and deploying the bundle, and some resources can only be created by referencing an existing asset to include in the bundle.
Resource configuration defines a Databricks object that corresponds to a Databricks REST API object. The REST API object's supported create request fields, expressed as YAML, are the resource's supported keys. Links to documentation for each resource's corresponding object are in the table below.
The databricks bundle validate command returns warnings if unknown resource properties are found in bundle configuration files.
| Resource | Corresponding REST API object |
|---|---|
| registered_model (Unity Catalog) | |
| schema (Unity Catalog) | |
| volume (Unity Catalog) | |
app
Type: Map
The app resource defines a Databricks app. For information about Databricks Apps, see Databricks Apps.
To add an app, specify the settings to define the app, including the required source_code_path.
You can initialize a bundle with a Streamlit Databricks app using the following command:
databricks bundle init https://github.com/databricks/bundle-examples --template-dir contrib/templates/streamlit-app
apps:
<app-name>:
<app-field-name>: <app-field-value>
Key | Type | Description |
|---|---|---|
| String | The budget policy ID for the app. |
| String | The compute size for the app. Valid values are |
| Map | Deprecated. Define your app configuration commands and environment variables in the |
| String | The description of the app. |
| Map | The behavior of the resource when it is deployed or destroyed. See lifecycle. |
| String | The name of the app. The name must contain only lowercase alphanumeric characters and hyphens. It must be unique within the workspace. |
| Sequence | The app's permissions. See permissions. |
| Sequence | The app compute resources. See app.resources. |
| String | The |
| Sequence | The user API scopes. |
app.resources
Type: Sequence
The compute resources for the app.
Key | Type | Description |
|---|---|---|
| String | The description of the app resource. |
| Map | The settings that identify the Lakebase database to use. See app.resources.database. |
| Map | The settings that identify the Genie space to use. See app.resources.genie_space. |
| Map | The settings that identify the job resource to use. See app.resources.job. |
| String | The name of the app resource. |
| Map | The settings that identify the Databricks secret resource to use. See app.resources.secret. |
| Map | The settings that identify the model serving endpoint resource to use. See app.resources.serving_endpoint. |
| Map | The settings that identify the SQL warehouse resource to use. See app.resources.sql_warehouse. |
| Map | The settings that identify the Unity Catalog volume to use. See app.resources.uc_securable. |
app.resources.database
Type: Map
The settings that identify the Lakebase database to use.
Key | Type | Description |
|---|---|---|
| String | The ID of the database instance. |
| String | The permission level for the database. Valid values include |
app.resources.genie_space
Type: Map
The settings that identify the Genie space to use.
Key | Type | Description |
|---|---|---|
| String | The name of the Genie space. |
| String | The permission level for the space. Valid values include |
| String | The ID of the Genie space, for example |
app.resources.job
Type: Map
The settings that identify the job resource to use.
Key | Type | Description |
|---|---|---|
| String | The ID of the job. |
| String | The permission level for the job. Valid values include |
app.resources.secret
Type: Map
The settings that identify the Databricks secret resource to use.
Key | Type | Description |
|---|---|---|
| String | The name of the secret scope. |
| String | The key within the secret scope. |
| String | The permission level for the secret. Valid values include |
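As a minimal sketch, an app entry that grants the app read access to a secret might look like the following (the app name, paths, and the scope and key names are placeholders):

```yaml
resources:
  apps:
    my_app:
      name: 'my-app'
      source_code_path: ../src/app
      resources:
        - name: 'app-secret'
          secret:
            scope: 'my-scope'
            key: 'my-key'
            permission: 'READ'
```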
app.resources.serving_endpoint
Type: Map
The settings that identify the model serving endpoint resource to use.
Key | Type | Description |
|---|---|---|
| String | The name of the serving endpoint. |
| String | The permission level for the serving endpoint. Valid values include |
app.resources.sql_warehouse
Type: Map
The settings that identify the SQL warehouse to use.
Key | Type | Description |
|---|---|---|
| String | The ID of the SQL warehouse. |
| String | The permission level for the SQL warehouse. Valid values include |
app.resources.uc_securable
Type: Map
The settings that identify the Unity Catalog volume to use.
Key | Type | Description |
|---|---|---|
| String | The full name of the Unity Catalog securable in the format |
| String | The permission level for the UC securable. Valid values include |
Example
The following example creates an app named job_manager_app that manages a job created by the bundle:
resources:
jobs:
# Define a job in the bundle
hello_world:
name: hello_world
tasks:
- task_key: task
spark_python_task:
python_file: ../src/main.py
environment_key: default
environments:
- environment_key: default
spec:
environment_version: '2'
# Define an app that manages the job in the bundle
apps:
job_manager:
name: 'job_manager_app'
description: 'An app which manages a job created by this bundle'
# The location of the source code for the app
source_code_path: ../src/app
# The resources in the bundle which this app has access to. This binds the resource in the app with the bundle resource.
resources:
- name: 'app-job'
job:
id: ${resources.jobs.hello_world.id}
permission: 'CAN_MANAGE_RUN'
The corresponding app.yaml defines the configuration for running the app:
command:
- flask
- --app
- app
- run
- --debug
env:
- name: JOB_ID
valueFrom: 'app-job'
For the complete Databricks app example bundle, see the bundle-examples GitHub repository.
cluster
Type: Map
The cluster resource defines a cluster.
clusters:
<cluster-name>:
<cluster-field-name>: <cluster-field-value>
Key | Type | Description |
|---|---|---|
| Boolean | When set to true, fixed and default values from the policy will be used for fields that are omitted. When set to false, only fixed values from the policy will be applied. |
| Map | Parameters needed in order to automatically scale clusters up and down based on load. See autoscale. |
| Integer | Automatically terminates the cluster after it is inactive for this time in minutes. If not set, this cluster will not be automatically terminated. If specified, the threshold must be between 10 and 10000 minutes. Users can also set this value to 0 to explicitly disable automatic termination. |
| Map | Attributes related to clusters running on Amazon Web Services. If not specified at cluster creation, a set of default values will be used. See aws_attributes. |
| Map | Attributes related to clusters running on Microsoft Azure. If not specified at cluster creation, a set of default values will be used. See azure_attributes. |
| Map | The configuration for delivering spark logs to a long-term storage destination. See cluster_log_conf. |
| String | Cluster name requested by the user. This doesn't have to be unique. If not specified at creation, the cluster name will be an empty string. |
| Map | Additional tags for cluster resources. Databricks will tag all cluster resources (e.g., AWS instances and EBS volumes) with these tags in addition to |
| String | The data governance model to use when accessing data from a cluster. Valid values include |
| Map | The custom docker image. See docker_image. |
| String | The optional ID of the instance pool to which the driver of the cluster belongs. If a driver pool is not assigned, the driver uses the instance pool with ID instance_pool_id. |
| String | The node type of the Spark driver. Note that this field is optional; if unset, the driver node type will be set as the same value as |
| Boolean | Autoscaling Local Storage: when enabled, this cluster will dynamically acquire additional disk space when its Spark workers are running low on disk space. This feature requires specific AWS permissions to function correctly - refer to the User Guide for more details. |
| Boolean | Whether to enable LUKS on cluster VMs' local disks |
| Map | Attributes related to clusters running on Google Cloud Platform. If not specified at cluster creation, a set of default values will be used. See gcp_attributes. |
| Sequence | The configuration for storing init scripts. Any number of destinations can be specified. The scripts are executed sequentially in the order provided. See init_scripts. |
| String | The optional ID of the instance pool to which the cluster belongs. |
| Boolean | This field can only be used when |
| String | The kind of compute described by this compute specification. |
| String | This field encodes, through a single value, the resources available to each of the Spark nodes in this cluster. For example, the Spark nodes can be provisioned and optimized for memory or compute intensive workloads. A list of available node types can be retrieved by using the :method:clusters/listNodeTypes API call. |
| Integer | Number of worker nodes that this cluster should have. A cluster has one Spark Driver and |
| Sequence | The cluster permissions. See permissions. |
| String | The ID of the cluster policy used to create the cluster if applicable. |
| String | Determines the cluster's runtime engine, either |
| String | Single user name if data_security_mode is |
| Map | An object containing a set of optional, user-specified Spark configuration key-value pairs. Users can also pass in a string of extra JVM options to the driver and the executors via |
| Map | An object containing a set of optional, user-specified environment variable key-value pairs. |
| String | The Spark version of the cluster, e.g. |
| Sequence | SSH public key contents that will be added to each Spark node in this cluster. The corresponding private keys can be used to login with the user name |
| Boolean | This field can only be used when |
| Map | Cluster Attributes showing for clusters workload types. See workload_type. |
cluster.autoscale
Type: Map
Parameters for automatically scaling clusters up and down based on load.
Key | Type | Description |
|---|---|---|
| Integer | The minimum number of workers to which the cluster can scale down when underutilized. It is also the initial number of workers the cluster will have after creation. |
| Integer | The maximum number of workers to which the cluster can scale up when overloaded. |
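For example, the following sketch defines a cluster that scales between two and eight workers (the node type and Spark version are placeholders):

```yaml
resources:
  clusters:
    autoscaling_cluster:
      node_type_id: 'i3.xlarge'
      spark_version: '15.4.x-scala2.12'
      autoscale:
        min_workers: 2
        max_workers: 8
```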
cluster.aws_attributes
Type: Map
Attributes related to clusters running on Amazon Web Services.
Key | Type | Description |
|---|---|---|
| String | Identifier for the availability zone/datacenter in which the cluster resides. This string will be of a form like |
| String | Availability type used for all subsequent nodes past the |
| Integer | The max price for AWS spot instances, as a percentage of the corresponding instance type's on-demand price. |
| String | Nodes for this cluster will only be placed on AWS instances with this instance profile. |
| Integer | The first |
| String | The type of EBS volumes that will be launched with this cluster. Valid values are |
| Integer | The number of volumes launched for each instance. |
| Integer | The size of each EBS volume (in GiB) launched for each instance. |
| Integer | The number of IOPS per EBS gp3 volume. |
| Integer | The throughput per EBS gp3 volume, in MiB per second. |
cluster.azure_attributes
Type: Map
Attributes related to clusters running on Microsoft Azure.
Key | Type | Description |
|---|---|---|
| Integer | The first |
| String | Availability type used for all subsequent nodes past the |
| Number | The max price for Azure spot instances. Use |
cluster.gcp_attributes
Type: Map
Attributes related to clusters running on Google Cloud Platform.
Key | Type | Description |
|---|---|---|
| Boolean | Whether to use preemptible executors. Preemptible executors are preemptible GCE instances that may be reclaimed by GCE at any time. |
| String | The Google service account to be used by the Databricks cluster VM instances. |
| Integer | The number of local SSDs to attach to each node in the cluster. The default value is |
| String | Identifier for the availability zone/datacenter in which the cluster resides. |
| String | Availability type used for all nodes. Valid values are |
| Integer | The size of the boot disk in GB. Values typically range from 100 to 1000. |
cluster.cluster_log_conf
Type: Map
The configuration for delivering Spark logs to a long-term storage destination.
Key | Type | Description |
|---|---|---|
| Map | DBFS location for cluster log delivery. See dbfs. |
| Map | S3 location for cluster log delivery. See s3. |
| Map | Volumes location for cluster log delivery. See volumes. |
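As a sketch, the following configures log delivery to a Unity Catalog volume (the catalog, schema, and volume names are placeholders):

```yaml
resources:
  clusters:
    my_cluster:
      node_type_id: 'i3.xlarge'
      spark_version: '15.4.x-scala2.12'
      num_workers: 1
      cluster_log_conf:
        volumes:
          destination: '/Volumes/main/default/my_volume/cluster-logs'
```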
cluster.cluster_log_conf.dbfs
Type: Map
DBFS location for cluster log delivery.
Key | Type | Description |
|---|---|---|
| String | The DBFS path for cluster log delivery (for example, |
cluster.cluster_log_conf.s3
Type: Map
S3 location for cluster log delivery.
Key | Type | Description |
|---|---|---|
| String | The S3 URI for cluster log delivery (for example, |
| String | The AWS region of the S3 bucket. |
| String | The S3 endpoint URL (optional). |
| Boolean | Whether to enable encryption for cluster logs. |
| String | The encryption type. Valid values include |
| String | The KMS key ARN for encryption (when using |
| String | The canned ACL to apply to cluster logs. |
cluster.cluster_log_conf.volumes
Type: Map
Volumes location for cluster log delivery.
Key | Type | Description |
|---|---|---|
| String | The volume path for cluster log delivery (for example, |
cluster.docker_image
Type: Map
The custom Docker image configuration.
Key | Type | Description |
|---|---|---|
| String | URL of the Docker image. |
| Map | Basic authentication for Docker repository. See basic_auth. |
cluster.docker_image.basic_auth
Type: Map
Basic authentication for Docker repository.
Key | Type | Description |
|---|---|---|
| String | The username for Docker registry authentication. |
| String | The password for Docker registry authentication. |
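The following sketch shows a cluster that uses a custom Docker image with basic authentication. The registry URL is a placeholder, and the credentials are read from bundle variables rather than hardcoded:

```yaml
resources:
  clusters:
    custom_image_cluster:
      node_type_id: 'i3.xlarge'
      spark_version: '15.4.x-scala2.12'
      num_workers: 1
      docker_image:
        url: 'my-registry.example.com/my-image:latest'
        basic_auth:
          username: ${var.registry_username}
          password: ${var.registry_password}
```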
cluster.init_scripts
Type: Sequence
The configuration for storing init scripts. Each entry in the sequence specifies the location of one init script, and at least one location type must be specified per entry.
Key | Type | Description |
|---|---|---|
| Map | DBFS location of init script. See dbfs. |
| Map | Workspace location of init script. See workspace. |
| Map | S3 location of init script. See s3. |
| Map | ABFSS location of init script. See abfss. |
| Map | GCS location of init script. See gcs. |
| Map | UC Volumes location of init script. See volumes. |
cluster.init_scripts.dbfs
Type: Map
DBFS location of init script.
Key | Type | Description |
|---|---|---|
| String | The DBFS path of the init script. |
cluster.init_scripts.workspace
Type: Map
Workspace location of init script.
Key | Type | Description |
|---|---|---|
| String | The workspace path of the init script. |
cluster.init_scripts.s3
Type: Map
S3 location of init script.
Key | Type | Description |
|---|---|---|
| String | The S3 URI of the init script. |
| String | The AWS region of the S3 bucket. |
| String | The S3 endpoint URL (optional). |
cluster.init_scripts.abfss
Type: Map
ABFSS location of init script.
Key | Type | Description |
|---|---|---|
| String | The ABFSS path of the init script. |
cluster.init_scripts.gcs
Type: Map
GCS location of init script.
Key | Type | Description |
|---|---|---|
| String | The GCS path of the init script. |
cluster.init_scripts.volumes
Type: Map
Volumes location of init script.
Key | Type | Description |
|---|---|---|
| String | The UC Volumes path of the init script. |
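As a sketch, the following cluster runs two init scripts in order, one from the workspace and one from a Unity Catalog volume (all paths are placeholders):

```yaml
resources:
  clusters:
    cluster_with_init_scripts:
      node_type_id: 'i3.xlarge'
      spark_version: '15.4.x-scala2.12'
      num_workers: 1
      init_scripts:
        - workspace:
            destination: '/Users/someone@example.com/setup.sh'
        - volumes:
            destination: '/Volumes/main/default/my_volume/install-deps.sh'
```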
cluster.workload_type
Type: Map
Cluster attributes showing cluster workload types.
Key | Type | Description |
|---|---|---|
| Map | Defines what type of clients can use the cluster. See clients. |
cluster.workload_type.clients
Type: Map
The type of clients for this compute workload.
Key | Type | Description |
|---|---|---|
| Boolean | Whether the cluster can run jobs. |
| Boolean | Whether the cluster can run notebooks. |
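For example, the following sketch restricts a cluster to job workloads only (the node type and Spark version are placeholders):

```yaml
resources:
  clusters:
    jobs_only_cluster:
      node_type_id: 'i3.xlarge'
      spark_version: '15.4.x-scala2.12'
      num_workers: 1
      workload_type:
        clients:
          jobs: true
          notebooks: false
```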
Examples
The following example creates a dedicated (single-user) cluster for the current user with Databricks Runtime 15.4 LTS and a cluster policy:
resources:
clusters:
my_cluster:
num_workers: 0
node_type_id: 'i3.xlarge'
driver_node_type_id: 'i3.xlarge'
spark_version: '15.4.x-scala2.12'
spark_conf:
'spark.executor.memory': '2g'
autotermination_minutes: 60
enable_elastic_disk: true
single_user_name: ${workspace.current_user.userName}
policy_id: '000128DB309672CA'
enable_local_disk_encryption: false
data_security_mode: SINGLE_USER
runtime_engine: STANDARD
This example creates a simple cluster my_cluster and uses it to run the notebook in my_job:
bundle:
name: clusters
resources:
clusters:
my_cluster:
num_workers: 2
node_type_id: 'i3.xlarge'
autoscale:
min_workers: 2
max_workers: 7
spark_version: '13.3.x-scala2.12'
spark_conf:
'spark.executor.memory': '2g'
jobs:
my_job:
tasks:
- task_key: test_task
notebook_task:
notebook_path: './src/my_notebook.py'
existing_cluster_id: ${resources.clusters.my_cluster.id}
dashboard
Type: Map
The dashboard resource allows you to manage AI/BI dashboards in a bundle. For information about AI/BI dashboards, see Dashboards.
If you deployed a bundle that contains a dashboard from your local environment and then use the UI to modify that dashboard, modifications made through the UI are not applied to the dashboard JSON file in the local bundle unless you explicitly update it using bundle generate. You can use the --watch option to continuously poll and retrieve changes to the dashboard. See databricks bundle generate.
In addition, if you attempt to deploy a bundle from your local environment that contains a dashboard JSON file that is different than the one in the remote workspace, an error will occur. To force the deploy and overwrite the dashboard in the remote workspace with the local one, use the --force option. See databricks bundle deploy.
When using Databricks Asset Bundles with dashboard Git support, prevent duplicate dashboards from being generated by adding the sync mapping to exclude the dashboards from synchronizing as files:
sync:
exclude:
- src/*.lvdash.json
dashboards:
<dashboard-name>:
<dashboard-field-name>: <dashboard-field-value>
Key | Type | Description |
|---|---|---|
| String | The display name of the dashboard. |
| Boolean | Whether the bundle deployment identity credentials are used to execute queries for all dashboard viewers. If it is set to |
| String | The etag for the dashboard. Can be optionally provided on updates to ensure that the dashboard has not been modified since the last read. |
| String | The local path of the dashboard asset, including the file name. Exported dashboards always have the file extension |
| Sequence | The dashboard permissions. See permissions. |
| Any | The contents of the dashboard in serialized string form. |
| String | The warehouse ID used to run the dashboard. |
Example
The following example includes and deploys the sample NYC Taxi Trip Analysis dashboard to the Databricks workspace.
resources:
dashboards:
nyc_taxi_trip_analysis:
display_name: 'NYC Taxi Trip Analysis'
file_path: ../src/nyc_taxi_trip_analysis.lvdash.json
warehouse_id: ${var.warehouse_id}
database_catalog
Type: Map
The database catalog resource allows you to define database catalogs that correspond to database instances in a bundle. A database catalog is a Lakebase database that is registered as a Unity Catalog catalog.
For information about database catalogs, see Create a catalog.
database_catalogs:
<database_catalog-name>:
<database_catalog-field-name>: <database_catalog-field-value>
Key | Type | Description |
|---|---|---|
| Boolean | Whether to create the database if it does not exist. |
| String | The name of the instance housing the database. |
| String | The name of the database (in an instance) associated with the catalog. |
| Map | Contains the lifecycle settings for a resource, including the behavior of the resource when it is deployed or destroyed. See lifecycle. |
| String | The name of the catalog in Unity Catalog. |
Example
The following example defines a database instance with a corresponding database catalog:
resources:
database_instances:
my_instance:
name: my-instance
capacity: CU_1
database_catalogs:
my_catalog:
database_instance_name: ${resources.database_instances.my_instance.name}
name: example_catalog
database_name: my_database
create_database_if_not_exists: true
database_instance
Type: Map
The database instance resource allows you to define database instances in a bundle. A Lakebase database instance manages storage and compute resources and provides the endpoints that users connect to.
When you deploy a bundle with a database instance, the instance immediately starts running and is subject to pricing. See Lakebase pricing.
For information about database instances, see What is a database instance?.
database_instances:
<database_instance-name>:
<database_instance-field-name>: <database_instance-field-value>
Key | Type | Description |
|---|---|---|
| String | The sku of the instance. Valid values are |
| Sequence | A list of key-value pairs that specify custom tags associated with the instance. |
| Boolean | Whether the instance has PG native password login enabled. Defaults to |
| Boolean | Whether to enable secondaries to serve read-only traffic. Defaults to |
| Map | Contains the lifecycle settings for a resource. It controls the behavior of the resource when it is deployed or destroyed. See lifecycle. |
| String | The name of the instance. This is the unique identifier for the instance. |
| Integer | The number of nodes in the instance, composed of 1 primary and 0 or more secondaries. Defaults to 1 primary and 0 secondaries. |
| Map | The reference to the parent instance. This is only available if the instance is a child instance. See parent_instance_ref. |
| Sequence | The database instance's permissions. See permissions. |
| Integer | The retention window for the instance. This is the time window in days for which the historical data is retained. The default value is 7 days. Valid values are 2 to 35 days. |
| Boolean | Whether the instance is stopped. |
| String | The desired usage policy to associate with the instance. |
database_instance.parent_instance_ref
Type: Map
The reference to the parent instance. This is only available if the instance is a child instance.
Key | Type | Description |
|---|---|---|
| String | Branch time of the ref database instance. For a parent ref instance, this is the point in time on the parent instance from which the instance was created. For a child ref instance, this is the point in time on the instance from which the child instance was created. |
| String | User-specified WAL LSN of the ref database instance. |
| String | Name of the ref database instance. |
Example
The following example defines a database instance with a corresponding database catalog:
resources:
database_instances:
my_instance:
name: my-instance
capacity: CU_1
database_catalogs:
my_catalog:
database_instance_name: ${resources.database_instances.my_instance.name}
name: example_catalog
database_name: my_database
create_database_if_not_exists: true
For an example bundle that demonstrates how to define a database instance and corresponding database catalog, see the bundle-examples GitHub repository.
experiment
Type: Map
The experiment resource allows you to define MLflow experiments in a bundle. For information about MLflow experiments, see Organize training runs with MLflow experiments.
experiments:
<experiment-name>:
<experiment-field-name>: <experiment-field-value>
Key | Type | Description |
|---|---|---|
| String | The location where artifacts for the experiment are stored. |
| Map | Contains the lifecycle settings for a resource. It controls the behavior of the resource when it is deployed or destroyed. See lifecycle. |
| String | The friendly name that identifies the experiment. An experiment name must be an absolute path in the Databricks workspace, for example |
| Sequence | The experiment's permissions. See permissions. |
| Sequence | Additional metadata key-value pairs. See tags. |
Example
The following example defines an experiment that all users can view:
resources:
experiments:
experiment:
name: /Workspace/Users/someone@example.com/my_experiment
permissions:
- level: CAN_READ
group_name: users
description: MLflow experiment used to track runs
job
Type: Map
Jobs are supported in Python for Databricks Asset Bundles. See databricks.bundles.jobs.
The job resource allows you to define jobs and their corresponding tasks in your bundle.
For information about jobs, see Lakeflow Jobs. For a tutorial that uses a Databricks Asset Bundles template to create a job, see Develop a job with Databricks Asset Bundles.
jobs:
<job-name>:
<job-field-name>: <job-field-value>
Key | Type | Description |
|---|---|---|
| String | The id of the user-specified budget policy to use for this job. If not specified, a default budget policy may be applied when creating or modifying the job. See |
| Map | An optional continuous property for this job. The continuous property will ensure that there is always one run executing. Only one of |
| Map | Deployment information for jobs managed by external sources. See deployment. |
| String | An optional description for the job. The maximum length is 27700 characters in UTF-8 encoding. |
| String | Edit mode of the job, either |
| Map | An optional set of email addresses that is notified when runs of this job begin or complete as well as when this job is deleted. See email_notifications. |
| Sequence | A list of task execution environment specifications that can be referenced by serverless tasks of this job. An environment is required to be present for serverless tasks. For serverless notebook tasks, the environment is accessible in the notebook environment panel. For other serverless tasks, the task environment is required to be specified using environment_key in the task settings. |
| String | The format of the job. |
| Map | An optional specification for a remote Git repository containing the source code used by tasks. Important: Using a remote Git source with bundles is not recommended. Instead, clone the repository locally and set up your bundle project within this repository, so that the source for tasks is the workspace. |
| Map | An optional set of health rules that can be defined for this job. See health. |
| Sequence | A list of job cluster specifications that can be shared and reused by tasks of this job. See clusters. |
| Integer | An optional maximum allowed number of concurrent runs of the job. Set this value if you want to be able to execute multiple runs of the same job concurrently. |
| String | An optional name for the job. The maximum length is 4096 bytes in UTF-8 encoding. |
| Map | Optional notification settings that are used when sending notifications to each of the |
| Sequence | Job-level parameter definitions. |
| String | Defines how performant or cost efficient the execution of the run on serverless should be. |
| Sequence | The job's permissions. See permissions. |
| Map | The queue settings of the job. See queue. |
| Map | Write-only setting. Specifies the user or service principal that the job runs as. If not specified, the job runs as the user who created the job. Either |
| Map | An optional periodic schedule for this job. The default behavior is that the job only runs when triggered by clicking “Run Now” in the Jobs UI or sending an API request to |
| Map | A map of tags associated with the job. These are forwarded to the cluster as cluster tags for jobs clusters, and are subject to the same limitations as cluster tags. A maximum of 25 tags can be added to the job. |
| Sequence | A list of task specifications to be executed by this job. See Add tasks to jobs in Databricks Asset Bundles. |
| Integer | An optional timeout applied to each run of this job. A value of |
| Map | A configuration to trigger a run when certain conditions are met. See trigger. |
| Map | A collection of system notification IDs to notify when runs of this job begin or complete. See webhook_notifications. |
job.continuous
Type: Map
Configuration for continuous job execution.
Key | Type | Description |
|---|---|---|
| String | Whether the continuous job is paused or not. Valid values: |
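As a sketch, the following defines a job that runs continuously (the job name and notebook path are placeholders):

```yaml
resources:
  jobs:
    continuous_job:
      name: continuous_job
      continuous:
        pause_status: UNPAUSED
      tasks:
        - task_key: task
          notebook_task:
            notebook_path: ./my_notebook.py
```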
job.deployment
Type: Map
Deployment information for jobs managed by external sources.
Key | Type | Description |
|---|---|---|
| String | The kind of deployment. For example, |
| String | The path to the metadata file for the deployment. |
job.email_notifications
Type: Map
Email notification settings for job runs.
Key | Type | Description |
|---|---|---|
| Sequence | A list of email addresses to notify when a run starts. |
| Sequence | A list of email addresses to notify when a run succeeds. |
| Sequence | A list of email addresses to notify when a run fails. |
| Sequence | A list of email addresses to notify when a run duration exceeds the warning threshold. |
| Boolean | Whether to skip sending alerts for skipped runs. |
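For example, the following sketch notifies an address when runs start or fail, while suppressing alerts for skipped runs (the job name, email address, and notebook path are placeholders):

```yaml
resources:
  jobs:
    notified_job:
      name: notified_job
      email_notifications:
        on_start:
          - someone@example.com
        on_failure:
          - someone@example.com
        no_alert_for_skipped_runs: true
      tasks:
        - task_key: task
          notebook_task:
            notebook_path: ./my_notebook.py
```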
job.git_source
Type: Map
Git repository configuration for job source code.
Key | Type | Description |
|---|---|---|
| String | The URL of the Git repository. |
| String | The Git provider. Valid values: |
| String | The name of the Git branch to use. |
| String | The name of the Git tag to use. |
| String | The Git commit hash to use. |
| Map | Used commit information. This is a read-only field. See git_snapshot. |
job.git_source.git_snapshot
Type: Map
Read-only commit information snapshot.
Key | Type | Description |
|---|---|---|
| String | The commit hash that was used. |
job.health
Type: Map
Health monitoring configuration for the job.
Key | Type | Description |
|---|---|---|
| Sequence | A list of job health rules. Each rule contains a |
JobsHealthRule
Type: Map
Key | Type | Description |
|---|---|---|
| String | Specifies the health metric that is being evaluated for a particular health rule. |
| String | Specifies the operator used to compare the health metric value with the specified threshold. |
| Integer | Specifies the threshold value that the health metric should obey to satisfy the health rule. |
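As a sketch, the following rule flags runs that exceed one hour (the job name and notebook path are placeholders):

```yaml
resources:
  jobs:
    monitored_job:
      name: monitored_job
      health:
        rules:
          - metric: RUN_DURATION_SECONDS
            op: GREATER_THAN
            value: 3600
      tasks:
        - task_key: task
          notebook_task:
            notebook_path: ./my_notebook.py
```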
job.notification_settings
Type: Map
Notification settings that apply to all notifications for the job.
Key | Type | Description |
|---|---|---|
| Boolean | Whether to skip sending alerts for skipped runs. |
| Boolean | Whether to skip sending alerts for canceled runs. |
job.queue
Type: Map
Queue settings for the job.
Key | Type | Description |
|---|---|---|
| Boolean | Whether to enable queueing for the job. |
job.schedule
Type: Map
Schedule configuration for periodic job execution.
Key | Type | Description |
|---|---|---|
| String | A Cron expression using Quartz syntax that specifies when the job runs. For example, |
| String | The timezone for the schedule. For example, |
| String | Whether the schedule is paused or not. Valid values: |
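For example, the following sketch runs a job every day at 06:00 UTC (the job name and notebook path are placeholders):

```yaml
resources:
  jobs:
    scheduled_job:
      name: scheduled_job
      schedule:
        quartz_cron_expression: '0 0 6 * * ?'
        timezone_id: UTC
        pause_status: UNPAUSED
      tasks:
        - task_key: task
          notebook_task:
            notebook_path: ./my_notebook.py
```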
job.trigger
Type: Map
Trigger configuration for event-driven job execution.
Key | Type | Description |
|---|---|---|
| Map | Trigger based on file arrival. See file_arrival. |
| Map | Trigger based on a table. See table. |
| Map | Trigger based on table updates. See table_update. |
| Map | Periodic trigger. See periodic. |
job.trigger.file_arrival
Type: Map
Trigger configuration based on file arrival.
Key | Type | Description |
|---|---|---|
| String | The file path to monitor for new files. |
| Integer | Minimum time in seconds between trigger events. |
| Integer | Wait time in seconds after the last file change before triggering. |
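For example, the following sketch triggers a job when new files land in a Unity Catalog volume path (the path and names are placeholders; `url` is the Jobs API field for the monitored location):

```yaml
resources:
  jobs:
    file-triggered-job:
      name: file-triggered-job
      trigger:
        file_arrival:
          url: /Volumes/main/default/landing/
          min_time_between_triggers_seconds: 60
          wait_after_last_change_seconds: 30
      tasks:
        - task_key: ingest
          notebook_task:
            notebook_path: ./ingest.py
```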
job.trigger.table
Type: Map
Trigger configuration based on a table.
Key | Type | Description |
|---|---|---|
| Sequence | A list of table names to monitor. |
| String | The SQL condition that must be met to trigger the job. |
job.trigger.table_update
Type: Map
Trigger configuration based on table updates.
Key | Type | Description |
|---|---|---|
| Sequence | A list of table names to monitor for updates. |
| String | The SQL condition that must be met to trigger the job. |
| Integer | Wait time in seconds after the last table update before triggering. |
job.trigger.periodic
Type: Map
Periodic trigger configuration.
Key | Type | Description |
|---|---|---|
| Integer | The interval value for the periodic trigger. |
| String | The unit of time for the interval. Valid values: |
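For example, a periodic trigger that runs a job every four hours might be sketched as follows (the job name and notebook path are placeholders):

```yaml
resources:
  jobs:
    periodic-job:
      name: periodic-job
      trigger:
        periodic:
          interval: 4
          unit: HOURS
      tasks:
        - task_key: main
          notebook_task:
            notebook_path: ./main.py
```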
job.webhook_notifications
Type: Map
Webhook notification settings for job runs.
Key | Type | Description |
|---|---|---|
| Sequence | A list of webhook notification IDs to notify when a run starts. |
| Sequence | A list of webhook notification IDs to notify when a run succeeds. |
| Sequence | A list of webhook notification IDs to notify when a run fails. |
| Sequence | A list of webhook notification IDs to notify when a run duration exceeds the warning threshold. |
Examples
The following example defines a job with the resource key hello-job with one notebook task:
resources:
jobs:
hello-job:
name: hello-job
tasks:
- task_key: hello-task
notebook_task:
notebook_path: ./hello.py
The following example defines a job with a SQL notebook:
resources:
jobs:
job_with_sql_notebook:
name: 'Job to demonstrate using a SQL notebook with a SQL warehouse'
tasks:
- task_key: notebook
notebook_task:
notebook_path: ./select.sql
warehouse_id: 799f096837fzzzz4
For additional job configuration examples, see Job configuration.
For information about defining job tasks and overriding job settings, see:
model (legacy)
Type: Map
The model resource allows you to define legacy models in bundles. Databricks recommends you use Unity Catalog registered models instead.
model_serving_endpoint
Type: Map
The model_serving_endpoint resource allows you to define model serving endpoints. See Manage model serving endpoints.
model_serving_endpoints:
<model_serving_endpoint-name>:
<model_serving_endpoint-field-name>: <model_serving_endpoint-field-value>
Key | Type | Description |
|---|---|---|
| Map | The AI Gateway configuration for the serving endpoint. NOTE: Only external model and provisioned throughput endpoints are currently supported. See ai_gateway. |
| Map | The core config of the serving endpoint. See config. |
| String | The name of the serving endpoint. This field is required and must be unique across a Databricks workspace. An endpoint name can consist of alphanumeric characters, dashes, and underscores. |
| Sequence | The model serving endpoint's permissions. See permissions. |
| Sequence | Deprecated. Rate limits to be applied to the serving endpoint. Use AI Gateway to manage rate limits. |
| Boolean | Enable route optimization for the serving endpoint. |
| Sequence | Tags to be attached to the serving endpoint and automatically propagated to billing logs. |
model_serving_endpoint.ai_gateway
Type: Map
AI Gateway configuration for the serving endpoint.
Key | Type | Description |
|---|---|---|
| Map | Guardrail configuration. See guardrails. |
| Map | Configuration for inference logging to Unity Catalog tables. See inference_table_config. |
| Sequence | Rate limit configurations. |
| Map | Configuration for tracking usage. See usage_tracking_config. |
model_serving_endpoint.ai_gateway.guardrails
Type: Map
The AI gateway guardrails configuration.
Key | Type | Description |
|---|---|---|
| Map | Input guardrails configuration with fields like |
| Map | Output guardrails configuration with fields like |
| Sequence | A list of keywords to block. |
model_serving_endpoint.ai_gateway.inference_table_config
Type: Map
Configuration for inference logging to Unity Catalog tables.
Key | Type | Description |
|---|---|---|
| String | The name of the catalog in Unity Catalog. |
| String | The name of the schema in Unity Catalog. |
| String | The prefix for inference table names. |
| Boolean | Whether inference table logging is enabled. |
model_serving_endpoint.ai_gateway.usage_tracking_config
Type: Map
The AI gateway configuration for tracking usage.
Key | Type | Description |
|---|---|---|
| Boolean | Whether usage tracking is enabled. |
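Putting the AI Gateway keys together, the following sketch enables usage tracking and inference logging for an endpoint (the endpoint, catalog, and schema names are placeholders; field names follow the AI Gateway configuration described above):

```yaml
resources:
  model_serving_endpoints:
    my_endpoint:
      name: my-endpoint
      ai_gateway:
        usage_tracking_config:
          enabled: true
        inference_table_config:
          enabled: true
          catalog_name: main
          schema_name: endpoint_logs
```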
model_serving_endpoint.config
Type: Map
The core configuration of the serving endpoint.
Key | Type | Description |
|---|---|---|
| Sequence | A list of served entities for the endpoint to serve. Each served entity contains fields like |
| Sequence | (Deprecated: use |
| Map | The traffic config defining how invocations to the serving endpoint should be routed. See traffic_config. |
| Map | Configuration for Inference Tables which automatically logs requests and responses to Unity Catalog. See auto_capture_config. |
model_serving_endpoint.config.traffic_config
Type: Map
The traffic config defining how invocations to the serving endpoint should be routed.
Key | Type | Description |
|---|---|---|
| Sequence | A list of routes for traffic distribution. Each route contains |
model_serving_endpoint.config.auto_capture_config
Type: Map
Configuration for Inference Tables which automatically logs requests and responses to Unity Catalog.
Key | Type | Description |
|---|---|---|
| String | The name of the catalog in Unity Catalog. |
| String | The name of the schema in Unity Catalog. |
| String | The prefix for inference table names. |
| Boolean | Whether inference table logging is enabled. |
Example
The following example defines a Unity Catalog model serving endpoint:
resources:
model_serving_endpoints:
uc_model_serving_endpoint:
name: 'uc-model-endpoint'
config:
served_entities:
- entity_name: 'myCatalog.mySchema.my-ads-model'
entity_version: '10'
workload_size: 'Small'
scale_to_zero_enabled: 'true'
traffic_config:
routes:
- served_model_name: 'my-ads-model-10'
traffic_percentage: '100'
tags:
- key: 'team'
value: 'data science'
pipeline
Type: Map
Pipelines are supported in Python for Databricks Asset Bundles. See databricks.bundles.pipelines.
The pipeline resource allows you to create Lakeflow Spark Declarative Pipelines. For information about pipelines, see Lakeflow Spark Declarative Pipelines. For a tutorial that uses the Databricks Asset Bundles template to create a pipeline, see Develop Lakeflow Spark Declarative Pipelines with Databricks Asset Bundles.
pipelines:
<pipeline-name>:
<pipeline-field-name>: <pipeline-field-value>
Key | Type | Description |
|---|---|---|
| Boolean | If false, deployment will fail if the name conflicts with that of another pipeline. |
| String | Budget policy of this pipeline. |
| String | A catalog in Unity Catalog to publish data from this pipeline to. If |
| String | The Lakeflow Spark Declarative Pipelines Release Channel that specifies which version of Lakeflow Spark Declarative Pipelines to use. |
| Sequence | The cluster settings for this pipeline deployment. See cluster. |
| Map | The configuration for this pipeline execution. |
| Boolean | Whether the pipeline is continuous or triggered. This replaces |
| Map | Deployment type of this pipeline. See deployment. |
| Boolean | Whether the pipeline is in development mode. Defaults to false. |
| Boolean | Whether the pipeline is a dry run pipeline. |
| String | The pipeline product edition. |
| Map | The environment specification for this pipeline used to install dependencies on serverless compute. This key is only supported in Databricks CLI version 0.258 and above. |
| Map | The event log configuration for this pipeline. See event_log. |
| Map | The filters that determine which pipeline packages to include in the deployed graph. See filters. |
| String | Unique identifier for this pipeline. |
| Map | The configuration for a managed ingestion pipeline. These settings cannot be used with the |
| Sequence | A list of libraries or code needed by this deployment. See PipelineLibrary. |
| Map | Contains the lifecycle settings for a resource. It controls the behavior of the resource when it is deployed or destroyed. See lifecycle. |
| String | A friendly name for this pipeline. |
| Sequence | The notification settings for this pipeline. |
| Sequence | The pipeline's permissions. See permissions. |
| Boolean | Whether Photon is enabled for this pipeline. |
| String | The root path for this pipeline. This is used as the root directory when editing the pipeline in the Databricks user interface and it is added to sys.path when executing Python sources during pipeline execution. |
| Map | The identity that the pipeline runs as. If not specified, the pipeline runs as the user who created the pipeline. Only |
| String | The default schema (database) where tables are read from or published to. |
| Boolean | Whether serverless compute is enabled for this pipeline. |
| String | The DBFS root directory for storing checkpoints and tables. |
| Map | A map of tags associated with the pipeline. These are forwarded to the cluster as cluster tags, and are therefore subject to the same limitations. A maximum of 25 tags can be added to the pipeline. |
| String | Target schema (database) to add tables in this pipeline to. Exactly one of |
pipeline.deployment
Type: Map
Deployment type configuration for the pipeline.
Key | Type | Description |
|---|---|---|
| String | The kind of deployment. For example, |
| String | The path to the metadata file for the deployment. |
pipeline.environment
Type: Map
Environment specification for installing dependencies on serverless compute.
Key | Type | Description |
|---|---|---|
| Map | The specification for the environment. See spec. |
pipeline.environment.spec
Type: Map
The specification for the environment.
Key | Type | Description |
|---|---|---|
| String | The client version (for example, |
| Sequence | A list of dependencies to install (for example, |
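For example, a serverless pipeline that installs a PyPI dependency through the environment spec might be sketched as follows (the pipeline name, package, and notebook path are placeholders):

```yaml
resources:
  pipelines:
    my_serverless_pipeline:
      name: my-serverless-pipeline
      serverless: true
      catalog: main
      schema: default
      environment:
        spec:
          dependencies:
            - simplejson==3.19.1
      libraries:
        - notebook:
            path: ./pipeline.py
```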
pipeline.event_log
Type: Map
Event log configuration for the pipeline.
Key | Type | Description |
|---|---|---|
| Boolean | Whether event logging is enabled. |
| String | The storage location for event logs. |
pipeline.filters
Type: Map
Filters that determine which pipeline packages to include in the deployed graph.
Key | Type | Description |
|---|---|---|
| Sequence | A list of package names to include. |
| Sequence | A list of package names to exclude. |
pipeline.ingestion_definition
Type: Map
Configuration for a managed ingestion pipeline.
Key | Type | Description |
|---|---|---|
| String | The name of the connection to use for ingestion. |
| String | The ID of the ingestion gateway. |
| Sequence | A list of objects to ingest. Each object can be a SchemaSpec, TableSpec, or ReportSpec. See SchemaSpec, TableSpec, and ReportSpec. |
| Map | Configuration for the ingestion tables. See table_configuration. |
SchemaSpec
Type: Map
Schema object specification for ingesting all tables from a schema.
Key | Type | Description |
|---|---|---|
| String | The name of the source schema to ingest. |
| String | The name of the destination catalog in Unity Catalog. |
| String | The name of the destination schema in Unity Catalog. |
| Map | Configuration to apply to all tables in this schema. See pipeline.ingestion_definition.table_configuration. |
TableSpec
Type: Map
Table object specification for ingesting a specific table.
Key | Type | Description |
|---|---|---|
| String | The name of the source schema containing the table. |
| String | The name of the source table to ingest. |
| String | The name of the destination catalog in Unity Catalog. |
| String | The name of the destination schema in Unity Catalog. |
| String | The name of the destination table in Unity Catalog. |
| Map | Configuration for this specific table. See pipeline.ingestion_definition.table_configuration. |
ReportSpec
Type: Map
Report object specification for ingesting analytics reports.
Key | Type | Description |
|---|---|---|
| String | The URL of the source report. |
| String | The name or identifier of the source report. |
| String | The name of the destination catalog in Unity Catalog. |
| String | The name of the destination schema in Unity Catalog. |
| String | The name of the destination table for the report data. |
| Map | Configuration for the report table. See pipeline.ingestion_definition.table_configuration. |
pipeline.ingestion_definition.table_configuration
Type: Map
Configuration options for ingestion tables.
Key | Type | Description |
|---|---|---|
| Sequence | A list of column names to use as primary keys for the table. |
| Boolean | Whether to include Salesforce formula fields in the ingestion. |
| String | The type of slowly changing dimension (SCD) to apply. Valid values: |
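As a hedged sketch, a managed ingestion pipeline that ingests a single table over an existing connection could be configured as follows (the connection, catalog, schema, and table names are placeholders; the object and field names follow the ingestion definition described above):

```yaml
resources:
  pipelines:
    my_ingestion_pipeline:
      name: my-ingestion-pipeline
      ingestion_definition:
        connection_name: my_connection
        objects:
          - table:
              source_schema: sales
              source_table: accounts
              destination_catalog: main
              destination_schema: ingested
        table_configuration:
          primary_keys:
            - id
          scd_type: SCD_TYPE_1
```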
PipelineLibrary
Type: Map
Defines a library or code needed by this pipeline.
Key | Type | Description |
|---|---|---|
| Map | The path to a file that defines a pipeline and is stored in Databricks Repos. See pipeline.libraries.file. |
| Map | The unified field to include source code. Each entry can be a notebook path, a file path, or a folder path that ends with /**. See pipeline.libraries.glob. |
| Map | The path to a notebook that defines a pipeline and is stored in the Databricks workspace. See pipeline.libraries.notebook. |
| String | This field is deprecated. |
pipeline.libraries.file
Type: Map
The path to a file that defines a pipeline and is stored in the Databricks Repos.
Key | Type | Description |
|---|---|---|
| String | The absolute path of the source code. |
pipeline.libraries.glob
Type: Map
The unified field to include source code. Each entry can be a notebook path, a file path, or a folder path that ends with /**. This field cannot be used together with notebook or file.
Key | Type | Description |
|---|---|---|
| String | The source code to include for pipelines. |
pipeline.libraries.notebook
Type: Map
The path to a notebook that defines a pipeline and is stored in the Databricks workspace.
Key | Type | Description |
|---|---|---|
| String | The absolute path of the source code. |
Example
The following example defines a pipeline with the resource key hello-pipeline:
resources:
pipelines:
hello-pipeline:
name: hello-pipeline
clusters:
- label: default
num_workers: 1
development: true
continuous: false
channel: CURRENT
edition: CORE
photon: false
libraries:
- notebook:
path: ./pipeline.py
For additional pipeline configuration examples, see Pipeline configuration.
quality_monitor (Unity Catalog)
Type: Map
The quality_monitor resource allows you to define a Unity Catalog table monitor. For information about monitors, see Data profiling.
quality_monitors:
<quality_monitor-name>:
<quality_monitor-field-name>: <quality_monitor-field-value>
Key | Type | Description |
|---|---|---|
| String | The directory to store monitoring assets (e.g. dashboard, metric tables). |
| String | Name of the baseline table from which drift metrics are computed. Columns in the monitored table should also be present in the baseline table. |
| Sequence | Custom metrics to compute on the monitored table. These can be aggregate metrics, derived metrics (from already computed aggregate metrics), or drift metrics (comparing metrics across time windows). See custom_metrics. |
| Map | Configuration for monitoring inference logs. See inference_log. |
| Map | Contains the lifecycle settings for a resource. It controls the behavior of the resource when it is deployed or destroyed. See lifecycle. |
| Map | The notification settings for the monitor. See notifications. |
| String | Schema where output metric tables are created. |
| Map | The schedule for automatically updating and refreshing metric tables. See schedule. |
| Boolean | Whether to skip creating a default dashboard summarizing data quality metrics. |
| Sequence | List of column expressions to slice data with for targeted analysis. The data is grouped by each expression independently, resulting in a separate slice for each predicate and its complements. For high-cardinality columns, only the top 100 unique values by frequency will generate slices. |
| Map | Configuration for monitoring snapshot tables. See snapshot. |
| String | The full name of the table. |
| Map | Configuration for monitoring time series tables. See time_series. |
| String | Optional argument to specify the warehouse for dashboard creation. If not specified, the first running warehouse will be used. |
quality_monitor.custom_metrics
Type: Sequence
Key | Type | Description |
|---|---|---|
| String | Jinja template for a SQL expression that specifies how to compute the metric. See create metric definition. |
| Sequence | A list of column names in the input table the metric should be computed for. Can use |
| String | Name of the metric in the output tables. |
| String | The output type of the custom metric. |
| String | The type of the custom metric. Can only be one of the supported custom metric types. |
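For example, an aggregate custom metric over a price column might be sketched as follows (the table, schema, and metric names are illustrative; the `{{input_column}}` Jinja placeholder is substituted per input column):

```yaml
resources:
  quality_monitors:
    my_monitor:
      table_name: main.default.my_table
      output_schema_name: main.monitoring
      assets_dir: /Workspace/Users/${workspace.current_user.userName}/monitoring
      snapshot: {}
      custom_metrics:
        - name: avg_price_squared
          definition: avg(`{{input_column}}` * `{{input_column}}`)
          input_columns:
            - price
          output_data_type: double
          type: CUSTOM_METRIC_TYPE_AGGREGATE
```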
quality_monitor.data_classification_config
Type: Map
Configuration for data classification.
Key | Type | Description |
|---|---|---|
| Boolean | Whether data classification is enabled. |
quality_monitor.inference_log
Type: Map
Configuration for monitoring inference logs.
Key | Type | Description |
|---|---|---|
| Sequence | The time granularities for aggregating inference logs (for example, |
| String | The name of the column containing the model ID. |
| String | The name of the column containing the prediction. |
| String | The name of the column containing the timestamp. |
| String | The type of ML problem. Valid values include |
| String | The name of the column containing the label (ground truth). |
quality_monitor.notifications
Type: Map
Notification settings for the monitor.
Key | Type | Description |
|---|---|---|
| Map | Notification settings when the monitor fails. See on_failure. |
| Map | Notification settings when new classification tags are detected. See on_new_classification_tag_detected. |
quality_monitor.notifications.on_failure
Type: Map
Notification settings when the monitor fails.
Key | Type | Description |
|---|---|---|
| Sequence | A list of email addresses to notify on monitor failure. |
quality_monitor.notifications.on_new_classification_tag_detected
Type: Map
Notification settings when new classification tags are detected.
Key | Type | Description |
|---|---|---|
| Sequence | A list of email addresses to notify when new classification tags are detected. |
quality_monitor.schedule
Type: Map
Schedule for automatically updating and refreshing metric tables.
Key | Type | Description |
|---|---|---|
| String | A Cron expression using Quartz syntax. For example, |
| String | The timezone for the schedule (for example, |
| String | Whether the schedule is paused. Valid values: |
quality_monitor.snapshot
Type: Map
Configuration for monitoring snapshot tables.
quality_monitor.time_series
Configuration for monitoring time series tables.
Key | Type | Description |
|---|---|---|
| Sequence | The time granularities for aggregating time series data (for example, |
| String | The name of the column containing the timestamp. |
Examples
For a complete example bundle that defines a quality_monitor, see the mlops_demo bundle.
The following examples define quality monitors for InferenceLog, TimeSeries, and Snapshot profile types.
# InferenceLog profile type
resources:
quality_monitors:
my_quality_monitor:
table_name: dev.mlops_schema.predictions
output_schema_name: ${bundle.target}.mlops_schema
assets_dir: /Workspace/Users/${workspace.current_user.userName}/databricks_lakehouse_monitoring
inference_log:
granularities: [1 day]
model_id_col: model_id
prediction_col: prediction
label_col: price
problem_type: PROBLEM_TYPE_REGRESSION
timestamp_col: timestamp
schedule:
quartz_cron_expression: 0 0 8 * * ? # Run Every day at 8am
timezone_id: UTC
# TimeSeries profile type
resources:
quality_monitors:
my_quality_monitor:
table_name: dev.mlops_schema.predictions
output_schema_name: ${bundle.target}.mlops_schema
assets_dir: /Workspace/Users/${workspace.current_user.userName}/databricks_lakehouse_monitoring
time_series:
granularities: [30 minutes]
timestamp_col: timestamp
schedule:
quartz_cron_expression: 0 0 8 * * ? # Run Every day at 8am
timezone_id: UTC
# Snapshot profile type
resources:
quality_monitors:
my_quality_monitor:
table_name: dev.mlops_schema.predictions
output_schema_name: ${bundle.target}.mlops_schema
assets_dir: /Workspace/Users/${workspace.current_user.userName}/databricks_lakehouse_monitoring
snapshot: {}
schedule:
quartz_cron_expression: 0 0 8 * * ? # Run Every day at 8am
timezone_id: UTC
registered_model (Unity Catalog)
Type: Map
The registered model resource allows you to define models in Unity Catalog. For information about Unity Catalog registered models, see Manage model lifecycle in Unity Catalog.
registered_models:
<registered_model-name>:
<registered_model-field-name>: <registered_model-field-value>
Key | Type | Description |
|---|---|---|
| Sequence | List of aliases associated with the registered model. See registered_model.aliases. |
| Boolean | Indicates whether the principal is limited to retrieving metadata for the associated object through the BROWSE privilege when include_browse is enabled in the request. |
| String | The name of the catalog where the schema and the registered model reside. |
| String | The comment attached to the registered model. |
| String | The three-level (fully qualified) name of the registered model. |
| Sequence | The grants associated with the registered model. See grant. |
| Map | Contains the lifecycle settings for a resource. It controls the behavior of the resource when it is deployed or destroyed. See lifecycle. |
| String | The name of the registered model. |
| String | The name of the schema where the registered model resides. |
| String | The storage location on the cloud under which model version data files are stored. |
registered_model.aliases
Type: Sequence
List of aliases associated with the registered model.
Key | Type | Description |
|---|---|---|
| String | Name of the alias, e.g. 'champion' or 'latest_stable'. |
| String | The name of the catalog containing the model version. |
| String | The unique identifier of the alias. |
| String | The name of the parent registered model of the model version, relative to the parent schema. |
| String | The name of the schema containing the model version, relative to the parent catalog. |
| Integer | Integer version number of the model version to which this alias points. |
Example
The following example defines a registered model in Unity Catalog:
resources:
registered_models:
model:
name: my_model
catalog_name: ${bundle.target}
schema_name: mlops_schema
comment: Registered model in Unity Catalog for ${bundle.target} deployment target
grants:
- privileges:
- EXECUTE
principal: account users
schema (Unity Catalog)
Type: Map
Schemas are supported in Python for Databricks Asset Bundles. See databricks.bundles.schemas.
The schema resource type allows you to define Unity Catalog schemas for tables and other assets in your workflows and pipelines created as part of a bundle. Unlike other resource types, a schema has the following limitations:
- The owner of a schema resource is always the deployment user, and cannot be changed. If run_as is specified in the bundle, it will be ignored by operations on the schema.
- Only fields supported by the corresponding Schemas object create API are available for the schema resource. For example, enable_predictive_optimization is not supported as it is only available on the update API.
schemas:
<schema-name>:
<schema-field-name>: <schema-field-value>
Key | Type | Description |
|---|---|---|
| String | The name of the parent catalog. |
| String | A user-provided free-form text description. |
| Sequence | The grants associated with the schema. See grant. |
| Map | Contains the lifecycle settings for a resource. It controls the behavior of the resource when it is deployed or destroyed. See lifecycle. |
| String | The name of the schema, relative to the parent catalog. |
| Map | A map of key-value properties attached to the schema. |
| String | The storage root URL for managed tables within the schema. |
Examples
The following example defines a pipeline with the resource key my_pipeline that creates a Unity Catalog schema with the key my_schema as the target:
resources:
pipelines:
my_pipeline:
name: test-pipeline-{{.unique_id}}
libraries:
- notebook:
path: ../src/nb.ipynb
- file:
path: ../src/range.sql
development: true
catalog: ${resources.schemas.my_schema.catalog_name}
target: ${resources.schemas.my_schema.id}
schemas:
my_schema:
name: test-schema-{{.unique_id}}
catalog_name: main
comment: This schema was created by Databricks Asset Bundles.
A top-level grants mapping is not supported by Databricks Asset Bundles, so if you want to set grants for a schema, define the grants for the schema within the schemas mapping. For more information about grants, see Show, grant, and revoke privileges.
The following example defines a Unity Catalog schema with grants:
resources:
schemas:
my_schema:
name: test-schema
grants:
- principal: users
privileges:
- SELECT
- principal: my_team
privileges:
- CAN_MANAGE
catalog_name: main
secret_scope
Type: Map
The secret_scope resource allows you to define secret scopes in a bundle. For information about secret scopes, see Secret management.
secret_scopes:
<secret_scope-name>:
<secret_scope-field-name>: <secret_scope-field-value>
Key | Type | Description |
|---|---|---|
| String | The backend type the scope will be created with. If not specified, this defaults to |
| Map | The metadata for the secret scope if the |
| Map | Contains the lifecycle settings for a resource. It controls the behavior of the resource when it is deployed or destroyed. See lifecycle. |
| String | Scope name requested by the user. Scope names are unique. |
| Sequence | The permissions to apply to the secret scope. Permissions are managed via secret scope ACLs. See permissions. |
secret_scope.keyvault_metadata
Type: Map
The metadata for Azure Key Vault-backed secret scopes.
Key | Type | Description |
|---|---|---|
| String | The Azure resource ID of the Key Vault. |
| String | The DNS name of the Azure Key Vault. |
Examples
The following example defines a secret scope that uses a key vault backend:
resources:
secret_scopes:
secret_scope_azure:
name: test-secrets-azure-backend
backend_type: 'AZURE_KEYVAULT'
keyvault_metadata:
resource_id: my_azure_keyvault_id
dns_name: my_azure_keyvault_dns_name
The following example sets a custom ACL using secret scopes and permissions:
resources:
secret_scopes:
my_secret_scope:
name: my_secret_scope
permissions:
- user_name: admins
level: WRITE
- user_name: users
level: READ
For an example bundle that demonstrates how to define a secret scope and a job with a task that reads from it in a bundle, see the bundle-examples GitHub repository.
sql_warehouse
Type: Map
The SQL warehouse resource allows you to define a SQL warehouse in a bundle. For information about SQL warehouses, see Data warehousing on Databricks.
sql_warehouses:
<sql-warehouse-name>:
<sql-warehouse-field-name>: <sql-warehouse-field-value>
Key | Type | Description |
|---|---|---|
| Integer | The amount of time in minutes that a SQL warehouse must be idle (for example, no RUNNING queries), before it is automatically stopped. Valid values are 0, which indicates no autostop, or greater than or equal to 10. The default is 120. |
| Map | The channel details. See channel |
| String | The size of the clusters allocated for this warehouse. Increasing the size of a Spark cluster allows you to run larger queries on it. If you want to increase the number of concurrent queries, tune max_num_clusters. For supported values, see cluster_size. |
| String | The name of the user that created the warehouse. |
| Boolean | Whether the warehouse should use Photon optimized clusters. Defaults to false. |
| Boolean | Whether the warehouse should use serverless compute. |
| String | Deprecated. Instance profile used to pass an IAM role to the cluster. |
| Map | Contains the lifecycle settings for a resource. It controls the behavior of the resource when it is deployed or destroyed. See lifecycle. |
| Integer | The maximum number of clusters that the autoscaler will create to handle concurrent queries. Values must be less than or equal to 30 and greater than or equal to |
| Integer | The minimum number of available clusters that will be maintained for this SQL warehouse. Increasing this will ensure that a larger number of clusters are always running and therefore may reduce the cold start time for new queries. This is similar to reserved vs. revocable cores in a resource manager. Values must be greater than 0 and less than or equal to min(max_num_clusters, 30). Defaults to 1. |
| String | The logical name for the cluster. The name must be unique within an org and less than 100 characters. |
| Sequence | The permissions to apply to the warehouse. See permissions. |
| String | Whether to use spot instances. Valid values are |
| Map | A set of key-value pairs that will be tagged on all resources (e.g., AWS instances and EBS volumes) associated with this SQL warehouse. The number of tags must be less than 45. |
| String | The warehouse type, |
sql_warehouse.channel
Type: Map
The channel configuration for the SQL warehouse.
Key | Type | Description |
|---|---|---|
| String | The name of the channel. Valid values include |
| String | The DBSQL version for custom channels. |
Example
The following example defines a SQL warehouse:
resources:
sql_warehouses:
my_sql_warehouse:
name: my_sql_warehouse
cluster_size: X-Large
enable_serverless_compute: true
max_num_clusters: 3
min_num_clusters: 1
auto_stop_mins: 60
warehouse_type: PRO
synced_database_table
Type: Map
The synced_database_table resource allows you to define Lakebase synced database tables in a bundle.
For information about synced database tables, see What is a database instance?
synced_database_tables:
<synced_database_table-name>:
<synced_database_table-field-name>: <synced_database_table-field-value>
Key | Type | Description |
|---|---|---|
| String | The name of the target database instance. This is required when creating synced database tables in standard catalogs. This is optional when creating synced database tables in registered catalogs. |
| Map | Contains the lifecycle settings for a resource. It controls the behavior of the resource when it is deployed or destroyed. See lifecycle. |
| String | The name of the target Postgres database object (logical database) for this table. |
| String | The full name of the table, in the form |
| Map | The database table specification. See synced database table specification. |
synced_database_table.spec
Type: Map
The database table specification.
Key | Type | Description |
|---|---|---|
| Boolean | Whether to create the synced table's logical database and schema resources if they do not already exist. |
| String | The ID for an existing pipeline. If this is set, the synced table will be bin packed into the existing pipeline referenced. This avoids creating a new pipeline and allows sharing existing compute. In this case, the |
| Map | The specification for a new pipeline. See new_pipeline_spec. At most one of |
| Sequence | The list of column names that form the primary key. |
| String | The scheduling policy for syncing. Valid values include |
| String | The full name of the source table in the format |
| String | Time series key to de-duplicate rows with the same primary key. |
synced_database_table.spec.new_pipeline_spec
Type: Map
The specification for a new pipeline used by the synced database table.
Key | Type | Description |
|---|---|---|
| `storage_catalog` | String | The catalog for the pipeline to store intermediate files, such as checkpoints and event logs. This must be a standard catalog where the user has permissions to create Delta tables. |
| `storage_schema` | String | The schema for the pipeline to store intermediate files, such as checkpoints and event logs. This must be in the standard catalog where the user has permissions to create Delta tables. |
Examples
The following example defines a synced database table within a corresponding database catalog:
resources:
database_instances:
my_instance:
name: my-instance
capacity: CU_1
database_catalogs:
my_catalog:
database_instance_name: my-instance
database_name: 'my_database'
name: my_catalog
create_database_if_not_exists: true
synced_database_tables:
my_synced_table:
name: ${resources.database_catalogs.my_catalog.name}.${resources.database_catalogs.my_catalog.database_name}.my_destination_table
database_instance_name: ${resources.database_catalogs.my_catalog.database_instance_name}
logical_database_name: ${resources.database_catalogs.my_catalog.database_name}
spec:
source_table_full_name: 'my_source_table'
scheduling_policy: SNAPSHOT
primary_key_columns:
- my_pk_column
new_pipeline_spec:
storage_catalog: 'my_delta_catalog'
storage_schema: 'my_delta_schema'
The following example defines a synced database table inside a standard catalog:
resources:
synced_database_tables:
my_synced_table:
name: 'my_standard_catalog.public.synced_table'
# database_instance_name is required for synced tables created in standard catalogs.
database_instance_name: 'my-database-instance'
      # logical_database_name is required for synced tables created in standard catalogs.
logical_database_name: ${resources.database_catalogs.my_catalog.database_name}
spec:
source_table_full_name: 'source_catalog.schema.table'
scheduling_policy: SNAPSHOT
primary_key_columns:
- my_pk_column
create_database_objects_if_missing: true
new_pipeline_spec:
storage_catalog: 'my_delta_catalog'
storage_schema: 'my_delta_schema'
This example creates a synced database table and customizes the pipeline schedule for it. It assumes you already have:
- A database instance named `my-database-instance`
- A standard catalog named `my_standard_catalog`
- A schema in the standard catalog named `default`
- A source Delta table named `source_delta.schema.customer` with the primary key `c_custkey`
resources:
synced_database_tables:
my_synced_table:
name: 'my_standard_catalog.default.my_synced_table'
database_instance_name: 'my-database-instance'
logical_database_name: 'test_db'
spec:
source_table_full_name: 'source_delta.schema.customer'
scheduling_policy: SNAPSHOT
primary_key_columns:
- c_custkey
create_database_objects_if_missing: true
new_pipeline_spec:
storage_catalog: 'source_delta'
storage_schema: 'schema'
jobs:
sync_pipeline_schedule_job:
name: sync_pipeline_schedule_job
description: 'Job to schedule synced database table pipeline.'
tasks:
- task_key: synced-table-pipeline
pipeline_task:
pipeline_id: ${resources.synced_database_tables.my_synced_table.data_synchronization_status.pipeline_id}
schedule:
quartz_cron_expression: '0 0 0 * * ?'
volume (Unity Catalog)
Type: Map
Volumes are supported in Python for Databricks Asset Bundles. See databricks.bundles.volumes.
The volume resource type allows you to define and create Unity Catalog volumes as part of a bundle. When deploying a bundle with a volume defined, note that:
- A volume cannot be referenced in the `artifact_path` for the bundle until it exists in the workspace. If you want to use Databricks Asset Bundles to create the volume, first define the volume in the bundle, deploy the bundle to create the volume, and then reference it in the `artifact_path` in subsequent deployments.
- Volumes in the bundle are not prepended with the `dev_${workspace.current_user.short_name}` prefix when the deployment target has `mode: development` configured. However, you can manually configure this prefix. See Custom presets.
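The first point above implies a two-step workflow: deploy once to create the volume, then point `artifact_path` at it in a later deployment. The following is a minimal sketch; the catalog, schema, and volume names are placeholders, not values from this page:

```yaml
# Valid only after an earlier deployment has already created the volume below
workspace:
  artifact_path: /Volumes/main/my_schema/my_volume/artifacts

resources:
  volumes:
    my_volume:
      catalog_name: main
      name: my_volume
      schema_name: my_schema
```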
volumes:
<volume-name>:
<volume-field-name>: <volume-field-value>
Key | Type | Description |
|---|---|---|
| `catalog_name` | String | The name of the catalog that contains the schema and volume. |
| `comment` | String | The comment attached to the volume. |
| `grants` | Sequence | The grants associated with the volume. See grant. |
| `lifecycle` | Map | Contains the lifecycle settings for a resource. It controls the behavior of the resource when it is deployed or destroyed. See lifecycle. |
| `name` | String | The name of the volume. |
| `schema_name` | String | The name of the schema that contains the volume. |
| `storage_location` | String | The storage location on the cloud. |
| `volume_type` | String | The volume type, either `MANAGED` or `EXTERNAL`. |
Example
The following example creates a Unity Catalog volume with the key my_volume_id:
resources:
volumes:
my_volume_id:
catalog_name: main
name: my_volume
schema_name: my_schema
For an example bundle that runs a job that writes to a file in Unity Catalog volume, see the bundle-examples GitHub repository.
Common objects
grant
Type: Map
Defines the principal and privileges to grant to that principal. For more information about grants, see Show, grant, and revoke privileges.
Key | Type | Description |
|---|---|---|
| `principal` | String | The name of the principal that will be granted privileges. This can be a user, group, or service principal. |
| `privileges` | Sequence | The privileges to grant to the specified principal. Valid values depend on the resource type (for example, `SELECT` and `MODIFY` for a schema, or `READ_VOLUME` and `WRITE_VOLUME` for a volume). |
Example
The following example defines a Unity Catalog schema with grants:
resources:
schemas:
my_schema:
name: test-schema
grants:
- principal: users
privileges:
- SELECT
- principal: my_team
privileges:
- CAN_MANAGE
catalog_name: main
lifecycle
Type: Map
Contains the lifecycle settings for a resource. It controls the behavior of the resource when it is deployed or destroyed.
Key | Type | Description |
|---|---|---|
| `prevent_destroy` | Boolean | Lifecycle setting to prevent the resource from being destroyed. |
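For example, the following sketch protects a Unity Catalog schema defined in a bundle from accidental deletion; the schema and catalog names are hypothetical:

```yaml
resources:
  schemas:
    my_schema:
      name: prod_schema
      catalog_name: main
      # Deployment fails instead of destroying this schema
      lifecycle:
        prevent_destroy: true
```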