Databricks Asset Bundle configurations
This article describes the syntax for Databricks Asset Bundle configuration files, which define Databricks Asset Bundles. See What are Databricks Asset Bundles?
A bundle configuration file must be expressed in YAML format and must contain at minimum the top-level bundle mapping. Each bundle must contain exactly one bundle configuration file named databricks.yml. Any additional bundle configuration files must be referenced by the databricks.yml file, by using the include mapping.
For more information about YAML, see the official YAML specification and tutorial.
To create and work with bundle configuration files, see Databricks Asset Bundles development workflow.
Overview
This section provides a visual representation of the bundle configuration file schema. For details, see Mappings.
# These are the default bundle settings if not otherwise overridden in
# the "targets" top-level mapping.
bundle: # Required.
  name: string # Required.
  databricks_cli_version: string
  compute_id: string
  git:
    origin_url: string
    branch: string
# These are for any custom variables for use throughout the bundle.
variables:
  <some-unique-variable-name>:
    description: string
    default: string
# These are the default workspace settings if not otherwise overridden in
# the following "targets" top-level mapping.
workspace:
  artifact_path: string
  auth_type: string
  azure_client_id: string # For Azure Databricks only.
  azure_environment: string # For Azure Databricks only.
  azure_login_app_id: string # For Azure Databricks only. Non-operational and reserved for future use.
  azure_tenant_id: string # For Azure Databricks only.
  azure_use_msi: true | false # For Azure Databricks only.
  azure_workspace_resource_id: string # For Azure Databricks only.
  client_id: string # For Databricks on AWS only.
  file_path: string
  google_service_account: string # For Databricks on Google Cloud only.
  host: string
  profile: string
  root_path: string
  state_path: string
# These are the permissions to apply to experiments, jobs, models, and pipelines defined
# in the "resources" mapping.
permissions:
  - level: <permission-level>
    group_name: <unique-group-name>
  - level: <permission-level>
    user_name: <unique-user-name>
  - level: <permission-level>
    service_principal_name: <unique-principal-name>
# These are the default artifact settings if not otherwise overridden in
# the following "targets" top-level mapping.
artifacts:
  <some-unique-artifact-identifier>:
    build: string
    files:
      - source: string
    path: string
    type: string
# These are any additional configuration files to include.
include:
  - "<some-file-or-path-glob-to-include>"
  - "<another-file-or-path-glob-to-include>"
# This is the identity to use to run the bundle.
run_as:
  - user_name: <user-name>
  - service_principal_name: <service-principal-name>
# These are the default job and pipeline settings if not otherwise overridden in
# the following "targets" top-level mapping.
resources:
  experiments:
    <some-unique-programmatic-identifier-for-this-experiment>:
      # See the Experiments API's create experiment request payload reference.
  jobs:
    <some-unique-programmatic-identifier-for-this-job>:
      # See the Jobs API's create job request payload reference.
  models:
    <some-unique-programmatic-identifier-for-this-model>:
      # See the Models API's create model request payload reference.
  pipelines:
    <some-unique-programmatic-identifier-for-this-pipeline>:
      # See the Delta Live Tables API's create pipeline request payload reference.
# These are any additional files or paths to include or exclude.
sync:
  include:
    - "<some-file-or-path-glob-to-include>"
    - "<another-file-or-path-glob-to-include>"
  exclude:
    - "<some-file-or-path-glob-to-exclude>"
    - "<another-file-or-path-glob-to-exclude>"
# These are the targets to use for deployments and workflow runs. One and only one of these
# targets can be set to "default: true".
targets:
  <some-unique-programmatic-identifier-for-this-target>:
    artifacts:
      # See the preceding "artifacts" syntax.
    bundle:
      # See the preceding "bundle" syntax.
    compute_id: string
    default: true | false
    mode: development
    resources:
      # See the preceding "resources" syntax.
    sync:
      # See the preceding "sync" syntax.
    variables:
      <preceding-unique-variable-name>: <non-default-value>
    workspace:
      # See the preceding "workspace" syntax.
    run_as:
      # See the preceding "run_as" syntax.
Examples
Following is an example bundle configuration file. This bundle specifies the remote deployment of a local file named hello.py that is in the same directory as this local bundle configuration file named databricks.yml. It runs this notebook as a job by using the remote cluster with the specified cluster ID. The remote workspace URL and workspace authentication credentials are read from the caller’s local configuration profile named DEFAULT.
Note
Databricks recommends that you use the host mapping instead of the profile mapping wherever possible, as this makes your bundle configuration files more portable. Setting the host mapping instructs the Databricks CLI to find a matching profile in your .databrickscfg file and then use that profile’s fields to determine which Databricks authentication type to use. If multiple profiles with a matching host field exist within your .databrickscfg file, then you must use the profile mapping to instruct the Databricks CLI about which specific profile to use. For an example, see the prod target declaration later in this section.
The following bundle configuration file declares the job within the top-level resources mapping. This technique enables you to reuse, as well as to override, the job definitions and settings within the resources block:
bundle:
  name: hello-bundle
resources:
  jobs:
    hello-job:
      name: hello-job
      tasks:
        - task_key: hello-task
          existing_cluster_id: 1234-567890-abcde123
          notebook_task:
            notebook_path: ./hello.py
targets:
  dev:
    default: true
While the following bundle configuration file is functionally equivalent, it is not modularized, which limits reuse. Also, this declaration appends a task to the job rather than overriding the existing job:
bundle:
  name: hello-bundle
targets:
  dev:
    default: true
    resources:
      jobs:
        hello-job:
          name: hello-job
          tasks:
            - task_key: hello-task
              existing_cluster_id: 1234-567890-abcde123
              notebook_task:
                notebook_path: ./hello.py
Following is the previous modularized example but with the addition of a target with the programmatic (or logical) name prod that uses a different remote workspace URL and workspace authentication credentials, which are read from the caller’s .databrickscfg file’s matching host entry with the specified workspace URL. This job runs the same notebook but uses a different remote cluster with the specified cluster ID. Notice that you do not need to declare the notebook_task mapping within the prod mapping: unless it is explicitly overridden there, the prod target falls back to the notebook_task mapping within the top-level resources mapping.
bundle:
  name: hello-bundle
resources:
  jobs:
    hello-job:
      name: hello-job
      tasks:
        - task_key: hello-task
          existing_cluster_id: 1234-567890-abcde123
          notebook_task:
            notebook_path: ./hello.py
targets:
  dev:
    default: true
  prod:
    workspace:
      host: https://<production-workspace-url>
    resources:
      jobs:
        hello-job:
          name: hello-job
          tasks:
            - task_key: hello-task
              existing_cluster_id: 2345-678901-fabcd456
To validate, deploy, and run this job within the dev target, run the following commands:
# Because the "dev" target is set to "default: true",
# you do not need to specify "-t dev":
databricks bundle validate
databricks bundle deploy
databricks bundle run hello-job
# But you can still explicitly specify it, if you want or need to:
databricks bundle validate -t dev
databricks bundle deploy -t dev
databricks bundle run -t dev hello-job
To validate, deploy, and run this job within the prod target instead, run the following commands:
# You must specify "-t prod", because the "dev" target
# is already set to "default: true":
databricks bundle validate -t prod
databricks bundle deploy -t prod
databricks bundle run -t prod hello-job
Following is the previous example but split up into component files for even more modularization and better reuse across multiple bundle configuration files. This technique enables you not only to reuse various definitions and settings, but also to swap out any of these files with other files that provide completely different declarations:
databricks.yml:
bundle:
  name: hello-bundle
include:
  - "bundle*.yml"
bundle.resources.yml:
resources:
  jobs:
    hello-job:
      name: hello-job
      tasks:
        - task_key: hello-task
          existing_cluster_id: 1234-567890-abcde123
          notebook_task:
            notebook_path: ./hello.py
bundle.targets.yml:
targets:
  dev:
    default: true
  prod:
    workspace:
      host: https://<production-workspace-url>
    resources:
      jobs:
        hello-job:
          name: hello-job
          tasks:
            - task_key: hello-task
              existing_cluster_id: 2345-678901-fabcd456
For more examples, see the bundle examples repository in GitHub.
Mappings
The following sections describe the bundle configuration file syntax, by top-level mapping.
bundle
A bundle configuration file must contain only one top-level bundle mapping that associates the bundle’s contents and Databricks workspace settings.
This bundle mapping must contain a name mapping that specifies a programmatic (or logical) name for the bundle. The following example declares a bundle with the programmatic (or logical) name hello-bundle.
bundle:
  name: hello-bundle
A bundle mapping can also be a child of one or more of the targets in the top-level targets mapping. Each of these child bundle mappings specifies any non-default overrides at the target level. However, the top-level bundle mapping’s name value cannot be overridden at the target level.
compute_id
The bundle mapping can have a child compute_id mapping. This mapping enables you to specify the ID of a cluster to use as an override for any and all clusters defined elsewhere in the bundle configuration file. This override is intended for development-only scenarios prior to production. The compute_id mapping works only for the target that has its mode mapping set to development. For more information about the compute_id mapping, see the targets mapping.
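As a minimal sketch of the rule above, the following fragment sets compute_id on a target whose mode is development, following the placement shown for targets in the Overview. The target name dev and the cluster ID are placeholder values, not values taken from a real workspace:

```yaml
targets:
  dev:
    # compute_id takes effect only because this target's mode is "development".
    mode: development
    default: true
    # Placeholder cluster ID, used as an override for clusters defined elsewhere in the bundle.
    compute_id: 1234-567890-abcde123
```

With a fragment like this, jobs deployed to the dev target would be expected to use the specified cluster, while other targets keep the cluster settings declared in their resources.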
git
You can retrieve and override Git version control details that are associated with your bundle. This is useful for annotating your deployed resources. For example, you might want to include the origin URL of your repository within the description of a machine learning model that you deploy.
Whenever you run a bundle command such as validate, deploy, or run, the bundle command populates the command’s configuration tree with the following default settings:
- bundle.git.origin_url, which represents the origin URL of the repo. This is the same value that you would get if you ran the command git config --get remote.origin.url from your cloned repo. You can use substitutions to refer to this value within your bundle configuration files, as ${bundle.git.origin_url}.
- bundle.git.branch, which represents the current branch within the repo. This is the same value that you would get if you ran the command git branch --show-current from your cloned repo. You can use substitutions to refer to this value within your bundle configuration files, as ${bundle.git.branch}.
- bundle.git.commit, which represents the HEAD commit within the repo. This is the same value that you would get if you ran the command git rev-parse HEAD from your cloned repo. You can use substitutions to refer to this value within your bundle configuration files, as ${bundle.git.commit}.
To retrieve or override Git settings, your bundle must be within a directory that is associated with a Git repository, for example a local directory that is initialized by running the git clone command. If the directory is not associated with a Git repository, these Git settings are empty.
You can override the origin_url and branch settings within the git mapping of your top-level bundle mapping if needed, as follows:
bundle:
  git:
    origin_url: <some-non-default-origin-url>
    branch: <some-non-current-branch-name>
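To illustrate the annotation use case mentioned at the start of this section, the following sketch references the Git substitutions inside the description of a model. The resource key my-model, the model name, and the wording of the description are hypothetical; only the ${bundle.git.*} substitutions come from the list above:

```yaml
resources:
  models:
    my-model: # Hypothetical resource key.
      name: my-model
      # The substitutions are resolved when a bundle command such as validate or deploy runs.
      description: "Built from ${bundle.git.origin_url} (branch ${bundle.git.branch}, commit ${bundle.git.commit})."
```

If the bundle directory is not associated with a Git repository, these substitutions would resolve to empty values, as noted above.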
databricks_cli_version
The bundle mapping can contain a databricks_cli_version mapping that constrains the Databricks CLI version required by the bundle. This can prevent issues caused by using mappings that are not supported in a certain version of the Databricks CLI.
The Databricks CLI version conforms to semantic versioning and the databricks_cli_version mapping supports specifying version constraints. If the current databricks --version value is not within the bounds specified in the bundle’s databricks_cli_version mapping, an error occurs when databricks bundle validate is executed on the bundle. The following examples demonstrate some common version constraint syntax:
bundle:
  name: hello-bundle
  databricks_cli_version: "0.218.0" # require Databricks CLI 0.218.0

bundle:
  name: hello-bundle
  databricks_cli_version: "0.218.*" # allow all patch versions of Databricks CLI 0.218

bundle:
  name: my-bundle
  databricks_cli_version: ">= 0.218.0" # allow any version of Databricks CLI 0.218.0 or higher

bundle:
  name: my-bundle
  databricks_cli_version: ">= 0.218.0, <= 1.0.0" # allow any Databricks CLI version between 0.218.0 and 1.0.0, inclusive
variables
The bundle configuration file can contain one top-level variables mapping to specify variable settings to use. See Custom variables.
workspace
The bundle configuration file can contain only one top-level workspace mapping to specify any non-default Databricks workspace settings to use.
This workspace mapping can contain a root_path mapping to specify a non-default root path to use within the workspace for both deployments and workflow runs, for example:
workspace:
  root_path: /Users/${workspace.current_user.userName}/.bundle/${bundle.name}/my-envs/${bundle.target}
By default, for root_path the Databricks CLI uses the default path of /Users/${workspace.current_user.userName}/.bundle/${bundle.name}/${bundle.target}, which uses substitutions.
This workspace mapping can also contain an artifact_path mapping to specify a non-default artifact path to use within the workspace for both deployments and workflow runs, for example:
workspace:
  artifact_path: /Users/${workspace.current_user.userName}/.bundle/${bundle.name}/my-envs/${bundle.target}/artifacts
By default, for artifact_path the Databricks CLI uses the default path of ${workspace.root}/artifacts, which uses substitutions.
Note
The artifact_path mapping does not support Databricks File System (DBFS) paths.
This workspace mapping can also contain a file_path mapping to specify a non-default file path to use within the workspace for both deployments and workflow runs, for example:
workspace:
  file_path: /Users/${workspace.current_user.userName}/.bundle/${bundle.name}/my-envs/${bundle.target}/files
By default, for file_path the Databricks CLI uses the default path of ${workspace.root}/files, which uses substitutions.
The state_path mapping defaults to the default path of ${workspace.root}/state and represents the path within your workspace to store Terraform state information about deployments.
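This section gives no example for state_path. By analogy with the root_path, artifact_path, and file_path examples, a non-default state path could presumably be declared in the same way. The following fragment is a sketch modeled on those examples, and the my-envs path segment is illustrative only:

```yaml
workspace:
  # Assumed to follow the same pattern as the artifact_path and file_path examples.
  state_path: /Users/${workspace.current_user.userName}/.bundle/${bundle.name}/my-envs/${bundle.target}/state
```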
The workspace mapping can also contain the following optional mappings to specify the Databricks authentication mechanism to use. If they are not specified within this workspace mapping, they must be specified in a workspace mapping as a child of one or more of the targets in the top-level targets mapping.
Important
You must hard-code values for the following workspace mappings for Databricks authentication. For instance, you cannot specify custom variables for these mappings’ values by using the ${var.*} syntax.
- The profile mapping (or the --profile or -p options when running the bundle validate, deploy, run, and destroy commands with the Databricks CLI) specifies the name of a configuration profile to use with this workspace for Databricks authentication. This configuration profile maps to the one that you created when you set up the Databricks CLI.
Note
Databricks recommends that you use the host mapping (or the --profile or -p options when running the bundle validate, deploy, run, and destroy commands with the Databricks CLI) instead of the profile mapping, as this makes your bundle configuration files more portable. Setting the host mapping instructs the Databricks CLI to find a matching profile in your .databrickscfg file and then use that profile’s fields to determine which Databricks authentication type to use. If multiple profiles with a matching host field exist within your .databrickscfg file, then you must use the profile mapping (or the --profile or -p command-line options) to instruct the Databricks CLI about which profile to use. For an example, see the prod target declaration in the examples.
- The host mapping specifies the URL for your Databricks workspace. See Workspace instance names, URLs, and IDs.
- For OAuth machine-to-machine (M2M) authentication, the client_id mapping is used. Alternatively, you can set this value in the local environment variable DATABRICKS_CLIENT_ID. Or you can create a configuration profile with the client_id value and then specify the profile’s name with the profile mapping (or by using the --profile or -p options when running the bundle validate, deploy, run, and destroy commands with the Databricks CLI). See OAuth machine-to-machine (M2M) authentication.
Note
You cannot specify a client secret value in the bundle configuration file. Instead, set the local environment variable DATABRICKS_CLIENT_SECRET. Or you can add the client_secret value to a configuration profile and then specify the profile’s name with the profile mapping (or by using the --profile or -p options when running the bundle validate, deploy, run, and destroy commands with the Databricks CLI).
- The auth_type mapping specifies the Databricks authentication type to use, especially in cases where the Databricks CLI infers an unexpected authentication type. See the Authentication type field.
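The following sketch shows one way the authentication mappings described above might be combined: a top-level host that serves as the default, and a prod target that overrides host and names a specific profile. The profile name PROD is hypothetical, and the profile mapping is needed only if more than one profile in .databrickscfg matches the host. All values are hard-coded, as the Important note requires:

```yaml
workspace:
  # Default workspace URL, used by any target that does not override it.
  host: https://<development-workspace-url>
targets:
  dev:
    default: true
  prod:
    workspace:
      host: https://<production-workspace-url>
      # Hypothetical profile name; disambiguates when several profiles share this host.
      profile: PROD
```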
permissions
The top-level permissions mapping specifies one or more permission levels to apply to all resources defined in the bundle. If you want to apply permissions to a specific resource, see Define permissions for a specific resource.
Allowed top-level permission levels are CAN_VIEW, CAN_MANAGE, and CAN_RUN.
The following example in a bundle configuration file defines permission levels for a user, group, and service principal, which are applied to all jobs, pipelines, experiments, and models defined in resources in the bundle:
permissions:
  - level: CAN_VIEW
    group_name: test-group
  - level: CAN_MANAGE
    user_name: someone@example.com
  - level: CAN_RUN
    service_principal_name: 123456-abcdef
artifacts
The top-level artifacts mapping specifies one or more artifacts that are automatically built during bundle deployments and can be used later in bundle runs. Each child artifact supports the following mappings:
- type is required. To build a Python wheel file before deploying, this mapping must be set to whl.
- path is an optional, relative path from the location of the bundle configuration file to the location of the Python wheel file’s setup.py file. If path is not included, the Databricks CLI will attempt to find the Python wheel file’s setup.py file in the bundle’s root.
- files is an optional mapping that includes a child source mapping, which you can use to specify non-default locations to include for complex build instructions. Locations are specified as relative paths from the location of the bundle configuration file.
- build is an optional set of non-default build commands that you want to run locally before deployment. For Python wheel builds, the Databricks CLI assumes that it can find a local install of the Python wheel package to run builds, and it runs the command python setup.py bdist_wheel by default during each bundle deployment. To specify multiple build commands, separate each command with double-ampersand (&&) characters.
For more information, including a sample bundle that uses artifacts, see Develop a Python wheel file using Databricks Asset Bundles.
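As a rough sketch of how the artifact mappings described above fit together, the following fragment declares a single Python wheel artifact. The artifact identifier my-wheel and the folder ./my_package are hypothetical, and the build line simply chains two commands with && to illustrate the multiple-command syntax:

```yaml
artifacts:
  my-wheel: # Hypothetical artifact identifier.
    type: whl
    # Hypothetical folder, relative to this configuration file, that contains setup.py.
    path: ./my_package
    # Two build commands separated by "&&"; the second matches the default build command.
    build: python -m pip install wheel && python setup.py bdist_wheel
```

If build is omitted, the default command described above would run instead.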
Tip
You can define, combine, and override the settings for artifacts in bundles by using the techniques described in Define artifact settings dynamically in Databricks Asset Bundles.
include
The include array specifies a list of path globs that contain configuration files to include within the bundle. These path globs are relative to the location of the bundle configuration file in which the path globs are specified.
The Databricks CLI does not include any configuration files by default within the bundle. You must use the include array to specify any and all configuration files to include within the bundle, other than the databricks.yml file itself.
This include array can appear only as a top-level mapping.
The following example in a bundle configuration file includes the three specified configuration files. These files are in the same directory as the bundle configuration file:
include:
  - "bundle.artifacts.yml"
  - "bundle.resources.yml"
  - "bundle.targets.yml"
The following example in a bundle configuration file includes all files with filenames that begin with bundle and end with .yml. These files are in the same directory as the bundle configuration file:
include:
  - "bundle*.yml"
resources
The resources mapping specifies information about the Databricks resources used by the bundle.
This resources mapping can appear as a top-level mapping, or it can be a child of one or more of the targets in the top-level targets mapping, and includes zero or one of each of the supported resource types. Each resource type mapping includes one or more individual resource declarations, which must each have a unique name. These individual resource declarations use the create operation’s request payload, expressed in YAML, to define the resource. Create operation request payloads are documented in the Databricks REST API Reference.
The following table lists supported resource types for bundles and links to documentation on their corresponding payloads:
Resource type | Resource mappings |
---|---|
Job | Job mappings: POST /api/2.1/jobs/create. For additional information, see job task types and override new job cluster settings. |
Pipeline | Pipeline mappings: POST /api/2.0/pipelines |
Experiment | Experiment mappings: POST /api/2.0/mlflow/experiments/create |
Model | Model mappings: POST /api/2.0/mlflow/registered-models/create |
Model serving endpoint | Model serving endpoint mappings: POST /api/2.0/serving-endpoints |
Unity Catalog model | Unity Catalog Model mappings: POST /api/2.1/unity-catalog/models |
All paths to folders and files referenced by resource declarations are relative to the location of the bundle configuration file in which these paths are specified.
The following example declares a job with the resource key of hello-job and a pipeline with the resource key of hello-pipeline:
resources:
  jobs:
    hello-job:
      name: hello-job
      tasks:
        - task_key: hello-task
          existing_cluster_id: 1234-567890-abcde123
          notebook_task:
            notebook_path: ./hello.py
  pipelines:
    hello-pipeline:
      name: hello-pipeline
      clusters:
        - label: default
          num_workers: 1
      development: true
      continuous: false
      channel: CURRENT
      edition: CORE
      photon: false
      libraries:
        - notebook:
            path: ./pipeline.py
sync
The sync mapping specifies lists of file or path globs to include within bundle deployments or to exclude from bundle deployments, depending on the following rules:
- Based on any list of file and path globs in a .gitignore file in the bundle’s root, the include mapping can contain a list of file globs, path globs, or both, relative to the bundle’s root, to explicitly include.
- Based on any list of file and path globs in a .gitignore file in the bundle’s root, plus the list of file and path globs in the include mapping, the exclude mapping can contain a list of file globs, path globs, or both, relative to the bundle’s root, to explicitly exclude.
All paths to specified folders and files are relative to the location of the bundle configuration file in which these paths are specified.
The syntax for include and exclude file and path patterns follows standard .gitignore pattern syntax. See gitignore Pattern Format.
For example, if the .gitignore file contains the following entries:
.databricks
my_package/dist
And the bundle configuration file contains the following include mapping:
sync:
  include:
    - my_package/dist/*.whl
Then all of the files in the my_package/dist folder with a file extension of .whl are included. Any other files in the my_package/dist folder are not included.
However, if the bundle configuration file also contains the following exclude mapping:
sync:
  include:
    - my_package/dist/*.whl
  exclude:
    - my_package/dist/delete-me.whl
Then all of the files in the my_package/dist folder with a file extension of .whl, except for the file named delete-me.whl, are included. Any other files in the my_package/dist folder are also not included.
The sync mapping can also be declared in the targets mapping for a specific target. Any sync mapping declared in a target is merged with any top-level sync mapping declarations. For example, continuing with the preceding example, the following include mapping at the targets level merges with the include mapping in the top-level sync mapping:
targets:
  dev:
    sync:
      include:
        - my_package/dist/delete-me.whl
When you run databricks bundle validate, the relevant portion of the resulting graph is as follows:
"sync": {
  "include": [
    "my_package/dist/*.whl",
    "my_package/dist/delete-me.whl"
  ],
  "exclude": [
    "my_package/dist/delete-me.whl"
  ]
}
targets
The targets mapping specifies one or more contexts in which to run Databricks workflows. Each target is a unique collection of artifacts, Databricks workspace settings, and Databricks job or pipeline details.
This targets mapping is optional but highly recommended. If it is specified, it can appear only as a top-level mapping. If the targets mapping is not specified, then the settings in the top-level workspace, artifacts, and resources mappings are always used.
The targets mapping consists of one or more target mappings, which must each have a unique programmatic (or logical) name.
If a target mapping does not specify workspace, artifacts, or resources child mappings, then that target uses the settings in the top-level workspace, artifacts, and resources mappings.
If a target mapping specifies a workspace, artifacts, or resources mapping, and a top-level workspace, artifacts, or resources mapping also exists, then any conflicting settings are overridden by the settings within the target.
A target can also override the values of any top-level variables.
To specify that a target is the default one unless otherwise specified, add the default mapping, set to true. For example, this target named dev is the default target:
targets:
  dev:
    default: true
To specify that a target is treated as a development target, add the mode mapping, set to development. To specify that a target is treated as a production target, add the mode mapping, set to production. For example, this target named prod is treated as a production target:
targets:
  prod:
    mode: production
Specifying mode provides a collection of corresponding default behaviors for pre-production and production workflows. For details, see Databricks Asset Bundle deployment modes. In addition, you can specify run_as for each target, as described in Specify a run identity for a Databricks Asset Bundles workflow.
The following example declares two targets. The first target has a programmatic (or logical) name of dev and is the default target. The second target has a programmatic (or logical) name of prod and is not the default target. This second target uses the specified workspace URL, and the Databricks CLI reads authentication credentials from the profile in the caller’s .databrickscfg file that has a matching host entry:
targets:
  dev:
    default: true
  prod:
    workspace:
      host: https://<production-workspace-url>
To validate, deploy, and run jobs or pipelines within the dev target, run the following commands:
# Because the "dev" target is set to "default: true",
# you do not need to specify "-t dev":
databricks bundle validate
databricks bundle deploy
databricks bundle run <job-or-pipeline-programmatic-name>
# But you can still explicitly specify it, if you want or need to:
databricks bundle validate -t dev
databricks bundle deploy -t dev
databricks bundle run -t dev <job-or-pipeline-programmatic-name>
To validate, deploy, and run jobs or pipelines within the prod target instead, run the following commands:
# You must specify "-t prod", because the "dev" target
# is already set to "default: true":
databricks bundle validate -t prod
databricks bundle deploy -t prod
databricks bundle run -t prod <job-or-pipeline-programmatic-name>
Custom variables
You can use custom variables to make your bundle configuration files more modular and reusable. For example, you might declare a variable that represents the ID of an existing cluster, and then want to change that variable’s value to different cluster IDs for various workflow runs within multiple targets without changing your bundle configuration files’ original code.
You can declare one or more variables in your bundle configuration files within the variables mapping. For each variable, you can set an optional description, default value, or lookup to retrieve an ID value, following this format:
variables:
  <variable-name>:
    description: <optional-description>
    default: <optional-default-value>
    lookup:
      <optional-object-type>: <optional-object-name>
For example, to declare a variable named my_cluster_id with the default value of 1234-567890-abcde123, and a variable named my_notebook_path with the default value of ./hello.py:
variables:
  my_cluster_id:
    description: The ID of an existing cluster.
    default: 1234-567890-abcde123
  my_notebook_path:
    description: The path to an existing notebook.
    default: ./hello.py
If you do not provide a default value for a variable as part of this declaration, you must provide the value later at the command line, through an environment variable, or elsewhere within your bundle configuration files. These approaches are described later in this section.
Note
Whichever approach you choose to provide variable values, use the same approach during both the deployment and run stages. Otherwise, you might get unexpected results between the time of a deployment and a job or pipeline run that is based on that existing deployment.
To reference your custom variables within your bundle configuration files, use substitutions. For variables, use the format ${var.<variable_name>}. For example, to reference variables named my_cluster_id and my_notebook_path:
resources:
  jobs:
    hello-job:
      name: hello-job
      tasks:
        - task_key: hello-task
          existing_cluster_id: ${var.my_cluster_id}
          notebook_task:
            notebook_path: ${var.my_notebook_path}
Set a variable’s value
If you have not provided a default value for a variable, or if you want to temporarily override the default value for a variable, provide the variable’s new temporary value using one of the following approaches:
- Provide the variable’s value as part of a bundle command such as validate, deploy, or run. To do this, use the option --var="<key>=<value>", where <key> is the variable’s name, and <value> is the variable’s value. For example, as part of the bundle validate command, to provide the value of 1234-567890-abcde123 to the variable named my_cluster_id, and to provide the value of ./hello.py to the variable named my_notebook_path, run:
  databricks bundle validate --var="my_cluster_id=1234-567890-abcde123,my_notebook_path=./hello.py"
  # Or:
  databricks bundle validate --var="my_cluster_id=1234-567890-abcde123" --var="my_notebook_path=./hello.py"
- Provide the variable’s value by setting an environment variable. The environment variable’s name must start with BUNDLE_VAR_. To set environment variables, see your operating system’s documentation. For example, to provide the value of 1234-567890-abcde123 to the variable named my_cluster_id, and to provide the value of ./hello.py to the variable named my_notebook_path, run the following command before you call a bundle command such as validate, deploy, or run:
  For Linux and macOS:
  export BUNDLE_VAR_my_cluster_id=1234-567890-abcde123 && export BUNDLE_VAR_my_notebook_path=./hello.py
  For Windows:
  set "BUNDLE_VAR_my_cluster_id=1234-567890-abcde123" && set "BUNDLE_VAR_my_notebook_path=./hello.py"
  Or, provide the variable’s value as part of a bundle command such as validate, deploy, or run, for example for Linux and macOS:
  BUNDLE_VAR_my_cluster_id=1234-567890-abcde123 BUNDLE_VAR_my_notebook_path=./hello.py databricks bundle validate
  Or for Windows:
  set "BUNDLE_VAR_my_cluster_id=1234-567890-abcde123" && set "BUNDLE_VAR_my_notebook_path=./hello.py" && databricks bundle validate
Provide the variable’s value within your bundle configuration files. To do this, use a variables mapping within the targets mapping, following this format:
variables:
  <variable-name>: <value>
For example, to provide values for the variables named my_cluster_id and my_notebook_path for two separate targets:
targets:
  dev:
    variables:
      my_cluster_id: 1234-567890-abcde123
      my_notebook_path: ./hello.py
  prod:
    variables:
      my_cluster_id: 2345-678901-bcdef234
      my_notebook_path: ./hello.py
In the preceding examples, the Databricks CLI looks for values for the variables my_cluster_id and my_notebook_path in the following order, stopping when it finds a value for each matching variable, skipping any other locations for that variable:
1. Within any --var options specified as part of the bundle command.
2. Within any environment variables set that begin with BUNDLE_VAR_.
3. Within any variables mappings, among the targets mappings within your bundle configuration files.
4. Any default value for that variable’s definition, among the top-level variables mappings within your bundle configuration files.
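The lookup order above can be modeled as a short Python sketch. This is an illustration of the documented precedence only, not the Databricks CLI's implementation: the function name resolve_variable, its parameters, and the sample values are hypothetical, and raising an error when no source supplies a value is an assumption of the sketch.

```python
def resolve_variable(name, cli_vars, environ, target_vars, defaults):
    """Return the first value found for `name`, checking sources in the documented order.

    All arguments except `name` are plain dicts standing in for the real sources
    (hypothetical model, not the CLI's internals).
    """
    # 1. --var options given on the bundle command line.
    if name in cli_vars:
        return cli_vars[name]
    # 2. Environment variables named BUNDLE_VAR_<variable name>.
    env_key = "BUNDLE_VAR_" + name
    if env_key in environ:
        return environ[env_key]
    # 3. The variables mapping of the selected target.
    if name in target_vars:
        return target_vars[name]
    # 4. The default in the top-level variables mapping.
    if name in defaults:
        return defaults[name]
    # Assumption of this sketch: a variable with no value anywhere is an error.
    raise KeyError("no value found for variable " + repr(name))


# Hypothetical inputs: each source offers a different value for my_cluster_id.
CLI_VARS = {"my_cluster_id": "1234-567890-abcde123"}
ENVIRON = {
    "BUNDLE_VAR_my_cluster_id": "2345-678901-bcdef234",
    "BUNDLE_VAR_my_notebook_path": "./from_env.py",
}
TARGET_VARS = {"my_notebook_path": "./from_target.py"}
DEFAULTS = {"my_cluster_id": "0000-000000-default0", "my_notebook_path": "./hello.py"}
```

With these inputs, my_cluster_id takes the --var value because that source is checked first, and my_notebook_path takes the environment variable's value because no --var option supplies it.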
Retrieve an object’s ID value
For the alert, cluster_policy, cluster, dashboard, instance_pool, job, metastore, pipeline, query, service_principal, and warehouse object types, you can define a lookup for your custom variable to retrieve a named object’s ID using this format:
variables:
  <variable-name>:
    lookup:
      <object-type>: "<object-name>"
If a lookup is defined for a variable, the ID of the object with the specified name is used as the value of the variable. This ensures the correct resolved ID of the object is always used for the variable.
Note
An error occurs if an object with the specified name does not exist, or if there is more than one object with the specified name.
For example, in the following configuration, ${var.my_cluster_id} will be replaced by the ID of the 12.2 shared cluster.
variables:
  my_cluster_id:
    description: An existing cluster
    lookup:
      cluster: "12.2 shared"

resources:
  jobs:
    my_job:
      name: "My Job"
      tasks:
        - task_key: TestTask
          existing_cluster_id: ${var.my_cluster_id}
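The lookup rule (exactly one object must match the name) can be sketched in Python as follows. This is a hypothetical model for illustration: the function lookup_id, the shape of the object records, and the sample IDs are assumptions, and the sketch does not call any Databricks API.

```python
def lookup_id(objects, name):
    """Return the ID of the single object called `name`.

    `objects` is a list of dicts with "name" and "id" keys, standing in for the
    objects of one type (for example, clusters) in a workspace.
    """
    matches = [obj["id"] for obj in objects if obj["name"] == name]
    if not matches:
        # No object with the specified name exists.
        raise LookupError("no object named " + repr(name))
    if len(matches) > 1:
        # More than one object shares the specified name.
        raise LookupError(str(len(matches)) + " objects named " + repr(name))
    return matches[0]


# Hypothetical cluster records; the IDs are placeholders.
CLUSTERS = [
    {"name": "12.2 shared", "id": "1234-567890-abcde123"},
    {"name": "nightly", "id": "2345-678901-bcdef234"},
    {"name": "nightly", "id": "3456-789012-cdefa345"},
]
```

Here "12.2 shared" resolves to a single ID, while "nightly" (two matches) and any unknown name both raise an error, mirroring the two error cases in the preceding note.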
Substitutions
You can use substitutions to make your bundle configuration files more modular and reusable.
Tip
You can also use dynamic value references for job parameter values to pass context about a job run to job tasks. See Pass context about job runs into job tasks.
For example, when you run the bundle validate command, you might see a graph like this (the ellipses indicate omitted content, for brevity):
{
  "bundle": {
    "name": "hello-bundle",
    "target": "dev",
    "...": "..."
  },
  "workspace": {
    "...": "...",
    "current_user": {
      "...": "...",
      "userName": "someone@example.com",
      "...": "..."
    },
    "...": "..."
  },
  "...": {
    "...": "..."
  }
}
In the preceding example, you could refer to the value someone@example.com in your bundle configuration file with the substitution ${workspace.current_user.userName}.
Similarly, the following substitutions:
/Users/${workspace.current_user.userName}/.bundle/${bundle.name}/my-envs/${bundle.target}
in a bundle configuration file such as the following (the ellipsis indicates omitted content, for brevity):
bundle:
  name: hello-bundle

workspace:
  root_path: /Users/${workspace.current_user.userName}/.bundle/${bundle.name}/my-envs/${bundle.target}

# ...

targets:
  dev:
    default: true
would resolve to the following graph when you run the bundle validate command (the ellipses indicate omitted content, for brevity):
{
  "bundle": {
    "name": "hello-bundle",
    "target": "dev",
    "...": "..."
  },
  "workspace": {
    "profile": "DEFAULT",
    "current_user": {
      "...": "...",
      "userName": "someone@example.com",
      "...": "..."
    },
    "root_path": "/Users/someone@example.com/.bundle/hello-bundle/my-envs/dev",
    "...": "..."
  },
  "...": {
    "...": "..."
  }
}
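To make the resolution step concrete, here is a minimal Python sketch that replaces each ${...} reference with the value found at that dotted path in a nested graph. It is not how the Databricks CLI is implemented; the resolve function and the regular expression are assumptions for illustration, and the graph holds only the values shown in this section.

```python
import re

# Matches ${some.dotted.path} and captures the path between the braces.
SUBSTITUTION = re.compile(r"\$\{([^}]+)\}")


def resolve(text, graph):
    """Replace every ${a.b.c} in `text` with graph["a"]["b"]["c"]."""

    def replace(match):
        value = graph
        # Walk the graph one key at a time along the dotted path.
        for key in match.group(1).split("."):
            value = value[key]
        return str(value)

    return SUBSTITUTION.sub(replace, text)


# Only the values shown in this section; the real graph contains much more.
GRAPH = {
    "bundle": {"name": "hello-bundle", "target": "dev"},
    "workspace": {"current_user": {"userName": "someone@example.com"}},
}

ROOT_PATH = (
    "/Users/${workspace.current_user.userName}"
    "/.bundle/${bundle.name}/my-envs/${bundle.target}"
)

print(resolve(ROOT_PATH, GRAPH))
# prints /Users/someone@example.com/.bundle/hello-bundle/my-envs/dev
```

A reference to a path that is missing from the graph raises a KeyError in this sketch; how the CLI reports an unresolvable substitution is not covered here.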
To determine valid substitutions, you can use the schema hierarchy documented in the REST API reference or you can use the output of the bundle validate command. For example, based on this section of the output schema, ${resources.pipelines.my_pipeline.target} is the substitution for the value (in this case, hellobundle_dev) of the target of my_pipeline:
{
  "...": {
    "...": "..."
  },
  "resources": {
    "...": "...",
    "pipelines": {
      "my_pipeline": {
        "...": "...",
        "target": "hellobundle_dev",
        "...": "..."
      }
    }
  }
}
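One way to read substitutions off such output is to walk the parsed JSON and join the keys along each path with dots. The Python sketch below does this for a trimmed, hypothetical sample of the output; the list_substitutions helper is an assumption for illustration, and it descends only into mappings, treating every other value (including arrays) as a leaf.

```python
import json


def list_substitutions(graph, prefix=""):
    """Return a ${dotted.path} string for every leaf value in a nested mapping."""
    paths = []
    for key, value in graph.items():
        path = prefix + "." + key if prefix else key
        if isinstance(value, dict):
            # Descend into nested mappings, extending the dotted path.
            paths.extend(list_substitutions(value, path))
        else:
            paths.append("${" + path + "}")
    return paths


# Trimmed, hypothetical sample of the validated graph shown in this section.
SAMPLE_OUTPUT = json.loads("""
{
  "bundle": {"name": "hello-bundle", "target": "dev"},
  "resources": {
    "pipelines": {
      "my_pipeline": {"target": "hellobundle_dev"}
    }
  }
}
""")

for substitution in list_substitutions(SAMPLE_OUTPUT):
    print(substitution)
# prints:
# ${bundle.name}
# ${bundle.target}
# ${resources.pipelines.my_pipeline.target}
```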
Here are some commonly used substitutions:
${bundle.name}
${bundle.target} # Use this substitution instead of ${bundle.environment}
${workspace.host}
${workspace.current_user.short_name}
${workspace.current_user.userName}
${workspace.file_path}
${workspace.root_path}
${resources.jobs.<job-name>.id}
${resources.models.<model-name>.name}
${resources.pipelines.<pipeline-name>.name}