Databricks Asset Bundle configurations

Preview

This feature is in Public Preview.

This article describes the syntax for Databricks Asset Bundle configuration files, which define Databricks Asset Bundles. See What are Databricks Asset Bundles?

A bundle configuration file must be expressed in YAML format and must contain at minimum the top-level bundle mapping. Each bundle must contain at least one bundle configuration file, and exactly one of them must be named databricks.yml. Any additional bundle configuration files must be referenced by the databricks.yml file.

For more information about YAML, see the official YAML specification and tutorial.

To create and work with bundle configuration files, see Databricks Asset Bundles development workflow.

Overview

This section provides a visual representation of the bundle configuration file schema. For details, see Mappings.

# This is the default bundle configuration if not otherwise overridden in
# the "targets" top-level mapping.
bundle: # Required.
  name: string # Required.
  compute_id: string
  git:
    origin_url: string
    branch: string

# These are for any custom variables for use throughout the bundle.
variables:
  <some-unique-variable-name>:
    description: string
    default: string

# These are the default workspace settings if not otherwise overridden in
# the following "targets" top-level mapping.
workspace:
  artifact_path: string
  auth_type: string
  azure_client_id: string # For Azure Databricks only.
  azure_environment: string # For Azure Databricks only.
  azure_login_app_id: string # For Azure Databricks only. Non-operational and reserved for future use.
  azure_tenant_id: string # For Azure Databricks only.
  azure_use_msi: true | false # For Azure Databricks only.
  azure_workspace_resource_id: string # For Azure Databricks only.
  client_id: string # For Databricks on AWS only.
  file_path: string
  google_service_account: string # For Databricks on Google Cloud only.
  host: string
  profile: string
  root_path: string
  state_path: string

# These are the permissions to apply to experiments, jobs, models, and pipelines defined
# in the "resources" mapping.
permissions:
  - level: <permission-level>
    group_name: <unique-group-name>
  - level: <permission-level>
    user_name: <unique-user-name>
  - level: <permission-level>
    service_principal_name: <unique-principal-name>

# These are the default artifact settings if not otherwise overridden in
# the following "targets" top-level mapping.
artifacts:
  <some-unique-artifact-identifier>:
    build: string
    files:
      - source: string
    path: string
    type: string

# These are any additional configuration files to include.
include:
  - "<some-file-or-path-glob-to-include>"
  - "<another-file-or-path-glob-to-include>"

# This is the identity to use to run the bundle.
run_as:
  - user_name: <user-name>
  - service_principal_name: <service-principal-name>

# These are the default job and pipeline settings if not otherwise overridden in
# the following "targets" top-level mapping.
resources:
  experiments:
    <some-unique-programmatic-identifier-for-this-experiment>:
      # See the Experiments API's create experiment request payload reference.
  jobs:
    <some-unique-programmatic-identifier-for-this-job>:
      # See the Jobs API's create job request payload reference.
  models:
    <some-unique-programmatic-identifier-for-this-model>:
      # See the Models API's create model request payload reference.
  pipelines:
    <some-unique-programmatic-identifier-for-this-pipeline>:
      # See the Delta Live Tables API's create pipeline request payload reference.

# These are any additional files or paths to include or exclude.
sync:
  include:
    - "<some-file-or-path-glob-to-include>"
    - "<another-file-or-path-glob-to-include>"
  exclude:
    - "<some-file-or-path-glob-to-exclude>"
    - "<another-file-or-path-glob-to-exclude>"

# These are the targets to use for deployments and workflow runs. One and only one of these
# targets can be set to "default: true".
targets:
  <some-unique-programmatic-identifier-for-this-target>:
    artifacts:
      # See the preceding "artifacts" syntax.
    bundle:
      # See the preceding "bundle" syntax.
    compute_id: string
    default: true | false
    mode: development | production
    resources:
      # See the preceding "resources" syntax.
    sync:
      # See the preceding "sync" syntax.
    variables:
      <preceding-unique-variable-name>: <non-default-value>
    workspace:
      # See the preceding "workspace" syntax.
    run_as:
      # See the preceding "run_as" syntax.

Examples

Following is an example bundle configuration file. This bundle specifies the remote deployment of a local notebook named hello.py that is in the same directory as this local bundle configuration file named databricks.yml. It runs this notebook as a job by using the remote cluster with the specified cluster ID. The remote workspace URL and workspace authentication credentials are read from the caller’s local configuration profile named DEFAULT.

Note

Databricks recommends that you use the host mapping instead of the profile mapping wherever possible, as this makes your bundle configuration files more portable. Setting the host mapping instructs the Databricks CLI to find a matching profile in your .databrickscfg file and then use that profile’s fields to determine which Databricks authentication type to use. If multiple profiles with a matching host field exist within your .databrickscfg file, then you must use the profile mapping to instruct the Databricks CLI about which specific profile to use. For an example, see the prod target declaration later in this section.

Declaring the job in the top-level resources mapping, as this example does, enables you to reuse as well as to override the job definitions and settings within the resources block:

bundle:
  name: hello-bundle

resources:
  jobs:
    hello-job:
      name: hello-job
      tasks:
        - task_key: hello-task
          existing_cluster_id: 1234-567890-abcde123
          notebook_task:
            notebook_path: ./hello.py

targets:
  dev:
    default: true

While the following bundle configuration file is functionally equivalent, it is not modularized, which does not enable good reuse. Also, this declaration appends a task to the job rather than overriding the existing job:

bundle:
  name: hello-bundle

targets:
  dev:
    default: true
    resources:
      jobs:
        hello-job:
          name: hello-job
          tasks:
            - task_key: hello-task
              existing_cluster_id: 1234-567890-abcde123
              notebook_task:
                notebook_path: ./hello.py

Following is the previous modularized example but with the addition of a target with the programmatic (or logical) name prod that uses a different remote workspace URL and workspace authentication credentials, which are read from the caller’s .databrickscfg file’s matching host entry with the specified workspace URL. This job runs the same notebook but uses a different remote cluster with the specified cluster ID. Notice that you do not need to declare the notebook_task mapping within the prod mapping, as it falls back to use the notebook_task mapping within the top-level resources mapping, if the notebook_task mapping is not explicitly overridden within the prod mapping.

bundle:
  name: hello-bundle

resources:
  jobs:
    hello-job:
      name: hello-job
      tasks:
        - task_key: hello-task
          existing_cluster_id: 1234-567890-abcde123
          notebook_task:
            notebook_path: ./hello.py

targets:
  dev:
    default: true
  prod:
    workspace:
      host: https://<production-workspace-url>
    resources:
      jobs:
        hello-job:
          name: hello-job
          tasks:
            - task_key: hello-task
              existing_cluster_id: 2345-678901-fabcd456

To validate, deploy, and run this job within the dev target, run the following commands:

# Because the "dev" target is set to "default: true",
# you do not need to specify "-t dev":
databricks bundle validate
databricks bundle deploy
databricks bundle run hello-job

# But you can still explicitly specify it, if you want or need to:
databricks bundle validate -t dev
databricks bundle deploy -t dev
databricks bundle run -t dev hello-job

To validate, deploy, and run this job within the prod target instead, run the following commands:

# You must specify "-t prod", because the "dev" target
# is already set to "default: true":
databricks bundle validate -t prod
databricks bundle deploy -t prod
databricks bundle run -t prod hello-job

Following is the previous example but split up into component files for even more modularization and better reuse across multiple bundle configuration files. This technique enables you to not only reuse various definitions and settings, but you can also swap out any of these files with other files that provide completely different declarations:

databricks.yml:

bundle:
  name: hello-bundle

include:
  - "bundle*.yml"

bundle.resources.yml:

resources:
  jobs:
    hello-job:
      name: hello-job
      tasks:
        - task_key: hello-task
          existing_cluster_id: 1234-567890-abcde123
          notebook_task:
            notebook_path: ./hello.py

bundle.targets.yml:

targets:
  dev:
    default: true
  prod:
    workspace:
      host: https://<production-workspace-url>
    resources:
      jobs:
        hello-job:
          name: hello-job
          tasks:
            - task_key: hello-task
              existing_cluster_id: 2345-678901-fabcd456

For more examples, see the bundle examples repository in GitHub.

Mappings

The following sections describe the bundle configuration file syntax, by top-level mapping.

bundle

A bundle configuration file must contain only one top-level bundle mapping that associates the bundle’s contents and Databricks workspace settings.

This bundle mapping must contain a name mapping that specifies a programmatic (or logical) name for the bundle. The following example declares a bundle with the programmatic (or logical) name hello-bundle.

bundle:
  name: hello-bundle

The bundle mapping can have a child compute_id mapping. This mapping enables you to specify the ID of a cluster to use as an override for any and all clusters defined elsewhere in the bundle configuration file. This override is intended for development-only scenarios prior to production. The compute_id mapping works only for the target that has its mode mapping set to development. For more information about the compute_id mapping, see the targets mapping.
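
As a rough sketch of how this might look, the following fragment sets compute_id on a development target. The target name dev and the cluster ID are placeholder assumptions; the placement of compute_id under the target follows the schema shown in the Overview.

```yaml
bundle:
  name: hello-bundle

targets:
  dev:
    # compute_id applies only because this target's mode is development.
    mode: development
    default: true
    # Placeholder cluster ID (assumption); replace with the ID of a real cluster.
    compute_id: 1234-567890-abcde123
```

With a declaration along these lines, the specified cluster would be used in place of any clusters defined elsewhere in the bundle when working with the dev target.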

The bundle mapping can also contain a child git mapping that specifies version control details. See Git settings.

A bundle mapping can also be a child of one or more of the targets in the top-level targets mapping. Each of these child bundle mappings specifies any non-default overrides at the target level. However, the top-level bundle mapping’s name value cannot be overridden at the target level.

variables

The bundle configuration file can contain one top-level variables mapping to specify variable settings to use. See Custom variables.

workspace

The bundle configuration file can contain only one top-level workspace mapping to specify any non-default Databricks workspace settings to use.

This workspace mapping can contain a root_path mapping to specify a non-default root path to use within the workspace for both deployments and workflow runs, for example:

workspace:
  root_path: /Users/${workspace.current_user.userName}/.bundle/${bundle.name}/my-envs/${bundle.target}

By default, the Databricks CLI uses a root_path of /Users/${workspace.current_user.userName}/.bundle/${bundle.name}/${bundle.target}, which uses substitutions.

This workspace mapping can also contain an artifact_path mapping to specify a non-default artifact path to use within the workspace for both deployments and workflow runs, for example:

workspace:
  artifact_path: /Users/${workspace.current_user.userName}/.bundle/${bundle.name}/my-envs/${bundle.target}/artifacts

By default, the Databricks CLI uses an artifact_path of ${workspace.root}/artifacts, which uses substitutions.

Note

The artifact_path mapping does not support Databricks File System (DBFS) paths.

This workspace mapping can also contain a file_path mapping to specify a non-default file path to use within the workspace for both deployments and workflow runs, for example:

workspace:
  file_path: /Users/${workspace.current_user.userName}/.bundle/${bundle.name}/my-envs/${bundle.target}/files

By default, the Databricks CLI uses a file_path of ${workspace.root}/files, which uses substitutions.

The state_path mapping defaults to ${workspace.root}/state and represents the path within your workspace in which to store Terraform state information about deployments.

The workspace mapping can also contain the following optional mappings to specify the Databricks authentication mechanism to use. If they are not specified within this workspace mapping, they must be specified in a workspace mapping as a child of one or more of the targets in the top-level targets mapping.

Important

You must hard-code values for the following workspace mappings for Databricks authentication. For instance, you cannot specify custom variables for these mappings’ values by using the ${var.*} syntax.

  • The profile mapping (or the --profile or -p options when running the bundle validate, deploy, run, and destroy commands with the Databricks CLI) specifies the name of a configuration profile to use with this workspace for Databricks authentication. This configuration profile maps to the one that you created when you set up the Databricks CLI.

    Note

    Databricks recommends that you use the host mapping (or the --profile or -p options when running the bundle validate, deploy, run, and destroy commands with the Databricks CLI) instead of the profile mapping, as this makes your bundle configuration files more portable. Setting the host mapping instructs the Databricks CLI to find a matching profile in your .databrickscfg file and then use that profile’s fields to determine which Databricks authentication type to use. If multiple profiles with a matching host field exist within your .databrickscfg file, then you must use the profile mapping (or the --profile or -p command-line options) to instruct the Databricks CLI about which profile to use. For an example, see the prod target declaration in the examples.

  • The host mapping specifies the URL for your Databricks workspace. See Workspace instance names, URLs, and IDs.

  • For OAuth machine-to-machine (M2M) authentication, the mapping client_id is used. Alternatively, you can set this value in the local environment variable DATABRICKS_CLIENT_ID. Or you can create a configuration profile with the client_id value and then specify the profile’s name with the profile mapping (or by using the --profile or -p options when running the bundle validate, deploy, run, and destroy commands with the Databricks CLI). See OAuth machine-to-machine (M2M) authentication.

    Note

    You cannot specify a client secret value in the bundle configuration file. Instead, set the local environment variable DATABRICKS_CLIENT_SECRET. Or you can add the client_secret value to a configuration profile and then specify the profile’s name with the profile mapping (or by using the --profile or -p options when running the bundle validate, deploy, run, and destroy commands with the Databricks CLI).

  • The auth_type mapping specifies the Databricks authentication type to use, especially in cases where the Databricks CLI infers an unexpected authentication type. See the Authentication type field.
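
The following sketch shows one way these authentication mappings might be combined. The target name prod and the profile name PROD are assumptions for illustration, and the URL is a placeholder. All values are hard-coded, because these mappings do not accept ${var.*} substitutions.

```yaml
targets:
  prod:
    workspace:
      # The Databricks CLI looks for a profile in .databrickscfg with a matching host.
      host: https://<production-workspace-url>
      # Needed only if more than one profile matches the host above.
      # PROD is a hypothetical profile name.
      profile: PROD
```

Secrets such as the client secret stay out of the file; they would come from the DATABRICKS_CLIENT_SECRET environment variable or from the profile itself.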

permissions

The top-level permissions mapping specifies one or more permission levels to apply to all resources defined in the bundle. If you want to apply permissions to a specific resource, see Define permissions for a specific resource.

Allowed top-level permission levels are CAN_VIEW, CAN_MANAGE, and CAN_RUN.

The following example in a bundle configuration file defines permission levels for a user, group, and service principal, which are applied to all jobs, pipelines, experiments, and models defined in resources in the bundle:

permissions:
  - level: CAN_VIEW
    group_name: test-group
  - level: CAN_MANAGE
    user_name: someone@example.com
  - level: CAN_RUN
    service_principal_name: 123456-abcdef

artifacts

The top-level artifacts mapping specifies one or more artifacts that are automatically built during bundle deployments and can be used later in bundle runs. Each child artifact supports the following mappings:

  • type is required. To build a Python wheel file before deploying, this mapping must be set to whl.

  • path is an optional, relative path from the location of the bundle configuration file to the location of the Python wheel file’s setup.py file. If path is not included, the Databricks CLI will attempt to find the Python wheel file’s setup.py file in the bundle’s root.

  • files is an optional mapping that includes a child source mapping, which you can use to specify non-default locations to include for complex build instructions. Locations are specified as relative paths from the location of the bundle configuration file.

  • build is an optional set of non-default build commands that you want to run locally before deployment. For Python wheel builds, the Databricks CLI assumes that it can find a local install of the Python wheel package to run builds, and it runs the command python setup.py bdist_wheel by default during each bundle deployment. To specify multiple build commands, separate each command with double-ampersand (&&) characters.
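
As an illustration of these mappings, the following sketch declares one Python wheel artifact. The artifact identifier my-wheel and the folder ./my_package are assumptions; the build value chains two commands with &&, the second being the default command described above.

```yaml
artifacts:
  my-wheel:  # Hypothetical artifact identifier.
    type: whl
    # Relative path to the folder that contains setup.py (assumed layout).
    path: ./my_package
    # Runs locally before deployment; commands are chained with &&.
    build: python -m pip install wheel && python setup.py bdist_wheel
```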

For more information, including a sample bundle that uses artifacts, see Develop a Python wheel file using Databricks Asset Bundles.

Tip

You can define, combine, and override the settings for artifacts in bundles by using the techniques described in Define artifact settings dynamically in Databricks Asset Bundles.

include

The include array specifies a list of path globs that contain configuration files to include within the bundle. These path globs are relative to the location of the bundle configuration file in which the path globs are specified.

The Databricks CLI does not include any configuration files by default within the bundle. You must use the include array to specify any and all configuration files to include within the bundle, other than the databricks.yml file itself.

This include array can appear only as a top-level mapping.

The following example in a bundle configuration file includes the three specified configuration files. These files are in the same directory as the bundle configuration file:

include:
  - "bundle.artifacts.yml"
  - "bundle.resources.yml"
  - "bundle.targets.yml"

The following example in a bundle configuration file includes all files with filenames that begin with bundle and end with .yml. These files are in the same directory as the bundle configuration file:

include:
  - "bundle*.yml"

resources

The resources mapping specifies information about the Databricks resources used by the bundle.

This resources mapping can appear as a top-level mapping, or it can be a child of one or more of the targets in the top-level targets mapping, and includes zero or one of each of the supported resource types. Each resource type mapping includes one or more individual resource declarations, which must each have a unique name. These individual resource declarations use the create operation’s request payload, expressed in YAML, to define the resource. Create operation request payloads are documented in the Databricks REST API Reference.

The following table lists supported resource types for bundles and links to documentation on their corresponding payloads:

| Resource type | Resource mappings |
| --- | --- |
| jobs | Job mappings: POST /api/2.1/jobs/create. For additional information, see job task types and override new job cluster settings. |
| pipelines | Pipeline mappings: POST /api/2.0/pipelines |
| experiments | Experiment mappings: POST /api/2.0/mlflow/experiments/create |
| models | Model mappings: POST /api/2.0/mlflow/registered-models/create |
| model_serving_endpoints | Model serving endpoint mappings: POST /api/2.0/serving-endpoints |
| registered_models (Unity Catalog) | Unity Catalog Model mappings: POST /api/2.1/unity-catalog/models |

All paths to folders and files referenced by resource declarations are relative to the location of the bundle configuration file in which these paths are specified.

The following example declares a job with the resource key of hello-job and a pipeline with the resource key of hello-pipeline:

resources:
  jobs:
    hello-job:
      name: hello-job
      tasks:
        - task_key: hello-task
          existing_cluster_id: 1234-567890-abcde123
          notebook_task:
            notebook_path: ./hello.py
  pipelines:
    hello-pipeline:
      name: hello-pipeline
      clusters:
        - label: default
          num_workers: 1
      development: true
      continuous: false
      channel: CURRENT
      edition: CORE
      photon: false
      libraries:
        - notebook:
            path: ./pipeline.py

sync

The sync mapping specifies lists of file or path globs to include within bundle deployments or to exclude from bundle deployments, depending on the following rules:

  • Based on any list of file and path globs in a .gitignore file in the bundle’s root, the include mapping can contain a list of file globs, path globs, or both, relative to the bundle’s root, to explicitly include.

  • Based on any list of file and path globs in a .gitignore file in the bundle’s root, plus the list of file and path globs in the include mapping, the exclude mapping can contain a list of file globs, path globs, or both, relative to the bundle’s root, to explicitly exclude.

All paths to specified folders and files are relative to the location of the bundle configuration file in which these paths are specified.

The syntax for include and exclude file and path patterns follows standard .gitignore pattern syntax. See gitignore Pattern Format.

For example, if the .gitignore file contains the following entries:

.databricks
my_package/dist

And the bundle configuration file contains the following include mapping:

sync:
  include:
    - my_package/dist/*.whl

Then all of the files in the my_package/dist folder with a file extension of .whl are included. Any other files in the my_package/dist folder are not included.

However, if the bundle configuration file also contains the following exclude mapping:

sync:
  include:
    - my_package/dist/*.whl
  exclude:
    - my_package/dist/delete-me.whl

Then all of the files in the my_package/dist folder with a file extension of .whl, except for the file named delete-me.whl, are included. Any other files in the my_package/dist folder are also not included.

The sync mapping can also be declared in the targets mapping for a specific target. Any sync mapping declared in a target is merged with any top-level sync mapping declarations. For example, continuing with the preceding example, the following include mapping at the targets level merges with the include mapping in the top-level sync mapping:

targets:
  dev:
    sync:
      include:
        - my_package/dist/delete-me.whl

When you run databricks bundle validate, the relevant portion of the resulting graph is as follows:

"sync": {
  "include": [
    "my_package/dist/*.whl",
    "my_package/dist/delete-me.whl"
  ],
  "exclude": [
    "my_package/dist/delete-me.whl"
  ]
}

targets

The targets mapping specifies one or more contexts in which to run Databricks workflows. Each target is a unique collection of artifacts, Databricks workspace settings, and Databricks job or pipeline details.

This targets mapping is optional but highly recommended. If it is specified, it can appear only as a top-level mapping. If the targets mapping is not specified, then the settings in the top-level workspace, artifacts, and resources mappings are always used.

The targets mapping consists of one or more target mappings, which must each have a unique programmatic (or logical) name.

If a target mapping does not specify workspace, artifacts, or resources child mappings, then that target uses the settings in the top-level workspace, artifacts, and resources mappings.

If a target mapping specifies a workspace, artifacts, or resources mapping, and a top-level workspace, artifacts, or resources mapping also exists, then any conflicting settings are overridden by the settings within the target.

A target can also override the values of any top-level variables.

To specify that a target is the default one unless otherwise specified, add the default mapping, set to true. For example, this target named dev is the default target:

targets:
  dev:
    default: true

To specify that a target is treated as a development target, add the mode mapping, set to development. To specify that a target is treated as a production target, add the mode mapping, set to production. For example, this target named prod is treated as a production target:

targets:
  prod:
    mode: production

Specifying mode provides a collection of corresponding default behaviors for pre-production and production workflows. For details, see Databricks Asset Bundle deployment modes. In addition, you can specify run_as for each target, as described in Specify a run identity for a Databricks Asset Bundles workflow.

The following example declares two targets. The first target has a programmatic (or logical) name of dev and is the default target. The second target has a programmatic (or logical) name of prod and is not the default target. This second target authenticates by using the profile in the caller’s .databrickscfg file whose host field matches the specified workspace URL:

targets:
  dev:
    default: true
  prod:
    workspace:
      host: https://<production-workspace-url>

To validate, deploy, and run jobs or pipelines within the dev target, run the following commands:

# Because the "dev" target is set to "default: true",
# you do not need to specify "-t dev":
databricks bundle validate
databricks bundle deploy
databricks bundle run <job-or-pipeline-programmatic-name>

# But you can still explicitly specify it, if you want or need to:
databricks bundle validate -t dev
databricks bundle deploy -t dev
databricks bundle run -t dev <job-or-pipeline-programmatic-name>

To validate, deploy, and run jobs or pipelines within the prod target instead, run the following commands:

# You must specify "-t prod", because the "dev" target
# is already set to "default: true":
databricks bundle validate -t prod
databricks bundle deploy -t prod
databricks bundle run -t prod <job-or-pipeline-programmatic-name>

Custom variables

You can use custom variables to make your bundle configuration files more modular and reusable. For example, you might declare a variable that represents the ID of an existing cluster, and then want to change that variable’s value to different cluster IDs for various workflow runs within multiple targets without changing your bundle configuration files’ original code.

You can declare one or more variables in your bundle configuration files within the variables mapping. For each variable, you can set an optional description, default value, or lookup to retrieve an ID value, following this format:

variables:
  <variable-name>:
    description: <optional-description>
    default: <optional-default-value>
    lookup:
      <optional-object-type>: <optional-object-name>

For example, to declare a variable named my_cluster_id with the default value of 1234-567890-abcde123, and a variable named my_notebook_path with the default value of ./hello.py:

variables:
  my_cluster_id:
    description: The ID of an existing cluster.
    default: 1234-567890-abcde123
  my_notebook_path:
    description: The path to an existing notebook.
    default: ./hello.py

If you do not provide a default value for a variable as part of this declaration, you must provide the value later at the command line, through an environment variable, or elsewhere within your bundle configuration files. These approaches are described later in this section.

Note

Whichever approach you choose to provide variable values, use the same approach during both the deployment and run stages. Otherwise, you might get unexpected results between the time of a deployment and a job or pipeline run that is based on that existing deployment.

To reference your custom variables within your bundle configuration files, use substitutions. For variables, use the format ${var.<variable_name>}. For example, to reference variables named my_cluster_id and my_notebook_path:

resources:
  jobs:
    hello-job:
      name: hello-job
      tasks:
        - task_key: hello-task
          existing_cluster_id: ${var.my_cluster_id}
          notebook_task:
            notebook_path: ${var.my_notebook_path}

Set a variable’s value

If you have not provided a default value for a variable, or if you want to temporarily override the default value for a variable, provide the variable’s new temporary value using one of the following approaches:

  • Provide the variable’s value as part of a bundle command such as validate, deploy, or run. To do this, use the option --var="<key>=<value>", where <key> is the variable’s name, and <value> is the variable’s value. For example, as part of the bundle validate command, to provide the value of 1234-567890-abcde123 to the variable named my_cluster_id, and to provide the value of ./hello.py to the variable named my_notebook_path, run:

    databricks bundle validate --var="my_cluster_id=1234-567890-abcde123,my_notebook_path=./hello.py"
    
    # Or:
    databricks bundle validate --var="my_cluster_id=1234-567890-abcde123" --var="my_notebook_path=./hello.py"
    
  • Provide the variable’s value by setting an environment variable. The environment variable’s name must start with BUNDLE_VAR_. To set environment variables, see your operating system’s documentation. For example, to provide the value of 1234-567890-abcde123 to the variable named my_cluster_id, and to provide the value of ./hello.py to the variable named my_notebook_path, run the following command before you call a bundle command such as validate, deploy, or run:

    For Linux and macOS:

    export BUNDLE_VAR_my_cluster_id=1234-567890-abcde123 && export BUNDLE_VAR_my_notebook_path=./hello.py
    

    For Windows:

    set "BUNDLE_VAR_my_cluster_id=1234-567890-abcde123" && set "BUNDLE_VAR_my_notebook_path=./hello.py"
    

    Or, provide the variable’s value as part of a bundle command such as validate, deploy, or run, for example for Linux and macOS:

    BUNDLE_VAR_my_cluster_id=1234-567890-abcde123 BUNDLE_VAR_my_notebook_path=./hello.py databricks bundle validate
    

    Or for Windows:

    set "BUNDLE_VAR_my_cluster_id=1234-567890-abcde123" && set "BUNDLE_VAR_my_notebook_path=./hello.py" && databricks bundle validate
    
  • Provide the variable’s value within your bundle configuration files. To do this, use a variables mapping within the targets mapping, following this format:

    variables:
      <variable-name>: <value>
    

    For example, to provide values for the variables named my_cluster_id and my_notebook_path for two separate targets:

    targets:
      dev:
        variables:
          my_cluster_id: 1234-567890-abcde123
          my_notebook_path: ./hello.py
      prod:
        variables:
          my_cluster_id: 2345-678901-bcdef234
          my_notebook_path: ./hello.py
    

In the preceding examples, the Databricks CLI looks for values for the variables my_cluster_id and my_notebook_path in the following order, stopping when it finds a value for each matching variable, skipping any other locations for that variable:

  1. Within any --var options specified as part of the bundle command.

  2. Within any environment variables set that begin with BUNDLE_VAR_.

  3. Within any variables mappings, among the targets mappings within your bundle configuration files.

  4. Any default value for that variable’s definition, among the top-level variables mappings within your bundle configuration files.
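As a hedged illustration of this lookup order, the following sketch combines top-level default values with a per-target override. The variable names and values are reused from the preceding examples; the description strings are illustrative only.

```yaml
# Top-level definitions: default values are consulted last (step 4).
variables:
  my_cluster_id:
    description: The ID of an existing cluster.
    default: 1234-567890-abcde123
  my_notebook_path:
    description: The path to an existing notebook.
    default: ./hello.py

targets:
  dev:
    # No variables mapping here, so dev falls through to the defaults above
    # unless a --var option or a BUNDLE_VAR_ environment variable supplies a value.
    default: true
  prod:
    variables:
      # Step 3: takes precedence over the default, but not over a --var option
      # or a BUNDLE_VAR_my_cluster_id environment variable.
      my_cluster_id: 2345-678901-bcdef234
```

With this configuration, and with no --var options or BUNDLE_VAR_ environment variables set, the prod target would use 2345-678901-bcdef234 for my_cluster_id, while both targets would use ./hello.py for my_notebook_path.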

Retrieve an object’s ID value

For the alert, cluster_policy, cluster, dashboard, instance_pool, job, metastore, pipeline, query, service_principal, and warehouse object types, you can define a lookup for your custom variable to retrieve a named object’s ID using this format:

variables:
  <variable-name>:
    lookup:
      <object-type>: "<object-name>"

If a lookup is defined for a variable, the ID of the object with the specified name is used as the value of the variable. This ensures the correct resolved ID of the object is always used for the variable.

Note

An error occurs if an object with the specified name does not exist, or if there is more than one object with the specified name.

For example, in the following configuration, ${var.my_cluster_id} will be replaced by the ID of the 12.2 shared cluster.

variables:
  my_cluster_id:
    description: An existing cluster
    lookup:
      cluster: "12.2 shared"

resources:
  jobs:
    my_job:
      name: "My Job"
      tasks:
        - task_key: TestTask
          existing_cluster_id: ${var.my_cluster_id}

Git settings

You can retrieve and override version control details that are associated with your bundle. This is useful for annotating your deployed resources. For example, you might want to include the origin URL of your repository within the description of a machine learning model that you deploy.

Whenever you run a bundle command such as validate, deploy, or run, the bundle command populates the command’s configuration tree with the following default settings:

  • bundle.git.origin_url, which represents the origin URL of the repo. This is the same value that you would get if you ran the command git config --get remote.origin.url from your cloned repo. You can use substitutions to refer to this value within your bundle configuration files, as ${bundle.git.origin_url}.

  • bundle.git.branch, which represents the current branch within the repo. This is the same value that you would get if you ran the command git branch --show-current from your cloned repo. You can use substitutions to refer to this value within your bundle configuration files, as ${bundle.git.branch}.

  • bundle.git.commit, which represents the HEAD commit within the repo. This is the same value that you would get if you ran the command git rev-parse HEAD from your cloned repo. You can use substitutions to refer to this value within your bundle configuration files, as ${bundle.git.commit}.

To retrieve or override Git settings, your bundle must be within a directory that is associated with a Git repository, for example a local directory that is initialized by running the git clone command. If the directory is not associated with a Git repository, these Git settings are empty.

You can override the origin_url and branch settings within the git mapping of your top-level bundle mapping if needed, as follows:

bundle:
  git:
    origin_url: <some-non-default-origin-url>
    branch: <some-non-current-branch-name>
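
As one way to annotate a deployed resource with these values, the following sketch places the Git substitutions in a model's description. The resource key my_model and the name my-model are hypothetical, and the sketch assumes that the model resource accepts a description field.

```yaml
resources:
  models:
    my_model:           # Hypothetical resource key.
      name: my-model    # Hypothetical model name.
      # Assumes the model resource accepts a description field.
      description: "Deployed from ${bundle.git.origin_url} (branch ${bundle.git.branch}, commit ${bundle.git.commit})"
```

If the bundle directory is not associated with a Git repository, these substitutions would resolve to empty values, as described above.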

Substitutions

You can use substitutions to make your bundle configuration files more modular and reusable.

Tip

You can also use dynamic value references for job parameter values to pass context about a job run to job tasks. See Pass context about job runs into job tasks.

For example, when you run the bundle validate command, you might see a graph like this (the ellipses indicate omitted content, for brevity):

{
  "bundle": {
    "name": "hello-bundle",
    "target": "dev",
    "...": "..."
  },
  "workspace": {
    "...": "...",
    "current_user": {
      "...": "...",
      "userName": "someone@example.com",
      "...": "..."
    },
    "...": "..."
  },
  "...": {
    "...": "..."
  }
}

In the preceding example, you could refer to the value someone@example.com in your bundle configuration file with the substitution ${workspace.current_user.userName}.

Similarly, the following substitutions:

/Users/${workspace.current_user.userName}/.bundle/${bundle.name}/my-envs/${bundle.target}

In a bundle configuration file such as the following (the ellipsis indicates omitted content, for brevity):

bundle:
  name: hello-bundle

workspace:
  root_path: /Users/${workspace.current_user.userName}/.bundle/${bundle.name}/my-envs/${bundle.target}

# ...

targets:
  dev:
    default: true

Would resolve to the following graph when you run the bundle validate command (the ellipses indicate omitted content, for brevity):

{
  "bundle": {
    "name": "hello-bundle",
    "target": "dev",
    "...": "..."
  },
  "workspace": {
    "profile": "DEFAULT",
    "current_user": {
      "...": "...",
      "userName": "someone@example.com",
      "...": "..."
    },
    "root_path": "/Users/someone@example.com/.bundle/hello-bundle/my-envs/dev",
    "...": "..."
  },
  "...": {
    "...": "..."
  }
}

To determine valid substitutions, you can use the schema hierarchy documented in the REST API reference or you can use the output of the bundle validate command. For example, based on this section of the output schema, ${resources.pipelines.my_pipeline.target} is the substitution for the value (in this case, hellobundle_dev) of the target of my_pipeline:

{
  "...": {
    "...": "..."
  },
  "resources": {
    "...": "...",
    "pipelines": {
      "my_pipeline": {
        "...": "...",
        "target": "hellobundle_dev",
        "...": "..."
      }
    }
  }
}

Here are some commonly used substitutions:

  • ${bundle.name}

  • ${bundle.target}  # Use this substitution instead of ${bundle.environment}

  • ${workspace.host}

  • ${workspace.current_user.short_name}

  • ${workspace.current_user.userName}

  • ${workspace.file_path}

  • ${workspace.root_path}

  • ${resources.jobs.<job-name>.id}

  • ${resources.models.<model-name>.name}

  • ${resources.pipelines.<pipeline-name>.name}
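
The following sketch shows how several of these substitutions might be combined in a job definition. The job key my_job, the tag key owner, and the notebook path are illustrative assumptions, and my_cluster_id is assumed to be defined in a top-level variables mapping.

```yaml
resources:
  jobs:
    my_job:                                   # Hypothetical job key.
      # Produces a distinct job name for each target.
      name: "${bundle.name}-${bundle.target}"
      tags:
        # Hypothetical tag that records who deployed the job.
        owner: ${workspace.current_user.short_name}
      tasks:
        - task_key: TestTask
          # Assumes my_cluster_id is defined in a top-level variables mapping.
          existing_cluster_id: ${var.my_cluster_id}
          notebook_task:
            notebook_path: ./hello.py
```

For the hello-bundle bundle and the dev target from the earlier examples, the job name would resolve to hello-bundle-dev.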