Override with target settings

This page describes how to override or join top-level settings with target settings in Databricks Asset Bundles. For information about bundle settings, see Databricks Asset Bundle configuration.

Artifact settings override

You can override the artifact settings in a top-level artifacts mapping with the artifact settings in a targets mapping, for example:

YAML
# ...
artifacts:
  <some-unique-programmatic-identifier-for-this-artifact>:
    # Artifact settings.

targets:
  <some-unique-programmatic-identifier-for-this-target>:
    artifacts:
      <the-matching-programmatic-identifier-for-this-artifact>:
        # Any more artifact settings to join with the settings from the
        # matching top-level artifacts mapping.

If any artifact setting is defined both in the top-level artifacts mapping and the targets mapping for the same artifact, then the setting in the targets mapping takes precedence over the setting in the top-level artifacts mapping.
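
The precedence rule can be pictured as a recursive merge of two mappings. The following Python sketch only illustrates that rule; it is not the Databricks CLI's implementation. The helper name merge_settings is hypothetical, and the dictionaries are hand-written stand-ins for the artifact settings used in the examples on this page.

```python
from copy import deepcopy


def merge_settings(top_level: dict, target: dict) -> dict:
    """Return top_level joined with target; target wins on conflicts.

    Hypothetical helper, for illustration only.
    """
    merged = deepcopy(top_level)
    for key, value in target.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are mappings: join them setting by setting.
            merged[key] = merge_settings(merged[key], value)
        else:
            # Scalar or mismatched type: the target setting takes precedence.
            merged[key] = deepcopy(value)
    return merged


top_level_artifact = {"type": "whl", "path": "./my_package"}
target_artifact = {"path": "./my_other_package"}

resolved = merge_settings(top_level_artifact, target_artifact)
print(resolved)  # {'type': 'whl', 'path': './my_other_package'}
```

Settings that appear on only one side (type here) are carried over unchanged, while path takes the target's value.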

Example 1: Artifact settings defined only in the top-level artifacts mapping

In the following example, path is defined only in the top-level artifacts mapping, which therefore defines all of the settings for the artifact:

YAML
# ...
artifacts:
  my-artifact:
    type: whl
    path: ./my_package
# ...

When you run databricks bundle validate for this example, the resulting graph is:

JSON
{
  "...": "...",
  "artifacts": {
    "my-artifact": {
      "type": "whl",
      "path": "./my_package",
      "...": "..."
    }
  },
  "...": "..."
}

Example 2: Conflicting artifact settings defined in multiple artifact mappings

In this example, path is defined both in the top-level artifacts mapping and in the artifacts mapping in targets. The path in the artifacts mapping in targets takes precedence over the path in the top-level artifacts mapping to define the settings for the artifact:

YAML
# ...
artifacts:
  my-artifact:
    type: whl
    path: ./my_package

targets:
  dev:
    artifacts:
      my-artifact:
        path: ./my_other_package
# ...

When you run databricks bundle validate for this example, the resulting graph is:

JSON
{
  "...": "...",
  "artifacts": {
    "my-artifact": {
      "type": "whl",
      "path": "./my_other_package",
      "...": "..."
    }
  },
  "...": "..."
}

Cluster settings overrides

You can override or join the job or pipeline cluster settings for a target.

For jobs, use job_cluster_key within a job definition to identify job cluster settings in the top-level resources mapping to join with job cluster settings in a targets mapping:

YAML
# ...
resources:
  jobs:
    <some-unique-programmatic-identifier-for-this-job>:
      # ...
      job_clusters:
        - job_cluster_key: <some-unique-programmatic-identifier-for-this-key>
          new_cluster:
            # Cluster settings.

targets:
  <some-unique-programmatic-identifier-for-this-target>:
    resources:
      jobs:
        <the-matching-programmatic-identifier-for-this-job>:
          # ...
          job_clusters:
            - job_cluster_key: <the-matching-programmatic-identifier-for-this-key>
              # Any more cluster settings to join with the settings from the
              # resources mapping for the matching top-level job_cluster_key.
          # ...

If any cluster setting is defined both in the top-level resources mapping and the targets mapping for the same job_cluster_key, then the setting in the targets mapping takes precedence over the setting in the top-level resources mapping.

For Lakeflow Declarative Pipelines, use label within the cluster settings of a pipeline definition to identify cluster settings in a top-level resources mapping to join with the cluster settings in a targets mapping, for example:

YAML
# ...
resources:
  pipelines:
    <some-unique-programmatic-identifier-for-this-pipeline>:
      # ...
      clusters:
        - label: default | maintenance
          # Cluster settings.

targets:
  <some-unique-programmatic-identifier-for-this-target>:
    resources:
      pipelines:
        <the-matching-programmatic-identifier-for-this-pipeline>:
          # ...
          clusters:
            - label: default | maintenance
              # Any more cluster settings to join with the settings from the
              # resources mapping for the matching top-level label.
          # ...

If any cluster setting is defined both in the top-level resources mapping and the targets mapping for the same label, then the setting in the targets mapping takes precedence over the setting in the top-level resources mapping.
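
Unlike artifacts, cluster settings live in lists, so entries have to be paired up by an identifying field (job_cluster_key for jobs, label for pipelines) before their settings are joined. The Python sketch below illustrates that idea under the assumption that pairing is purely by that field; it is not the Databricks CLI's implementation, and deep_merge and merge_by_key are hypothetical helper names.

```python
from copy import deepcopy


def deep_merge(base: dict, override: dict) -> dict:
    """Join two mappings; values from override win on conflicts."""
    result = deepcopy(base)
    for name, value in override.items():
        if isinstance(value, dict) and isinstance(result.get(name), dict):
            result[name] = deep_merge(result[name], value)
        else:
            result[name] = deepcopy(value)
    return result


def merge_by_key(top_level: list, target: list, key: str) -> list:
    """Join list entries that share the same value for key.

    In this sketch, entries without a counterpart are kept unchanged.
    """
    merged = {}  # identifier -> joined entry, in first-seen order
    for entry in top_level + target:
        ident = entry[key]
        if ident in merged:
            merged[ident] = deep_merge(merged[ident], entry)
        else:
            merged[ident] = deepcopy(entry)
    return list(merged.values())


# Job clusters are paired on job_cluster_key.
job_clusters = merge_by_key(
    [{"job_cluster_key": "my-cluster",
      "new_cluster": {"spark_version": "13.3.x-scala2.12"}}],
    [{"job_cluster_key": "my-cluster",
      "new_cluster": {"node_type_id": "n2-highmem-4", "num_workers": 1}}],
    key="job_cluster_key",
)

# Pipeline clusters are paired on label; num_workers conflicts, so the target wins.
pipeline_clusters = merge_by_key(
    [{"label": "default", "node_type_id": "n2-highmem-4", "num_workers": 1}],
    [{"label": "default", "num_workers": 2}],
    key="label",
)

print(job_clusters)
print(pipeline_clusters)
```

Because the target entries are processed after the top-level entries, a setting defined on both sides ends up with the target's value.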

Example 1: New job cluster settings defined in multiple resource mappings and with no settings conflicts

In this example, spark_version in the top-level resources mapping is combined with node_type_id and num_workers in the resources mapping in targets to define the settings for the job_cluster_key named my-cluster:

YAML
# ...
resources:
  jobs:
    my-job:
      name: my-job
      job_clusters:
        - job_cluster_key: my-cluster
          new_cluster:
            spark_version: 13.3.x-scala2.12

targets:
  development:
    resources:
      jobs:
        my-job:
          name: my-job
          job_clusters:
            - job_cluster_key: my-cluster
              new_cluster:
                node_type_id: n2-highmem-4
                num_workers: 1
# ...

When you run databricks bundle validate for this example, the resulting graph is as follows:

JSON
{
  "...": "...",
  "resources": {
    "jobs": {
      "my-job": {
        "job_clusters": [
          {
            "job_cluster_key": "my-cluster",
            "new_cluster": {
              "node_type_id": "n2-highmem-4",
              "num_workers": 1,
              "spark_version": "13.3.x-scala2.12"
            }
          }
        ],
        "...": "..."
      }
    }
  }
}

Example 2: Conflicting new job cluster settings defined in multiple resource mappings

In this example, spark_version and num_workers are defined both in the top-level resources mapping and in the resources mapping in targets. The values in the resources mapping in targets take precedence over those in the top-level resources mapping to define the settings for the job_cluster_key named my-cluster:

YAML
# ...
resources:
  jobs:
    my-job:
      name: my-job
      job_clusters:
        - job_cluster_key: my-cluster
          new_cluster:
            spark_version: 13.3.x-scala2.12
            node_type_id: n2-highmem-4
            num_workers: 1

targets:
  development:
    resources:
      jobs:
        my-job:
          name: my-job
          job_clusters:
            - job_cluster_key: my-cluster
              new_cluster:
                spark_version: 12.2.x-scala2.12
                num_workers: 2
# ...

When you run databricks bundle validate for this example, the resulting graph is as follows:

JSON
{
  "...": "...",
  "resources": {
    "jobs": {
      "my-job": {
        "job_clusters": [
          {
            "job_cluster_key": "my-cluster",
            "new_cluster": {
              "node_type_id": "n2-highmem-4",
              "num_workers": 2,
              "spark_version": "12.2.x-scala2.12"
            }
          }
        ],
        "...": "..."
      }
    }
  }
}

Example 3: Pipeline cluster settings defined in multiple resource mappings and with no settings conflicts

In this example, node_type_id in the top-level resources mapping is combined with num_workers in the resources mapping in targets to define the settings for the label named default:

YAML
# ...
resources:
  pipelines:
    my-pipeline:
      clusters:
        - label: default
          node_type_id: n2-highmem-4

targets:
  development:
    resources:
      pipelines:
        my-pipeline:
          clusters:
            - label: default
              num_workers: 1
# ...

When you run databricks bundle validate for this example, the resulting graph is as follows:

JSON
{
  "...": "...",
  "resources": {
    "pipelines": {
      "my-pipeline": {
        "clusters": [
          {
            "label": "default",
            "node_type_id": "n2-highmem-4",
            "num_workers": 1
          }
        ],
        "...": "..."
      }
    }
  }
}

Example 4: Conflicting pipeline cluster settings defined in multiple resource mappings

In this example, num_workers is defined both in the top-level resources mapping and in the resources mapping in targets. The num_workers setting in the resources mapping in targets takes precedence over num_workers in the top-level resources mapping to define the settings for the label named default:

YAML
# ...
resources:
  pipelines:
    my-pipeline:
      clusters:
        - label: default
          node_type_id: n2-highmem-4
          num_workers: 1

targets:
  development:
    resources:
      pipelines:
        my-pipeline:
          clusters:
            - label: default
              num_workers: 2
# ...

When you run databricks bundle validate for this example, the resulting graph is as follows:

JSON
{
  "...": "...",
  "resources": {
    "pipelines": {
      "my-pipeline": {
        "clusters": [
          {
            "label": "default",
            "node_type_id": "n2-highmem-4",
            "num_workers": 2
          }
        ],
        "...": "..."
      }
    }
  }
}
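
To check a single resolved value without reading the whole graph, the JSON can be parsed and queried. The following is a minimal Python sketch, assuming the resulting graph is available as a JSON string with the shape shown in the examples above. The helper name find_pipeline_cluster is hypothetical, and the embedded JSON is a trimmed copy of the Example 4 result with the elided parts left out.

```python
import json


def find_pipeline_cluster(graph_json: str, pipeline: str, label: str) -> dict:
    """Return the resolved cluster settings for one pipeline cluster label.

    Hypothetical helper; raises KeyError if the pipeline or label is missing.
    """
    graph = json.loads(graph_json)
    clusters = graph["resources"]["pipelines"][pipeline]["clusters"]
    for cluster in clusters:
        if cluster.get("label") == label:
            return cluster
    raise KeyError(f"no cluster labelled {label!r} in pipeline {pipeline!r}")


# Trimmed copy of the resolved graph from Example 4.
graph_json = """
{
  "resources": {
    "pipelines": {
      "my-pipeline": {
        "clusters": [
          {"label": "default", "node_type_id": "n2-highmem-4", "num_workers": 2}
        ]
      }
    }
  }
}
"""

cluster = find_pipeline_cluster(graph_json, "my-pipeline", "default")
print(cluster["num_workers"])  # 2, the value from the targets mapping
```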

Job task settings override

You can use the tasks mapping within a job definition to join the job task settings in a top-level resources mapping with the job task settings in a targets mapping, for example:

YAML
# ...
resources:
  jobs:
    <some-unique-programmatic-identifier-for-this-job>:
      # ...
      tasks:
        - task_key: <some-unique-programmatic-identifier-for-this-task>
          # Task settings.

targets:
  <some-unique-programmatic-identifier-for-this-target>:
    resources:
      jobs:
        <the-matching-programmatic-identifier-for-this-job>:
          # ...
          tasks:
            - task_key: <the-matching-programmatic-identifier-for-this-key>
              # Any more task settings to join with the settings from the
              # resources mapping for the matching top-level task_key.
          # ...

To join the settings for the same task, task_key must be set to the same value in the top-level resources mapping and in the targets mapping.

If any job task setting is defined both in the top-level resources mapping and the targets mapping for the same task, then the setting in the targets mapping takes precedence over the setting in the top-level resources mapping.
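
Because tasks are joined only when their task_key values are identical, a misspelled key in a target is not joined with the intended top-level task. The Python sketch below lists target task keys that have no top-level counterpart. unmatched_task_keys is a hypothetical helper, not something the Databricks CLI provides, and the job dictionaries (including the deliberately misspelled my-tsak) are hand-written stand-ins for parsed bundle configuration.

```python
def unmatched_task_keys(top_level_job: dict, target_job: dict) -> list:
    """Return target task_key values that match no top-level task_key."""
    top_keys = {task["task_key"] for task in top_level_job.get("tasks", [])}
    return [
        task["task_key"]
        for task in target_job.get("tasks", [])
        if task["task_key"] not in top_keys
    ]


top_level_job = {
    "name": "my-job",
    "tasks": [
        {"task_key": "my-task", "new_cluster": {"spark_version": "13.3.x-scala2.12"}}
    ],
}
matching_target = {
    "tasks": [{"task_key": "my-task", "new_cluster": {"num_workers": 1}}]
}
mismatched_target = {
    "tasks": [{"task_key": "my-tsak", "new_cluster": {"num_workers": 1}}]
}

print(unmatched_task_keys(top_level_job, matching_target))    # []
print(unmatched_task_keys(top_level_job, mismatched_target))  # ['my-tsak']
```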

Example 1: Job task settings defined in multiple resource mappings and with no settings conflicts

In this example, spark_version in the top-level resources mapping is combined with node_type_id and num_workers in the resources mapping in targets to define the settings for the task_key named my-task:

YAML
# ...
resources:
  jobs:
    my-job:
      name: my-job
      tasks:
        - task_key: my-task
          new_cluster:
            spark_version: 13.3.x-scala2.12

targets:
  development:
    resources:
      jobs:
        my-job:
          name: my-job
          tasks:
            - task_key: my-task
              new_cluster:
                node_type_id: n2-highmem-4
                num_workers: 1
# ...

When you run databricks bundle validate for this example, the resulting graph is as follows (ellipses indicate omitted content, for brevity):

JSON
{
  "...": "...",
  "resources": {
    "jobs": {
      "my-job": {
        "tasks": [
          {
            "new_cluster": {
              "node_type_id": "n2-highmem-4",
              "num_workers": 1,
              "spark_version": "13.3.x-scala2.12"
            },
            "task_key": "my-task"
          }
        ],
        "...": "..."
      }
    }
  }
}

Example 2: Conflicting job task settings defined in multiple resource mappings

In this example, spark_version and num_workers are defined both in the top-level resources mapping and in the resources mapping in targets. The values in the resources mapping in targets take precedence over those in the top-level resources mapping to define the settings for the task_key named my-task (ellipses indicate omitted content, for brevity):

YAML
# ...
resources:
  jobs:
    my-job:
      name: my-job
      tasks:
        - task_key: my-task
          new_cluster:
            spark_version: 13.3.x-scala2.12
            node_type_id: n2-highmem-4
            num_workers: 1

targets:
  development:
    resources:
      jobs:
        my-job:
          name: my-job
          tasks:
            - task_key: my-task
              new_cluster:
                spark_version: 12.2.x-scala2.12
                num_workers: 2
# ...

When you run databricks bundle validate for this example, the resulting graph is as follows:

JSON
{
  "...": "...",
  "resources": {
    "jobs": {
      "my-job": {
        "tasks": [
          {
            "new_cluster": {
              "node_type_id": "n2-highmem-4",
              "num_workers": 2,
              "spark_version": "12.2.x-scala2.12"
            },
            "task_key": "my-task"
          }
        ],
        "...": "..."
      }
    }
  }
}