Override cluster settings in Databricks Asset Bundles
This article describes how to override the settings for Databricks clusters in Databricks Asset Bundles. See What are Databricks Asset Bundles?
In Databricks bundle configuration files, you can join the cluster settings in a top-level resources mapping with the cluster settings in a targets mapping, as follows.
For jobs, use the job_cluster_key mapping within a job definition to join the cluster settings in a top-level resources mapping with the cluster settings in a targets mapping, for example (ellipses indicate omitted content, for brevity):
# ...
resources:
  jobs:
    <some-unique-programmatic-identifier-for-this-job>:
      # ...
      job_clusters:
        - job_cluster_key: <some-unique-programmatic-identifier-for-this-key>
          new_cluster:
            # Cluster settings.
targets:
  <some-unique-programmatic-identifier-for-this-target>:
    resources:
      jobs:
        <the-matching-programmatic-identifier-for-this-job>:
          # ...
          job_clusters:
            - job_cluster_key: <the-matching-programmatic-identifier-for-this-key>
              new_cluster:
                # Any more cluster settings to join with the settings from the
                # resources mapping for the matching top-level job_cluster_key.
# ...
If any cluster setting is defined both in the top-level resources mapping and in the targets mapping for the same job_cluster_key, the setting in the targets mapping takes precedence over the setting in the top-level resources mapping.
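One way to picture this precedence rule is as a merge keyed on job_cluster_key. The following Python sketch is illustrative only and is not the Databricks CLI's implementation. It makes three assumptions:

- Job cluster definitions are plain dictionaries.
- Entries are matched on job_cluster_key.
- Nested mappings such as new_cluster are merged key by key, with target values winning.

The names deep_merge and merge_job_clusters are hypothetical.

```python
from __future__ import annotations

from copy import deepcopy
from typing import Any


def deep_merge(base: dict[str, Any], override: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of base with override applied; nested mappings merge key by key."""
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are mappings (for example new_cluster): merge recursively.
            merged[key] = deep_merge(merged[key], value)
        else:
            # In this sketch, any other value from the target replaces the top-level value.
            merged[key] = deepcopy(value)
    return merged


def merge_job_clusters(
    top_level: list[dict[str, Any]], target: list[dict[str, Any]]
) -> list[dict[str, Any]]:
    """Join job_clusters entries that share a job_cluster_key; target settings take precedence."""
    by_key = {cluster["job_cluster_key"]: deepcopy(cluster) for cluster in top_level}
    for cluster in target:
        key = cluster["job_cluster_key"]
        # Assumption: a key that appears only in the target is added as a new entry.
        by_key[key] = deep_merge(by_key.get(key, {}), cluster)
    return list(by_key.values())


if __name__ == "__main__":
    top_level = [{"job_cluster_key": "my-cluster",
                  "new_cluster": {"spark_version": "13.3.x-scala2.12",
                                  "node_type_id": "i3.xlarge",
                                  "num_workers": 1}}]
    development = [{"job_cluster_key": "my-cluster",
                    "new_cluster": {"spark_version": "12.2.x-scala2.12",
                                    "num_workers": 2}}]
    print(merge_job_clusters(top_level, development)[0]["new_cluster"])
    # prints {'spark_version': '12.2.x-scala2.12', 'node_type_id': 'i3.xlarge', 'num_workers': 2}
```

The usage values mirror the conflicting-settings case for jobs in the examples. The conflicting keys come from the target, and node_type_id is kept from the top level.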
For Delta Live Tables pipelines, use the label mapping (default or maintenance) within the clusters mapping of a pipeline definition to join the cluster settings in a top-level resources mapping with the cluster settings in a targets mapping, for example (ellipses indicate omitted content, for brevity):
# ...
resources:
  pipelines:
    <some-unique-programmatic-identifier-for-this-pipeline>:
      # ...
      clusters:
        - label: default | maintenance
          # Cluster settings.
targets:
  <some-unique-programmatic-identifier-for-this-target>:
    resources:
      pipelines:
        <the-matching-programmatic-identifier-for-this-pipeline>:
          # ...
          clusters:
            - label: default | maintenance
              # Any more cluster settings to join with the settings from the
              # resources mapping for the matching top-level label.
# ...
If any cluster setting is defined both in the top-level resources mapping and in the targets mapping for the same label, the setting in the targets mapping takes precedence over the setting in the top-level resources mapping.
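The same rule applies to pipeline clusters, except that entries are matched on label and the cluster settings sit directly in each entry. The following Python sketch is illustrative and is not how the Databricks CLI is implemented. It assumes every entry carries a label and that nested mappings are merged key by key. The names _deep_merge and merge_pipeline_clusters are hypothetical.

```python
from __future__ import annotations

from copy import deepcopy
from typing import Any


def _deep_merge(base: dict[str, Any], override: dict[str, Any]) -> dict[str, Any]:
    """Merge override into a copy of base; nested mappings are merged key by key."""
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = _deep_merge(merged[key], value)
        else:
            merged[key] = deepcopy(value)  # the target value wins on conflicts
    return merged


def merge_pipeline_clusters(
    top_level: list[dict[str, Any]], target: list[dict[str, Any]]
) -> list[dict[str, Any]]:
    """Join pipeline cluster entries that share a label; target values win on conflicts."""
    by_label: dict[str, dict[str, Any]] = {}
    for cluster in top_level:
        by_label[cluster["label"]] = deepcopy(cluster)
    for cluster in target:
        label = cluster["label"]
        # Entries with different labels (for example default and maintenance) never mix.
        by_label[label] = _deep_merge(by_label.get(label, {}), cluster)
    return list(by_label.values())


if __name__ == "__main__":
    top_level = [{"label": "default", "node_type_id": "i3.xlarge", "num_workers": 1}]
    development = [{"label": "default", "num_workers": 2}]
    print(merge_pipeline_clusters(top_level, development))
    # prints [{'label': 'default', 'node_type_id': 'i3.xlarge', 'num_workers': 2}]
```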
Example 1: New job cluster settings defined in multiple resource mappings and with no settings conflicts
In this example, spark_version in the top-level resources mapping is combined with node_type_id and num_workers in the resources mapping in targets to define the settings for the job_cluster_key named my-cluster (ellipses indicate omitted content, for brevity):
# ...
resources:
  jobs:
    my-job:
      name: my-job
      job_clusters:
        - job_cluster_key: my-cluster
          new_cluster:
            spark_version: 13.3.x-scala2.12
targets:
  development:
    resources:
      jobs:
        my-job:
          name: my-job
          job_clusters:
            - job_cluster_key: my-cluster
              new_cluster:
                node_type_id: i3.xlarge
                num_workers: 1
# ...
When you run databricks bundle validate for this example, the resulting graph is as follows (ellipses indicate omitted content, for brevity):
{
  "...": "...",
  "resources": {
    "jobs": {
      "my-job": {
        "job_clusters": [
          {
            "job_cluster_key": "my-cluster",
            "new_cluster": {
              "node_type_id": "i3.xlarge",
              "num_workers": 1,
              "spark_version": "13.3.x-scala2.12"
            }
          }
        ],
        "...": "..."
      }
    }
  }
}
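You can also check the merged settings programmatically instead of by eye, by parsing the JSON graph that databricks bundle validate reports. The following Python sketch assumes you have already captured that graph as a string in the shape shown above. The function effective_job_cluster and the constant EXAMPLE_1_GRAPH are hypothetical. The sketch only reads the merged result and does not perform the merge itself.

```python
from __future__ import annotations

import json
from typing import Any


def effective_job_cluster(graph_json: str, job: str, cluster_key: str) -> dict[str, Any]:
    """Return the merged new_cluster settings for one job_cluster_key in a bundle graph."""
    graph = json.loads(graph_json)
    for cluster in graph["resources"]["jobs"][job]["job_clusters"]:
        if cluster["job_cluster_key"] == cluster_key:
            return cluster["new_cluster"]
    raise KeyError(f"job {job!r} has no job_cluster_key {cluster_key!r}")


# The graph from this example, with the "..." placeholders kept as-is (still valid JSON).
EXAMPLE_1_GRAPH = """
{
  "...": "...",
  "resources": {
    "jobs": {
      "my-job": {
        "job_clusters": [
          {
            "job_cluster_key": "my-cluster",
            "new_cluster": {
              "node_type_id": "i3.xlarge",
              "num_workers": 1,
              "spark_version": "13.3.x-scala2.12"
            }
          }
        ],
        "...": "..."
      }
    }
  }
}
"""

if __name__ == "__main__":
    settings = effective_job_cluster(EXAMPLE_1_GRAPH, "my-job", "my-cluster")
    # Settings from both the top-level and the target mapping should appear together.
    for name in ("spark_version", "node_type_id", "num_workers"):
        print(name, settings[name])
```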
Example 2: Conflicting new job cluster settings defined in multiple resource mappings
In this example, spark_version and num_workers are defined both in the top-level resources mapping and in the resources mapping in targets. The spark_version and num_workers values in the resources mapping in targets take precedence over those in the top-level resources mapping when defining the settings for the job_cluster_key named my-cluster (ellipses indicate omitted content, for brevity):
# ...
resources:
  jobs:
    my-job:
      name: my-job
      job_clusters:
        - job_cluster_key: my-cluster
          new_cluster:
            spark_version: 13.3.x-scala2.12
            node_type_id: i3.xlarge
            num_workers: 1
targets:
  development:
    resources:
      jobs:
        my-job:
          name: my-job
          job_clusters:
            - job_cluster_key: my-cluster
              new_cluster:
                spark_version: 12.2.x-scala2.12
                num_workers: 2
# ...
When you run databricks bundle validate for this example, the resulting graph is as follows (ellipses indicate omitted content, for brevity):
{
  "...": "...",
  "resources": {
    "jobs": {
      "my-job": {
        "job_clusters": [
          {
            "job_cluster_key": "my-cluster",
            "new_cluster": {
              "node_type_id": "i3.xlarge",
              "num_workers": 2,
              "spark_version": "12.2.x-scala2.12"
            }
          }
        ],
        "...": "..."
      }
    }
  }
}
Example 3: Pipeline cluster settings defined in multiple resource mappings and with no settings conflicts
In this example, node_type_id in the top-level resources mapping is combined with num_workers in the resources mapping in targets to define the settings for the label named default (ellipses indicate omitted content, for brevity):
# ...
resources:
  pipelines:
    my-pipeline:
      clusters:
        - label: default
          node_type_id: i3.xlarge
targets:
  development:
    resources:
      pipelines:
        my-pipeline:
          clusters:
            - label: default
              num_workers: 1
# ...
When you run databricks bundle validate for this example, the resulting graph is as follows (ellipses indicate omitted content, for brevity):
{
  "...": "...",
  "resources": {
    "pipelines": {
      "my-pipeline": {
        "clusters": [
          {
            "label": "default",
            "node_type_id": "i3.xlarge",
            "num_workers": 1
          }
        ],
        "...": "..."
      }
    }
  }
}
Example 4: Conflicting pipeline cluster settings defined in multiple resource mappings
In this example, num_workers is defined both in the top-level resources mapping and in the resources mapping in targets. The num_workers value in the resources mapping in targets takes precedence over num_workers in the top-level resources mapping when defining the settings for the label named default (ellipses indicate omitted content, for brevity):
# ...
resources:
  pipelines:
    my-pipeline:
      clusters:
        - label: default
          node_type_id: i3.xlarge
          num_workers: 1
targets:
  development:
    resources:
      pipelines:
        my-pipeline:
          clusters:
            - label: default
              num_workers: 2
# ...
When you run databricks bundle validate for this example, the resulting graph is as follows (ellipses indicate omitted content, for brevity):
{
  "...": "...",
  "resources": {
    "pipelines": {
      "my-pipeline": {
        "clusters": [
          {
            "label": "default",
            "node_type_id": "i3.xlarge",
            "num_workers": 2
          }
        ],
        "...": "..."
      }
    }
  }
}