Override new job cluster settings in Databricks asset bundles
Preview
This feature is in Public Preview.
This article describes how to override the settings for new Databricks job clusters in Databricks asset bundles. See What are Databricks asset bundles?
In Databricks bundle settings files, you can use the job_cluster_key mapping within a job definition to join the new job cluster settings in a top-level resources mapping with the new job cluster settings in a targets mapping, for example (ellipses indicate omitted content, for brevity):
# ...
resources:
  jobs:
    <some-unique-programmatic-identifier-for-this-job>:
      # ...
      job_clusters:
        - job_cluster_key: <some-unique-programmatic-identifier-for-this-key>
          new_cluster:
            # Cluster settings.
targets:
  <some-unique-programmatic-identifier-for-this-target>:
    resources:
      jobs:
        <the-matching-programmatic-identifier-for-this-job>:
          # ...
          job_clusters:
            - job_cluster_key: <the-matching-programmatic-identifier-for-this-key>
              # Any more cluster settings to join with the settings from the
              # resources mapping for the matching top-level job_cluster_key.
          # ...
If any new job cluster setting is defined both in the top-level resources mapping and in the targets mapping for the same job_cluster_key, then the setting in the targets mapping takes precedence over the setting in the top-level resources mapping.
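The join-and-override rule described above can be sketched in a few lines of Python. This is only an illustration of the rule, not the Databricks CLI's implementation: the function name merge_job_clusters is hypothetical, the merge of new_cluster is shallow (how nested settings are combined is not covered in this article), and the handling of a key that appears only in the target is an assumption.

```python
from copy import deepcopy


def merge_job_clusters(top_level, target):
    """Join two job_clusters lists by job_cluster_key; target settings win.

    Hypothetical helper that illustrates the precedence rule only.
    """
    # Index the top-level clusters by key, copying so the inputs stay untouched.
    merged = {c["job_cluster_key"]: deepcopy(c) for c in top_level}
    for override in target:
        key = override["job_cluster_key"]
        # Assumption: a key found only in the target starts from empty settings.
        base = merged.setdefault(key, {"job_cluster_key": key, "new_cluster": {}})
        # Shallow merge: any setting present in the target replaces the top-level one.
        base.setdefault("new_cluster", {}).update(override.get("new_cluster", {}))
    return list(merged.values())


top_level = [{
    "job_cluster_key": "my-cluster",
    "new_cluster": {
        "spark_version": "13.3.x-scala2.12",
        "node_type_id": "i3.xlarge",
        "num_workers": 1,
    },
}]
target = [{
    "job_cluster_key": "my-cluster",
    "new_cluster": {"spark_version": "12.2.x-scala2.12", "num_workers": 2},
}]

result = merge_job_clusters(top_level, target)
print(result[0]["new_cluster"])
```

In this sketch, spark_version and num_workers come from the target, while node_type_id, which the target does not set, is carried over from the top-level settings.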
Example 1: New job cluster settings defined in multiple resource mappings and with no settings conflicts
In this example, spark_version in the top-level resources mapping is combined with node_type_id and num_workers in the resources mapping in targets to define the settings for the job_cluster_key named my-cluster (ellipses indicate omitted content, for brevity):
# ...
resources:
  jobs:
    my-job:
      name: my-job
      job_clusters:
        - job_cluster_key: my-cluster
          new_cluster:
            spark_version: 13.3.x-scala2.12
targets:
  development:
    resources:
      jobs:
        my-job:
          name: my-job
          job_clusters:
            - job_cluster_key: my-cluster
              new_cluster:
                node_type_id: i3.xlarge
                num_workers: 1
          # ...
When you run databricks bundle validate for this example, the resulting graph is as follows (ellipses indicate omitted content, for brevity):
{
  "...": "...",
  "resources": {
    "jobs": {
      "my-job": {
        "job_clusters": [
          {
            "job_cluster_key": "my-cluster",
            "new_cluster": {
              "node_type_id": "i3.xlarge",
              "num_workers": 1,
              "spark_version": "13.3.x-scala2.12"
            }
          }
        ],
        "...": "..."
      }
    }
  }
}
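If you want to check the resolved settings programmatically rather than by eye, a graph like the one above can be parsed with Python's standard json module. The following is a minimal sketch under stated assumptions: the function name new_cluster_settings is hypothetical, the sample string is the graph from this example with the omitted keys dropped, and how you capture the command's output is left out.

```python
import json


def new_cluster_settings(validate_output, job_name, cluster_key):
    """Return the resolved new_cluster mapping for one job cluster, or None.

    Hypothetical helper; expects the JSON graph as a string.
    """
    graph = json.loads(validate_output)
    job = graph.get("resources", {}).get("jobs", {}).get(job_name, {})
    for cluster in job.get("job_clusters", []):
        if cluster.get("job_cluster_key") == cluster_key:
            return cluster.get("new_cluster")
    return None


# Trimmed copy of the graph shown above (omitted keys left out).
sample_output = """
{
  "resources": {
    "jobs": {
      "my-job": {
        "job_clusters": [
          {
            "job_cluster_key": "my-cluster",
            "new_cluster": {
              "node_type_id": "i3.xlarge",
              "num_workers": 1,
              "spark_version": "13.3.x-scala2.12"
            }
          }
        ]
      }
    }
  }
}
"""

settings = new_cluster_settings(sample_output, "my-job", "my-cluster")
print(settings)
```

The helper returns None when the job or the job cluster key is not present, so a caller can tell a missing cluster apart from one with empty settings.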
Example 2: Conflicting new job cluster settings defined in multiple resource mappings
In this example, spark_version and num_workers are defined both in the top-level resources mapping and in the resources mapping in targets. The values in the resources mapping in targets take precedence over those in the top-level resources mapping when defining the settings for the job_cluster_key named my-cluster (ellipses indicate omitted content, for brevity):
# ...
resources:
  jobs:
    my-job:
      name: my-job
      job_clusters:
        - job_cluster_key: my-cluster
          new_cluster:
            spark_version: 13.3.x-scala2.12
            node_type_id: i3.xlarge
            num_workers: 1
targets:
  development:
    resources:
      jobs:
        my-job:
          name: my-job
          job_clusters:
            - job_cluster_key: my-cluster
              new_cluster:
                spark_version: 12.2.x-scala2.12
                num_workers: 2
          # ...
When you run databricks bundle validate for this example, the resulting graph is as follows (ellipses indicate omitted content, for brevity):
{
  "...": "...",
  "resources": {
    "jobs": {
      "my-job": {
        "job_clusters": [
          {
            "job_cluster_key": "my-cluster",
            "new_cluster": {
              "node_type_id": "i3.xlarge",
              "num_workers": 2,
              "spark_version": "12.2.x-scala2.12"
            }
          }
        ],
        "...": "..."
      }
    }
  }
}