Cluster Size and Autoscaling

When you create a cluster, you can either provide a fixed number of workers for the cluster or provide a minimum and maximum number of workers for the cluster.

When you provide a fixed size cluster, Databricks ensures that your cluster has the specified number of workers. When you provide a range for the number of workers, Databricks chooses the appropriate number of workers required to run your job. This is referred to as autoscaling.

With autoscaling, Databricks dynamically reallocates workers to account for the characteristics of your job. Certain parts of your pipeline may be more computationally demanding than others, and Databricks automatically adds additional workers during these phases of your job (and removes them when they’re no longer needed).

Autoscaling makes it easier to achieve high cluster utilization, because you don’t need to provision the cluster to match a workload. This applies especially to workloads whose requirements change over time (like exploring a dataset during the course of a day), but it can also apply to a one-time shorter workload whose provisioning requirements are unknown. Autoscaling thus offers two advantages:

  • Workloads can run faster compared to a constant-sized under-provisioned cluster.
  • Autoscaling clusters can reduce overall costs compared to a statically-sized cluster.

Depending on the constant size of the cluster and the workload, autoscaling gives you one or both of these benefits at the same time. The cluster size can go below the minimum number of workers selected when the cloud provider terminates instances. In this case, Databricks continuously retries to re-provision instances in order to maintain the minimum number of workers.

Note

  • Autoscaling works best with Databricks Runtime 3.4 and above.
  • Autoscaling is not available for spark-submit jobs.

Autoscaling types

Databricks offers two types of cluster node autoscaling: standard and optimized. For a discussion of the benefits of optimized autoscaling, see the blog post on Optimized Autoscaling.

Job clusters always use optimized autoscaling. The type of autoscaling performed on interactive clusters depends on the workspace configuration.

Standard autoscaling is used by default on interactive clusters. Optimized autoscaling for interactive clusters is available on request. Contact your Databricks account manager to enable this feature.

How autoscaling behaves

Autoscaling behaves differently depending on whether it is optimized or standard and whether applied to an interactive or a job cluster.

Optimized

  • Scales up from min to max in 2 steps.
  • Can scale down even if the cluster is not idle by looking at shuffle file state.
  • Scales down based on a percentage of current nodes.
  • On job clusters, scales down if the cluster is underutilized over the last 40 seconds.
  • On interactive clusters, scales down if the cluster is underutilized over the last 150 seconds.

Standard

  • Starts with adding 4 nodes. Thereafter, scales up exponentially, but can take many steps to reach the max.
  • Scales down only when the cluster is completely idle and it has been underutilized for the last 10 minutes.
  • Scales down exponentially, starting with 1 node.

Enable and configure autoscaling

To allow Databricks to resize your cluster automatically, you enable autoscaling for the cluster and provide the min and max range of workers.

  1. Enable autoscaling.

    • Interactive cluster - On the Create Cluster page, select the Enable autoscaling checkbox in the Autopilot Options box:

      enable_autoscaling
    • Job cluster - On the Configure Cluster page, select the Enable autoscaling checkbox in the Autopilot Options box:

    enable_autoscaling
  2. Configure the min and max workers.

    enable_autoscaling

Autoscaling example

If you reconfigure a static cluster to be an autoscaling cluster, Databricks immediately resizes the cluster within the minimum and maximum bounds and then starts autoscaling. As an example, the table below demonstrates what happens to clusters with a certain initial size if you reconfigure a cluster to autoscale between 5 and 10 nodes.

Initial size Size after reconfiguration
6 6
12 10
3 5