Task Preemption for High Concurrency

New in version 2.2.0-db1.

Starting with Spark 2.2, the Spark scheduler in Databricks will automatically preempt tasks to enforce fair sharing between users. This guarantees interactive response times to users on clusters with many concurrently running jobs.

Tip

When tasks are preempted by the scheduler, their kill reason will be set to preempted by scheduler. This reason is visible in the Spark UI and can be used to debug preemption behavior.

Preemption options

By default, preemption is conservative: jobs can be starved of resources for up to 30 seconds before the scheduler intervenes. Preemption can be tuned by setting the following Spark configurations at cluster launch time:

Whether preemption should be enabled. This can only be set in Spark 2.2 and above.

spark.databricks.preemption.enabled true

The fair share fraction to guarantee per user. Setting this to 1.0 means the scheduler will aggressively attempt to guarantee perfect fair sharing. Setting this to 0.0 effectively disables preemption. The default setting is 0.5, which means at worst a user will get half of their fair share.

spark.databricks.preemption.threshold 0.5

How long a user must remain starved before preemption kicks in. Setting this to lower values will provide more interactive response times, at the cost of cluster efficiency. Recommended values are from 1-100 seconds.

spark.databricks.preemption.timeout 30s

How often the scheduler will check for task preemption. This should be set to less than the preemption timeout.

spark.databricks.preemption.interval 5s