Task Preemption for High Concurrency¶
This functionality is supported on Databricks Runtime 2.2.0-db1 and above.
For Spark 2.2 and above, the Spark scheduler in Databricks automatically preempts tasks to enforce fair sharing between users. This guarantees interactive response times to users on clusters with many concurrently running jobs.
When tasks are preempted by the scheduler, their kill reason will be set to
preempted by scheduler. This reason is visible in the Spark UI and can be used to debug preemption behavior.
By default, preemption is conservative: jobs can be starved of resources for up to 30 seconds before the scheduler intervenes. Preemption can be tuned by setting the following Spark configurations at cluster launch time:
Whether preemption should be enabled. This can only be set in Spark 2.2 and above.
The fair share fraction to guarantee per user. Setting this to 1.0 means the scheduler will aggressively attempt to guarantee perfect fair sharing. Setting this to 0.0 effectively disables preemption. The default setting is 0.5, which means at worst a user will get half of their fair share.
How long a user must remain starved before preemption kicks in. Setting this to lower values will provide more interactive response times, at the cost of cluster efficiency. Recommended values are from 1-100 seconds.
How often the scheduler will check for task preemption. This should be set to less than the preemption timeout.