Databricks supports two types of clusters: standard and high concurrency. The default cluster mode is standard.
Standard clusters are recommended for a single user. They can run workloads developed in any language: Python, R, Scala, and SQL.
A high concurrency cluster is a managed cloud resource. The key benefit of high concurrency clusters is Spark-native fine-grained sharing, which maximizes resource utilization and minimizes query latency. This sharing is accomplished via:
- Preemption: Proactively preempts Spark tasks from over-committed users to ensure all users get their fair share of cluster time and their jobs complete in a timely manner even when contending with dozens of other users. This uses Spark Task Preemption for High Concurrency.
- Fault isolation: Creates an environment for each notebook, effectively isolating them from one another.
High concurrency clusters work only for SQL, Python, and R. The performance, security, and fault isolation of high concurrency clusters are provided by running user code in separate processes, which is not possible in Scala.
The Table Access Control checkbox is available only for high concurrency clusters.
To create a high concurrency cluster, select High Concurrency in the Cluster Mode drop-down.
For an example of how to create a high concurrency cluster using the Clusters API, see High concurrency cluster example.
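As a rough illustration of what such a request might contain, the following Python sketch builds a request body for the `clusters/create` endpoint. It rests on several assumptions not stated above: that high concurrency mode is selected through the `spark.databricks.cluster.profile` Spark configuration and the `ResourceClass` custom tag, and that the permitted languages are set through `spark.databricks.repl.allowedLanguages`. The function name, cluster name, Spark version, and node type are hypothetical placeholders. Check the Clusters API reference for your workspace before relying on any of these values.

```python
import json


def build_high_concurrency_payload(cluster_name, spark_version, node_type_id,
                                   min_workers=1, max_workers=2,
                                   allowed_languages=("sql", "python", "r")):
    """Return a dict intended as the JSON body of a clusters/create request.

    Hypothetical helper: the spark_conf keys and the custom tag below are
    assumptions about how high concurrency mode is requested through the API.
    """
    if min_workers > max_workers:
        raise ValueError("min_workers must not exceed max_workers")
    return {
        "cluster_name": cluster_name,
        "spark_version": spark_version,
        "node_type_id": node_type_id,
        "spark_conf": {
            # Assumed setting that selects high concurrency mode.
            "spark.databricks.cluster.profile": "serverless",
            # Scala is left out because high concurrency clusters
            # work only for SQL, Python, and R.
            "spark.databricks.repl.allowedLanguages": ",".join(allowed_languages),
        },
        # Assumed tag that accompanies the profile setting.
        "custom_tags": {"ResourceClass": "Serverless"},
        "autoscale": {"min_workers": min_workers, "max_workers": max_workers},
    }


if __name__ == "__main__":
    # Placeholder values; substitute ones that are valid in your workspace.
    body = build_high_concurrency_payload(
        "high-concurrency-cluster", "7.3.x-scala2.12", "i3.xlarge"
    )
    print(json.dumps(body, indent=2))
```

The sketch only constructs and prints the body. Sending it would require an authenticated POST to the workspace's `clusters/create` endpoint, which is omitted here because the URL and credentials are workspace-specific.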