A cluster consists of one driver node and worker nodes. You can pick separate cloud provider instance types for the driver and worker nodes, although by default the driver node uses the same instance type as the worker node. Different families of instance types fit different use cases, such as memory-intensive or compute-intensive workloads.
The driver maintains state information of all notebooks attached to the cluster. The driver node is also responsible for maintaining the SparkContext and interpreting all the commands you run from a notebook or a library on the cluster. The driver node also runs the Apache Spark master that coordinates with the Spark executors.
The default value of the driver node type is the same as the worker node type. You can choose a larger driver node type with more memory if you are planning to
collect() a lot of data from Spark workers and analyze them in the notebook.
Since the driver node maintains all of the state information of the notebooks attached, make sure to detach unused notebooks from the driver.
Databricks workers run the Spark executors and other services required for the proper functioning of the clusters. When you distribute your workload with Spark, all of the distributed processing happens on workers.
To run a Spark job, you need at least one worker. If a cluster has zero workers, you can run non-Spark commands on the driver, but Spark commands will fail.
Databricks launches worker nodes with two private IP addresses each. The node’s primary private IP address is used to host Databricks internal traffic. The secondary private IP address is used by the Spark container for intra-cluster communication. This model allows Databricks to provide isolation between multiple clusters in the same workspace.
Databricks maps cluster node instance types to compute units known as DBUs. See the Databricks instance type pricing page for a list of the supported instance types and their corresponding DBUs. For cloud provider information, see AWS instance type specifications and pricing.
Databricks will always provide one year’s deprecation notice before ceasing support for an AWS instance type.