Clusters

A Databricks cluster is a set of computation resources and configurations on which you run data engineering, data science, and data analytics workloads, such as production ETL pipelines, streaming analytics, ad-hoc analytics, and machine learning.

You run these workloads as a set of commands in a notebook or as an automated job. Databricks makes a distinction between interactive clusters and automated clusters. You use interactive clusters to analyze data collaboratively using interactive notebooks. You use automated clusters to run fast and robust automated jobs.

  • You can create an interactive cluster using the UI, CLI, or REST API. You can manually terminate and restart an interactive cluster. Multiple users can share such clusters to do collaborative interactive analysis.
  • The Databricks job scheduler creates an automated cluster when you run a job on a new automated cluster and terminates the cluster when the job is complete. You cannot restart an automated cluster.
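
The lifecycle difference between the two cluster types can be sketched as a small model. This is an illustration only, not Databricks code: the names `Cluster`, `ClusterKind`, `ClusterState`, and `run_job_on_new_cluster` are hypothetical and do not correspond to any Databricks API. It assumes the rules stated above: an interactive cluster can be terminated and restarted manually, while an automated cluster is created for a job, terminated when the job completes, and cannot be restarted.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class ClusterKind(Enum):
    INTERACTIVE = "interactive"
    AUTOMATED = "automated"


class ClusterState(Enum):
    RUNNING = "running"
    TERMINATED = "terminated"


@dataclass
class Cluster:
    """Hypothetical model of a cluster; not a Databricks class."""
    name: str
    kind: ClusterKind
    state: ClusterState = ClusterState.RUNNING

    def terminate(self) -> None:
        # Both cluster types can end up in the terminated state.
        self.state = ClusterState.TERMINATED

    def restart(self) -> None:
        # Only interactive clusters can be restarted.
        if self.kind is ClusterKind.AUTOMATED:
            raise RuntimeError("automated clusters cannot be restarted")
        self.state = ClusterState.RUNNING


def run_job_on_new_cluster(job_name: str, task: Callable[[], None]) -> Cluster:
    """Mimic the job scheduler: create a cluster, run the job, terminate."""
    cluster = Cluster(name=f"job-{job_name}", kind=ClusterKind.AUTOMATED)
    try:
        task()
    finally:
        # The cluster is terminated when the job is complete,
        # whether the task succeeded or raised.
        cluster.terminate()
    return cluster
```

In this model, calling `terminate()` and then `restart()` on an interactive cluster returns it to the running state, while calling `restart()` on the cluster returned by `run_job_on_new_cluster` raises an error.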

This section focuses on creating and managing clusters using the UI. To learn about managing clusters using the CLI and the REST API, see Databricks CLI and Clusters API.

This section also focuses more on interactive than automated clusters, although many of the configurations and management tools described apply equally to both cluster types. To learn more about creating automated clusters, see Jobs.

In this section: