A Databricks cluster is a set of computation resources and configurations on which you run data engineering, data science, and data analytics workloads, such as production ETL pipelines, streaming analytics, ad-hoc analytics, and machine learning.
You run these workloads as a set of commands in a notebook or as an automated job. Databricks makes a distinction between interactive clusters and automated clusters. You use interactive clusters to analyze data collaboratively using interactive notebooks. You use automated clusters to run fast and robust automated jobs.
- You can create an interactive cluster using the UI, CLI, or REST API. You can manually terminate and restart an interactive cluster. Multiple users can share such clusters to do collaborative interactive analysis.
- The Databricks job scheduler creates an automated cluster when you run a job on a new automated cluster and terminates the cluster when the job is complete. You cannot restart an automated cluster.
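As a concrete illustration of the REST API option, an interactive cluster can be created by sending a JSON specification to the Clusters API (`POST /api/2.0/clusters/create`). The sketch below uses only the Python standard library; the Spark runtime version, node type, workspace URL, and token are placeholder assumptions that you would replace with values valid in your own workspace (the `clusters/spark-versions` and `clusters/list-node-types` endpoints list the available options):

```python
import json
import urllib.request

def build_cluster_spec(name):
    """Build a minimal request body for the clusters/create endpoint.

    The spark_version and node_type_id values below are placeholders;
    query clusters/spark-versions and clusters/list-node-types to see
    what your workspace actually supports.
    """
    return {
        "cluster_name": name,
        "spark_version": "7.3.x-scala2.12",  # assumed runtime version
        "node_type_id": "i3.xlarge",         # assumed AWS node type
        "num_workers": 2,
        "autotermination_minutes": 60,       # terminate after 1 h of inactivity
    }

def create_cluster(host, token, spec):
    """POST the spec to /api/2.0/clusters/create; returns the new cluster_id."""
    req = urllib.request.Request(
        f"{host}/api/2.0/clusters/create",
        data=json.dumps(spec).encode(),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["cluster_id"]

spec = build_cluster_spec("shared-analysis")
# create_cluster("https://<workspace-url>", "<token>", spec) would submit it.
```

The `autotermination_minutes` setting matters for interactive clusters in particular: since they are manually started and shared, auto-termination keeps an idle cluster from running indefinitely.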
This section focuses more on interactive clusters than on automated clusters, although many of the configurations and management tools described apply equally to both cluster types. To learn more about creating automated clusters, see Jobs.
Databricks retains cluster configuration information for up to 70 interactive clusters terminated in the last 30 days and up to 30 automated clusters recently terminated by the job scheduler. To retain an interactive cluster's configuration for more than 30 days after termination, an administrator can pin the cluster to the cluster list.
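Pinning can likewise be done through the REST API (`POST /api/2.0/clusters/pin`), which requires admin permissions. A minimal sketch, assuming a workspace URL, admin token, and cluster ID supplied by the caller:

```python
import json
import urllib.request

def build_pin_request(host, token, cluster_id):
    """Build the HTTP request that pins a cluster, keeping its
    configuration beyond the 30-day retention window.

    host, token, and cluster_id are assumed inputs from the caller.
    """
    return urllib.request.Request(
        f"{host}/api/2.0/clusters/pin",
        data=json.dumps({"cluster_id": cluster_id}).encode(),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )

# Sending the request requires a real workspace and an admin token:
# with urllib.request.urlopen(build_pin_request(host, token, cluster_id)) as resp:
#     assert resp.status == 200
```

A corresponding `clusters/unpin` endpoint removes the pin, after which the normal 30-day retention applies again.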
In this section:
- Create a Cluster
- Manage Clusters
- Cluster Configurations
- Customize Containers with Databricks Container Services
- Cluster Node Initialization Scripts
- GPU-enabled Clusters