Databricks workload types: feature comparison

Databricks offers three “compute” types, each designed for a different type of workload:

  • Jobs Light compute: Run Databricks jobs on Jobs Light clusters with the open source Spark runtime on the Databricks platform.
  • Jobs compute: Run Databricks jobs on Jobs clusters with Databricks’ optimized runtime for improved performance and scalability.
  • All-purpose compute: Run any workload on All-purpose clusters, including interactive data science and analysis, BI workloads via JDBC/ODBC, MLflow experiments, and Databricks jobs.

This table shows the features available with each compute type.

Feature                         Jobs Light compute  Jobs compute  All-purpose compute
Managed Apache Spark            X                   X             X
  Apache Spark clusters for running production jobs on the Databricks platform, with alerting and retries.
Job scheduling with libraries   X                   X             X
  Easy to run production jobs, including streaming, with monitoring and a scheduler for running libraries.
Job scheduling with notebooks   -                   X             X
  Ability to schedule jobs using Scala, Python, R, and SQL notebooks and notebook workflows.
Autopilot clusters              -                   X             X
  Easy to manage and cost-effective clusters, with autoscaling of compute and instance storage, and automatic start and termination of clusters.
Databricks Runtime for ML       -                   X             X
  Out-of-the-box ML frameworks, including Spark/Horovod integration and XGBoost, TensorFlow, PyTorch, and Keras support.
Managed MLflow                  -                   X             X
  Run MLflow on the Databricks platform to simplify the end-to-end ML lifecycle, with MLflow remote execution and a managed tracking server. You can even run MLflow from outside of Databricks (usage may be subject to a limit).
Delta Lake with Delta Engine    -                   X             X
  Robust pipelines serving clean, quality data supporting high-performance batch and streaming analytics at scale. Delta Lake on Databricks provides ACID transactions, schema management, batch/stream read/write support, and data versioning, along with Delta Engine’s performance optimizations.
Interactive clusters            -                   -             X
  High-concurrency mode for multiple users and persistent clusters for analytics.
Notebooks and collaboration     -                   -             X
  Enable highly collaborative and productive work among analysts and with other colleagues using Scala, Python, SQL, and R notebooks that provide one-click visualization, interactive dashboards, revision history, and version control integration (GitHub, Bitbucket).
Ecosystem integrations          -                   -             X
  RStudio® integration and a range of third-party BI tools through JDBC/ODBC.
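
When deciding which compute type a workload needs, the table can be read as a simple lookup. The following is a minimal sketch in Python that transcribes the table into a dictionary and returns every compute type offering all the required features. The names COMPUTE_TYPES, FEATURES, and compute_types_supporting are hypothetical and chosen for illustration only; they are not part of any Databricks API.

```python
# Illustrative only: the feature matrix below is transcribed from the table
# on this page. None of these names belong to a Databricks API.

# Column order matches the table: left to right.
COMPUTE_TYPES = ("Jobs Light compute", "Jobs compute", "All-purpose compute")

# One entry per table row; True means the feature is marked with an X.
FEATURES = {
    "Managed Apache Spark": (True, True, True),
    "Job scheduling with libraries": (True, True, True),
    "Job scheduling with notebooks": (False, True, True),
    "Autopilot clusters": (False, True, True),
    "Databricks Runtime for ML": (False, True, True),
    "Managed MLflow": (False, True, True),
    "Delta Lake with Delta Engine": (False, True, True),
    "Interactive clusters": (False, False, True),
    "Notebooks and collaboration": (False, False, True),
    "Ecosystem integrations": (False, False, True),
}


def compute_types_supporting(required):
    """Return the compute types that offer every feature in `required`."""
    unknown = [name for name in required if name not in FEATURES]
    if unknown:
        raise KeyError(f"unknown feature(s): {unknown}")
    return [
        compute
        for column, compute in enumerate(COMPUTE_TYPES)
        if all(FEATURES[name][column] for name in required)
    ]


if __name__ == "__main__":
    # A scheduled notebook job that writes to Delta Lake.
    print(compute_types_supporting(
        ["Job scheduling with notebooks", "Delta Lake with Delta Engine"]
    ))
```

For example, a job that only schedules libraries is supported by all three compute types, while anything that requires notebooks and collaboration narrows the result to All-purpose compute.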

For information on pricing by compute type, see AWS Pricing.