Databricks workload types: feature comparison
Databricks offers three compute types, each designed for a different kind of workload:
Jobs Light compute: Run Databricks jobs on Jobs Light clusters, which use the open source Spark runtime on the Databricks platform.
Jobs compute: Run Databricks jobs on Jobs clusters, which use Databricks' optimized runtime for significant performance and scalability improvements.
All-purpose compute: Run any workload on All-purpose clusters, including interactive data science and analysis, BI workloads via JDBC/ODBC, MLflow experiments, Databricks jobs, and so on.
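The compute type a job uses is determined by the cluster you attach to the job definition. As a rough sketch (the field names follow the Databricks Jobs API, but the Spark version, node type, worker count, and notebook path below are placeholder values, not recommendations), a job that provisions its own jobs cluster for each run might be defined like this:

```python
import json

# Hypothetical job definition for the Databricks Jobs API.
# All concrete values (spark_version, node_type_id, paths) are placeholders.
job_definition = {
    "name": "nightly-etl",
    # "new_cluster" requests a jobs cluster that is created for this run
    # and terminated when the run finishes (jobs compute billing applies).
    "new_cluster": {
        "spark_version": "7.3.x-scala2.12",
        "node_type_id": "i3.xlarge",
        "num_workers": 2,
    },
    "notebook_task": {"notebook_path": "/Jobs/nightly_etl"},
}

# Serialize to the JSON payload you would POST to the Jobs API.
payload = json.dumps(job_definition, indent=2)
print(payload)
```

Pointing the same task at an existing All-purpose cluster instead (via an existing-cluster reference rather than `new_cluster`) would bill it at the All-purpose rate, which is why production jobs are usually run on jobs clusters.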
The following table shows which features are available with each compute type.
| Feature | Jobs Light compute | Jobs compute | All-purpose compute |
|---|---|---|---|
| **Managed Apache Spark**: Apache Spark clusters for running production jobs on the Databricks platform, with alerting and retries. | X | X | X |
| **Job scheduling with libraries**: Easily run production jobs, including streaming jobs, with monitoring and a scheduler for running libraries. | X | X | X |
| **Job scheduling with notebooks**: Schedule jobs using Scala, Python, R, and SQL notebooks and notebook workflows. | | X | X |
| **Autopilot clusters**: Easy-to-manage, cost-effective clusters with autoscaling of compute and instance storage and automatic cluster start and termination. | | X | X |
| **Databricks Runtime for ML**: Out-of-the-box ML frameworks, including Spark/Horovod integration and XGBoost, TensorFlow, PyTorch, and Keras support. | | X | X |
| **Managed MLflow**: Run MLflow on the Databricks platform to simplify the end-to-end ML lifecycle, with MLflow remote execution and a managed tracking server. You can even run MLflow from outside of Databricks (usage may be subject to a limit). | | X | X |
| **Delta Lake with Delta Engine**: Robust pipelines serving clean, quality data for high-performance batch and streaming analytics at scale. Delta Lake on Databricks provides ACID transactions, schema management, batch/stream read/write support, and data versioning, along with Delta Engine's performance optimizations. | | X | X |
| **Interactive clusters**: High-concurrency mode for multiple users and persistent clusters for analytics. | | | X |
| **Notebooks and collaboration**: Highly collaborative and productive work among analysts and other colleagues, using Scala, Python, SQL, and R notebooks that provide one-click visualization, interactive dashboards, revision history, and version control integration (GitHub, Bitbucket). | | | X |
| **Ecosystem integrations**: RStudio® integration and a range of third-party BI tools through JDBC/ODBC. | | | X |
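For the JDBC/ODBC route, a BI tool typically connects to an All-purpose cluster with a connection URL of roughly this shape (a sketch based on the Simba Spark JDBC driver format Databricks distributes; the hostname, HTTP path, and personal access token are placeholders, and the exact string for a given cluster appears on its JDBC/ODBC configuration tab):

```
jdbc:spark://<server-hostname>:443/default;transportMode=http;ssl=1;httpPath=<http-path>;AuthMech=3;UID=token;PWD=<personal-access-token>
```

`AuthMech=3` with the literal username `token` authenticates via a personal access token, which is the common setup for BI tools connecting to Databricks.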
For information on pricing by compute type, see AWS Pricing.