Databricks workload types: feature comparison

Databricks offers three “compute” types, each designed for a different type of workload:

  • Jobs Light compute: Run Databricks jobs on Jobs Light clusters with the open source Spark runtime on the Databricks platform.

  • Jobs compute: Run Databricks jobs on Jobs clusters with Databricks’ optimized runtime for significant performance and scalability improvements.

  • All-purpose compute: Run any workload on All-purpose clusters, including interactive data science and analysis, BI workloads via JDBC/ODBC, MLflow experiments, Databricks jobs, and so on.
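In practice, the choice between jobs compute and all-purpose compute shows up when a job is defined: a job that declares a `new_cluster` runs on jobs compute (a cluster created for the run and terminated afterward), while a job that points at an `existing_cluster_id` runs on an already-running all-purpose cluster. A minimal sketch, assuming the Jobs API 2.1 `jobs/create` request shape; the job names, runtime version, node type, and paths below are illustrative placeholders:

```python
# Sketch of two Jobs API 2.1 "jobs/create" payloads. Field names follow
# the Jobs API; all concrete values are illustrative examples.

def job_on_jobs_compute(notebook_path: str) -> dict:
    """Job payload that creates a fresh jobs cluster for each run."""
    return {
        "name": "nightly-etl",  # illustrative job name
        "tasks": [{
            "task_key": "etl",
            "notebook_task": {"notebook_path": notebook_path},
            # "new_cluster" -> jobs compute: the cluster exists only for
            # the duration of the run, then terminates.
            "new_cluster": {
                "spark_version": "13.3.x-scala2.12",  # example runtime
                "node_type_id": "i3.xlarge",          # example AWS node type
                "num_workers": 2,
            },
        }],
    }

def job_on_all_purpose_compute(notebook_path: str, cluster_id: str) -> dict:
    """Job payload that reuses a running all-purpose cluster."""
    return {
        "name": "ad-hoc-analysis",  # illustrative job name
        "tasks": [{
            "task_key": "analysis",
            "notebook_task": {"notebook_path": notebook_path},
            # "existing_cluster_id" -> all-purpose compute.
            "existing_cluster_id": cluster_id,
        }],
    }
```

Jobs compute is typically the cheaper choice for scheduled production work, since the cluster bills only while the run is active; all-purpose compute suits interactive, shared use.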

This table shows the features available with each compute type.

| Feature | Jobs Light compute | Jobs compute | All-purpose compute |
| --- | --- | --- | --- |
| Managed Apache Spark: Apache Spark clusters for running production jobs on the Databricks platform, with alerting and retries. | X | X | X |
| Job scheduling with libraries: easily run production jobs, including streaming jobs, with monitoring and a scheduler for running libraries. | X | X | X |
| Job scheduling with notebooks: schedule jobs using Scala, Python, R, and SQL notebooks and notebook workflows. | | X | X |
| Autopilot clusters: easy-to-manage, cost-effective clusters with autoscaling of compute and instance storage and automatic cluster start and termination. | | X | X |
| Databricks Runtime for ML: out-of-the-box ML frameworks, including Spark/Horovod integration and XGBoost, TensorFlow, PyTorch, and Keras support. | | X | X |
| Managed MLflow: run MLflow on the Databricks platform to simplify the end-to-end ML lifecycle, with MLflow remote execution and a managed tracking server. You can even run MLflow from outside of Databricks (usage may be subject to a limit). | | X | X |
| Delta Lake with Delta Engine: robust pipelines serving clean, quality data for high-performance batch and streaming analytics at scale. Delta Lake on Databricks provides ACID transactions, schema management, batch/stream read/write support, and data versioning, along with Delta Engine’s performance optimizations. | | X | X |
| Interactive clusters: high-concurrency mode for multiple users and persistent clusters for analytics. | | | X |
| Notebooks and collaboration: work collaboratively and productively with analysts and other colleagues using Scala, Python, SQL, and R notebooks that provide one-click visualization, interactive dashboards, revision history, and version control integration (GitHub, Bitbucket). | | | X |
| Ecosystem integrations: RStudio® integration and a range of third-party BI tools through JDBC/ODBC. | | | X |

For information on pricing by compute type, see AWS Pricing.