Train AI and ML models

Databricks offers flexible compute solutions tailored to different machine learning needs, ranging from managed cluster runtimes to fully serverless GPU environments.

- AI Runtime: Serverless GPU compute environment optimized for custom single-node and multi-node deep learning workloads.
- Databricks Runtime for Machine Learning: Classic compute environment with pre-built libraries for classic machine learning and deep learning workloads.

AI Runtime (Preview)

Preview: This feature is in Public Preview.

AI Runtime is a specialized offering within the Databricks serverless ecosystem. It is optimized for custom single-node and multi-node deep learning workloads, such as fine-tuning LLMs or training computer vision models. For an overview of how serverless compute fits into the Databricks architecture, see Serverless workspace architecture.

Key features include:

- Instant availability: Removes the need to manage underlying cluster infrastructure, allowing you to connect a notebook directly to serverless GPU resources.
- High-performance hardware: Provides access to A10 GPUs for cost-effective tasks and H100 GPUs for large-scale AI workloads.
- Managed environments: Offers a default base environment for full customization or an AI environment pre-loaded with common ML packages like Transformers and Ray.
- Flexible scaling: Supports distributed training across multiple GPUs and nodes.
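
As a rough illustration of the hardware point above, the following sketch lists the GPUs that a notebook session can see. It is not a Databricks API: `describe_gpus` is a hypothetical helper, and it assumes PyTorch may or may not be present in the selected environment, so it falls back to an empty list instead of failing.

```python
import importlib.util


def describe_gpus():
    """Return the names of CUDA devices visible to PyTorch, or [] if none."""
    # A customized base environment may not include PyTorch, so check first.
    if importlib.util.find_spec("torch") is None:
        return []
    import torch

    if not torch.cuda.is_available():
        return []
    return [torch.cuda.get_device_name(i) for i in range(torch.cuda.device_count())]


gpu_names = describe_gpus()
print(f"{len(gpu_names)} GPU(s) visible: {gpu_names}")
```

Running this right after attaching a notebook to GPU compute is one way to confirm that the accelerator type you expected is the one the session received.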

Databricks Runtime for Machine Learning

Databricks Runtime for Machine Learning is a specialized runtime that automates the creation of compute resources preconfigured with machine learning infrastructure. It is designed for users who want a comprehensive, ready-to-use environment for both classic machine learning and deep learning.

Key features include:

- Pre-installed libraries: Includes popular libraries like PyTorch, TensorFlow, and XGBoost, which receive frequent updates and optimized support.
- Compute versatility: Supports both CPU and GPU-based instance types, including AWS Graviton for improved price-to-performance.
- Optimization: Offers integration with Photon to accelerate Spark SQL, DataFrames, and feature engineering tasks.
- Access control: Requires dedicated access mode for secure data access through Unity Catalog.
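
To see which versions of the pre-installed libraries a given runtime provides, a small check like the one below can help. This is a minimal sketch using only the Python standard library. `installed_versions` is a hypothetical helper, and the package names are the PyPI distribution names assumed for the libraries named above.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Assumed PyPI distribution names for PyTorch, TensorFlow, and XGBoost.
report = installed_versions(["torch", "tensorflow", "xgboost"])
for name, version in report.items():
    print(f"{name}: {version or 'not installed'}")
```

Because versions change between runtime releases, printing them at the top of a notebook makes a training run easier to reproduce later.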
