Train AI and ML models

Databricks offers flexible compute solutions tailored to different machine learning needs, ranging from managed cluster runtimes to fully serverless GPU environments.

    • AI Runtime: Serverless GPU compute environment optimized for custom single-node and multi-node deep learning workloads.
    • Databricks Runtime for Machine Learning: Managed cluster runtime with pre-built infrastructure and pre-installed ML libraries.

AI Runtime (Preview)

This feature is in Public Preview.

AI Runtime is a specialized offering within the Databricks serverless ecosystem. It is optimized for custom single-node and multi-node deep learning workloads, such as fine-tuning LLMs or training computer vision models.

Key features include:

  • Instant availability: Removes the need to manage underlying cluster infrastructure, allowing you to connect a notebook directly to serverless GPU resources.
  • High-performance hardware: Provides access to A10 GPUs for cost-effective tasks and H100 GPUs for large-scale AI workloads.
  • Managed environments: Offers a default base environment for full customization or an AI environment pre-loaded with common ML packages like Transformers and Ray.
  • Flexible scaling: Supports distributed training across multiple GPUs and nodes.
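The distributed training that AI Runtime scales across GPUs and nodes typically follows the synchronous data-parallel pattern: each worker computes a gradient on its own shard of the data, the gradients are averaged (an "all-reduce"), and every worker applies the same update. The sketch below illustrates that pattern only, using the Python standard library and a hypothetical one-parameter model; in practice you would use a framework such as PyTorch or Ray, not this code, and no Databricks-specific API is shown.

```python
from functools import partial
from multiprocessing import Pool

def local_gradient(w, shard):
    # Gradient of mean squared error for the toy model y = w * x,
    # computed on one worker's shard of (x, y) pairs.
    return sum(2 * (w * x - y) * x for x, y in shard) / len(shard)

def train_step(w, shards, lr=0.05):
    # One synchronous data-parallel step: workers compute shard gradients
    # in parallel, the gradients are averaged (the "all-reduce"), and a
    # single weight update is applied -- the same pattern DDP frameworks use.
    with Pool(processes=len(shards)) as pool:
        grads = pool.map(partial(local_gradient, w), shards)
    return w - lr * sum(grads) / len(grads)

if __name__ == "__main__":
    # Data generated from y = 3 * x, split across two "workers".
    shards = [[(1.0, 3.0), (2.0, 6.0)], [(3.0, 9.0), (4.0, 12.0)]]
    w = 0.0
    for _ in range(50):
        w = train_step(w, shards)
    print(round(w, 2))  # converges toward 3.0
```

On AI Runtime, the same averaging step runs as a GPU-to-GPU collective across nodes rather than between local processes.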

Databricks Runtime for Machine Learning

Databricks Runtime for Machine Learning is a specialized runtime that automates the creation of compute resources with pre-built infrastructure. It is designed for users who want a comprehensive, ready-to-use environment for both classic machine learning and deep learning.

Key features include:

  • Pre-installed libraries: Includes popular libraries like PyTorch, TensorFlow, and XGBoost, which receive frequent updates and optimized support.
  • Compute versatility: Supports both CPU and GPU-based instance types, including AWS Graviton for improved price-to-performance.
  • Optimization: Offers integration with Photon to accelerate Spark SQL, DataFrames, and feature engineering tasks.
  • Access control: Requires dedicated access mode for secure data access through Unity Catalog.
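Selecting Databricks Runtime for Machine Learning and the access mode above comes down to a few fields in the cluster definition. The fragment below is a minimal sketch of a Clusters API request body: the `spark_version` string selects an ML runtime image, and `data_security_mode` set to `SINGLE_USER` requests dedicated access mode. The runtime version, node type, and worker count shown are illustrative values, not recommendations.

```json
{
  "cluster_name": "ml-training-cluster",
  "spark_version": "15.4.x-cpu-ml-scala2.12",
  "node_type_id": "i3.xlarge",
  "num_workers": 2,
  "data_security_mode": "SINGLE_USER"
}
```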