Train AI and ML models

Databricks offers flexible compute solutions tailored to different machine learning needs, ranging from managed cluster runtimes to fully serverless GPU environments.

Serverless GPU compute (Beta)

This feature is in Beta. Workspace admins can control access to this feature from the Previews page. See Manage Databricks previews.

Serverless GPU compute is a specialized offering within the Databricks serverless ecosystem. It is optimized for custom single-node and multi-node deep learning workloads, such as fine-tuning LLMs or training computer vision models.

Key features include:

  • Instant availability: Removes the need to manage underlying cluster infrastructure, allowing you to connect a notebook directly to serverless GPU resources.
  • High-performance hardware: Provides access to A10 GPUs for cost-effective tasks and H100 GPUs for large-scale AI workloads.
  • Managed environments: Offers a default base environment for full customization or an AI environment pre-loaded with common ML packages like Transformers and Ray.
  • Flexible scaling: Supports distributed training across multiple GPUs and nodes.
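To make the distributed-training bullet concrete, here is a minimal, hypothetical sketch using PyTorch's DistributedDataParallel. It runs as a single CPU process with the `gloo` backend for illustration; on serverless GPU compute, the platform launcher would start one process per GPU and set the rendezvous environment variables (`MASTER_ADDR`, `MASTER_PORT`, rank, world size) for you — those details here are assumptions, not the Databricks API.

```python
import os

import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

# Hypothetical single-process stand-in for the per-worker environment a
# distributed launcher would normally provide.
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")
dist.init_process_group("gloo", rank=0, world_size=1)

# Wrap any model in DDP; gradients are all-reduced across workers so every
# replica takes the same optimizer step.
model = torch.nn.Linear(4, 1)
ddp_model = DDP(model)

optimizer = torch.optim.SGD(ddp_model.parameters(), lr=0.1)
x, y = torch.randn(8, 4), torch.randn(8, 1)

for _ in range(5):
    optimizer.zero_grad()
    loss = torch.nn.functional.mse_loss(ddp_model(x), y)
    loss.backward()  # gradient sync happens here across the process group
    optimizer.step()

final_loss = loss.item()
dist.destroy_process_group()
```

The same wrapping pattern scales from one GPU to multiple nodes; only the launcher configuration changes, not the training loop.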

Databricks Runtime for Machine Learning

Databricks Runtime for Machine Learning is a specialized runtime that creates compute resources with ML infrastructure already built in. It is designed for users who want a comprehensive, ready-to-use environment for both classic machine learning and deep learning.

Key features include:

  • Pre-installed libraries: Includes popular libraries like PyTorch, TensorFlow, and XGBoost, which receive frequent updates and optimized support.
  • Compute versatility: Supports both CPU and GPU-based instance types, including AWS Graviton for improved price-to-performance.
  • Optimization: Offers integration with Photon to accelerate Spark SQL, DataFrames, and feature engineering tasks.
  • Access control: Requires dedicated access mode for secure data access through Unity Catalog.
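As a sketch of the "ready-to-use" experience, the snippet below trains a classic gradient-boosting classifier with scikit-learn, which ships pre-installed in Databricks Runtime for Machine Learning alongside PyTorch, TensorFlow, and XGBoost. The dataset and model choice are illustrative only — no Databricks-specific API is involved.

```python
# Classic ML workload: no pip installs needed on an ML Runtime cluster,
# since scikit-learn is part of the pre-built environment.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# Synthetic binary-classification data for illustration.
X, y = make_classification(n_samples=200, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = GradientBoostingClassifier(random_state=0).fit(X_train, y_train)
accuracy = clf.score(X_test, y_test)
```

In a notebook attached to an ML Runtime cluster, this runs as-is; swapping in XGBoost or PyTorch follows the same pattern since those libraries are likewise pre-installed.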