Skip to main content

AI Runtime CLI examples

Beta

This feature is in Beta. Workspace admins can control access to this feature from the Previews page. See Manage Databricks previews.

The following examples are complete, end-to-end workloads you submit from the air CLI with air run -f train.yaml. Each shows a real distributed-training pattern on H100 GPUs, including the workload YAML, launcher script, and training code. Start with the quickstart if you haven't submitted a run before.

    • Multi-node LLM fine-tuning with FSDP
    • Supervised fine-tuning of Llama-3.1-8B across 16 H100 GPUs (2 nodes) using torchrun and PyTorch Fully Sharded Data Parallel (FSDP). Logs to MLflow and checkpoints to a Unity Catalog volume.