AI Runtime CLI examples

Preview

The following examples are complete, end-to-end workloads that you submit from the air CLI with air run -f train.yaml. Each example shows a multi-GPU pattern on H100 GPUs and includes the workload YAML, bootstrap commands, and code. If you haven't submitted a run before, start with the quickstart.

- Multi-node LLM fine-tuning with FSDP: Supervised fine-tuning of Llama-3.1-8B across 16 H100 GPUs (2 nodes) using torchrun and PyTorch Fully Sharded Data Parallel (FSDP). Logs to MLflow and checkpoints to a Unity Catalog volume.
- Distributed training with Ray Train: Distributed data-parallel fine-tuning with Ray Train's TorchTrainer across 8 H100 GPUs on a single node, with one worker per GPU.
- Batch inference with Ray Data and vLLM: Offline LLM batch inference with Ray Data and vLLM across 8 H100 GPUs on a single node, running one vLLM replica per GPU and writing results to a Unity Catalog volume as Parquet.
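
To make the topology in the FSDP example concrete, the following is a minimal sketch of how 2 nodes with 8 GPUs each map to a world size of 16 and to one torchrun invocation per node. It is an illustration under assumptions, not the example's actual bootstrap: the script name train.py, the address 10.0.0.1, and the helper names are hypothetical, and how the runtime supplies the node rank and rendezvous address is not covered here.

```python
import shlex


def world_size(nnodes: int, gpus_per_node: int) -> int:
    """Total number of training processes: one process per GPU."""
    return nnodes * gpus_per_node


def torchrun_command(
    nnodes: int,
    gpus_per_node: int,
    node_rank: int,
    master_addr: str,
    master_port: int = 29500,
    script: str = "train.py",  # hypothetical entry point, not from the docs
) -> str:
    """Build the torchrun command line for a single node of a multi-node job."""
    if not 0 <= node_rank < nnodes:
        raise ValueError(f"node_rank must be in [0, {nnodes}), got {node_rank}")
    args = [
        "torchrun",
        f"--nnodes={nnodes}",
        f"--nproc_per_node={gpus_per_node}",
        f"--node_rank={node_rank}",
        f"--master_addr={master_addr}",
        f"--master_port={master_port}",
        script,
    ]
    # shlex.join quotes each argument so the result is safe to paste into a shell.
    return shlex.join(args)


if __name__ == "__main__":
    # 2 nodes x 8 GPUs, matching the 16-GPU layout described above.
    print("world size:", world_size(2, 8))
    for rank in range(2):
        # 10.0.0.1 is a placeholder address for the rank-0 node.
        print(torchrun_command(2, 8, rank, "10.0.0.1"))
```

Each node runs the same command with a different node rank, and torchrun starts one process per GPU on that node. The single-node Ray examples do not need this step, since Ray schedules one worker or replica per GPU itself.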