Large language models (LLMs)
AI Runtime for single-node tasks is in Public Preview. The distributed training API for multi-GPU workloads remains in Beta.
This page provides notebook examples for fine-tuning LLMs using AI Runtime. These examples demonstrate various approaches to fine-tuning, including parameter-efficient methods such as Low-Rank Adaptation (LoRA) and full supervised fine-tuning (SFT).
| Tutorial | Description |
|---|---|
| Fine-tune Qwen2-0.5B with TRL and LoRA | Efficiently fine-tune the Qwen2-0.5B model using Transformer Reinforcement Learning (TRL), Liger Kernels for memory-efficient training, and LoRA for parameter-efficient fine-tuning. |
| Fine-tune Llama-3.2-3B with Unsloth | Fine-tune Llama-3.2-3B using the Unsloth library. |
| | Fine-tune OpenAI's |
| SFT with TRL and DeepSpeed ZeRO-3 | Use the Serverless GPU Python API to run supervised fine-tuning (SFT) using the TRL library with DeepSpeed ZeRO Stage 3 optimization. |
| LoRA fine-tune Olmo3 7B with Axolotl | Use the Serverless GPU Python API to LoRA fine-tune an Olmo3 7B model using the Axolotl library. |
| Distributed LoRA fine-tuning of Qwen2-0.5B | Fine-tune the Qwen2-0.5B model using LoRA and Liger Kernels for memory-efficient distributed training. |
| Distributed fine-tuning of Llama-3.2-3B with Unsloth | Fine-tune Llama-3.2-3B across multiple GPUs using the Unsloth library for optimized parameter-efficient training. |
| Fine-tune Llama 3.1 8B with Mosaic LLM Foundry | Fine-tune the Llama 3.1 8B model using Mosaic LLM Foundry with distributed training strategies and model evaluation. |
| Fine-tune GPT-OSS 120B | Fine-tune OpenAI's GPT-OSS 120B model using supervised fine-tuning on H100 GPUs with DDP and FSDP distributed training strategies. |
| Train Transformers with PyTorch FSDP | Train Transformer models using PyTorch Fully Sharded Data Parallel (FSDP) to shard model parameters across multiple GPUs. |
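Several of the tutorials above rely on LoRA, which freezes the pretrained weight matrix and trains only a low-rank update in its place. The sketch below (illustrative only, using NumPy rather than any of the notebook libraries; all names are hypothetical) shows the core idea: the adapted layer computes `x @ W + (alpha/r) * x @ A @ B`, where only the small factors `A` and `B` are trainable, so the trainable parameter count drops by orders of magnitude.

```python
import numpy as np

def lora_forward(x, W, A, B, alpha=16):
    """LoRA-adapted linear layer: y = x W + (alpha / r) * x A B."""
    r = A.shape[1]
    return x @ W + (alpha / r) * (x @ A @ B)

d_in, d_out, r = 512, 512, 8
rng = np.random.default_rng(0)
W = rng.standard_normal((d_in, d_out))      # frozen pretrained weight
A = rng.standard_normal((d_in, r)) * 0.01   # trainable low-rank factor
B = np.zeros((r, d_out))                    # trainable, zero-initialized

full_params = W.size
lora_params = A.size + B.size
print(f"trainable params: {lora_params} vs full: {full_params} "
      f"({100 * lora_params / full_params:.1f}%)")
# prints: trainable params: 8192 vs full: 262144 (3.1%)

# Because B starts at zero, the adapted layer initially matches the
# frozen layer exactly, so fine-tuning starts from the pretrained model.
x = rng.standard_normal((2, d_in))
assert np.allclose(lora_forward(x, W, A, B), x @ W)
```

Libraries like PEFT, Unsloth, and Axolotl apply this same decomposition inside the model's attention and MLP projections; zero-initializing `B` is the standard trick that makes the adapter a no-op at the start of training.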
Video demo
This video walks through the Fine-tune Llama-3.2-3B with Unsloth example notebook in detail (12 minutes).