Large language models (LLMs)
AI Runtime for single-node tasks is in Public Preview. The distributed training API for multi-GPU workloads remains in Beta.
This page provides notebook examples for fine-tuning large language models (LLMs) using AI Runtime. The examples cover several approaches to fine-tuning, including parameter-efficient methods such as Low-Rank Adaptation (LoRA) and full supervised fine-tuning.
| Tutorial | Description |
|---|---|
| | Efficiently fine-tune the Qwen2-0.5B model using Transformer Reinforcement Learning (TRL), Liger Kernels for memory-efficient training, and LoRA for parameter-efficient fine-tuning. |
| Fine-tune Llama-3.2-3B with Unsloth | Fine-tune Llama-3.2-3B using the Unsloth library. |
| | Fine-tune OpenAI's |
| | Use the Serverless GPU Python API to run supervised fine-tuning (SFT) using the TRL library with DeepSpeed ZeRO Stage 3 optimization. |
| | Use the Serverless GPU Python API to LoRA fine-tune an Olmo3 7B model using the Axolotl library. |
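Several of the tutorials above use LoRA rather than full fine-tuning. As a rough illustration of the difference, the sketch below uses the standard LoRA formulation, in which a frozen weight matrix `W` is adjusted by a low-rank product, `W + (alpha / r) * B @ A`, and only `A` and `B` are trained. It is a minimal pure-Python sketch and is not taken from the notebooks, which rely on libraries such as TRL, Unsloth, and Axolotl. The function names and the 2048 x 2048 layer size are hypothetical and chosen only for illustration.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_effective_weight(w, a, b, alpha):
    """Return W + (alpha / r) * B @ A, where r is the LoRA rank.

    w: frozen pretrained weight, shape (d_out, d_in)
    a: trainable down-projection, shape (r, d_in)
    b: trainable up-projection, shape (d_out, r)
    """
    r = len(a)
    scale = alpha / r
    delta = matmul(b, a)  # shape (d_out, d_in), rank at most r
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


def trainable_params(d_out, d_in, r=None):
    """Count trainable parameters for one linear layer (bias ignored)."""
    if r is None:
        return d_out * d_in    # full fine-tuning updates every weight
    return r * (d_in + d_out)  # LoRA trains only A and B


# Hypothetical layer size, for illustration only.
full = trainable_params(2048, 2048)
lora = trainable_params(2048, 2048, r=16)
print(full, lora, full // lora)  # prints: 4194304 65536 64
```

For this assumed layer, rank-16 LoRA trains 64 times fewer parameters than full fine-tuning, which is where the lower memory use of parameter-efficient methods comes from. Initializing `B` to zeros, as is common practice, leaves the effective weight equal to the pretrained `W` at the start of training.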
Video demo
This video walks through the Fine-tune Llama-3.2-3B with Unsloth example notebook in detail (12 minutes).