Large language models (LLMs)

Public Preview

AI Runtime for single-node tasks is in Public Preview. The distributed training API for multi-GPU workloads remains in Beta.

This page provides notebook examples for fine-tuning large language models (LLMs) using AI Runtime. These examples demonstrate various approaches to fine-tuning including parameter-efficient methods like Low-Rank Adaptation (LoRA) and full supervised fine-tuning.
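For intuition on why LoRA is considered parameter-efficient, here is a minimal arithmetic sketch. The layer dimensions and rank below are illustrative assumptions, not values taken from any of the notebooks on this page:

```python
# LoRA freezes a full weight matrix W (d_out x d_in) and instead learns two
# low-rank factors B (d_out x r) and A (r x d_in), adding (alpha / r) * B @ A
# to W's output. Only B and A are trained, so the trainable-parameter count
# drops from d_out * d_in to r * (d_out + d_in).

d_out, d_in, r = 4096, 4096, 8  # hypothetical attention projection size and LoRA rank

full_params = d_out * d_in        # parameters updated by full fine-tuning
lora_params = r * (d_out + d_in)  # parameters updated by LoRA

print(full_params)  # 16777216
print(lora_params)  # 65536
print(f"LoRA trains {lora_params / full_params:.2%} of the full matrix")  # 0.39%
```

The same reduction applies to every adapted layer, which is why LoRA fine-tuning of even large models fits on a single GPU in several of the tutorials below.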

| Tutorial | Description |
| --- | --- |
| Fine-tune Qwen2-0.5B model | Efficiently fine-tune the Qwen2-0.5B model using Transformer Reinforcement Learning (TRL), Liger Kernels for memory-efficient training, and LoRA for parameter-efficient fine-tuning. |
| Fine-tune Llama-3.2-3B with Unsloth | Fine-tune Llama-3.2-3B using the Unsloth library. |
| Fine-tune a GPT OSS 20B model | Fine-tune OpenAI's gpt-oss-20b model on an H100 GPU using LoRA for parameter-efficient fine-tuning. |
| Supervised fine-tuning using DeepSpeed and TRL | Use the Serverless GPU Python API to run supervised fine-tuning (SFT) with the TRL library and DeepSpeed ZeRO Stage 3 optimization. |
| LoRA fine-tuning using Axolotl | Use the Serverless GPU Python API to LoRA fine-tune an Olmo3 7B model using the Axolotl library. |
| Distributed fine-tune Qwen2-0.5B | Fine-tune the Qwen2-0.5B model across multiple GPUs using LoRA and Liger Kernels for memory- and parameter-efficient distributed training. |
| Distributed fine-tune Llama-3.2-3B with Unsloth | Fine-tune Llama-3.2-3B using distributed training across multiple GPUs with the Unsloth library for optimized parameter-efficient training. |
| Fine-tune Llama 3.1 8B with LLM Foundry | Fine-tune the Llama 3.1 8B model using Mosaic LLM Foundry with distributed training strategies and model evaluation. |
| Fine-tune GPT-OSS 120B with DDP and FSDP | Fine-tune OpenAI's GPT-OSS 120B model with supervised fine-tuning on H100 GPUs using Distributed Data Parallel (DDP) and Fully Sharded Data Parallel (FSDP) training strategies. |
| Distributed training with PyTorch FSDP | Train Transformer models using PyTorch Fully Sharded Data Parallel (FSDP) to shard model parameters across multiple GPUs. |

Video demo

This 12-minute video walks through the Fine-tune Llama-3.2-3B with Unsloth example notebook in detail.