Large language models (LLMs)

Public Preview

AI Runtime for single-node tasks is in Public Preview. The distributed training API for multi-GPU workloads remains in Beta.

This page provides notebook examples for fine-tuning large language models (LLMs) using AI Runtime. The examples cover both parameter-efficient methods, such as Low-Rank Adaptation (LoRA), and full supervised fine-tuning.
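To make the difference between LoRA and full fine-tuning concrete, the sketch below counts trainable parameters for a single linear layer. It assumes the standard LoRA formulation, in which a frozen weight matrix of shape `d_out x d_in` is augmented with two trainable low-rank factors of shapes `d_out x r` and `r x d_in`. The dimensions and rank are illustrative values, not taken from any of the notebooks.

```python
def full_param_count(d_in: int, d_out: int) -> int:
    """Trainable weights when the whole d_out x d_in matrix is updated."""
    return d_in * d_out


def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """Trainable weights in the two LoRA factors: (d_out x r) and (r x d_in)."""
    return rank * (d_in + d_out)


# Illustrative layer size and rank (assumptions, not values from the notebooks).
d_in, d_out, rank = 1024, 1024, 8

full = full_param_count(d_in, d_out)
lora = lora_param_count(d_in, d_out, rank)

print(f"full fine-tuning: {full:,} trainable weights")
print(f"LoRA (rank {rank}): {lora:,} trainable weights")
print(f"LoRA / full: {lora / full:.2%}")
```

For this layer, LoRA trains 16,384 weights instead of 1,048,576, which is roughly 1.6% of the full count. The savings in optimizer state and gradient memory scale the same way.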

Tutorial

Description

Fine-tune Qwen2-0.5B model

Efficiently fine-tune the Qwen2-0.5B model using Transformer Reinforcement Learning (TRL), Liger Kernels for memory-efficient training, and LoRA for parameter-efficient fine-tuning.

Fine-tune Llama-3.2-3B with Unsloth

Fine-tune Llama-3.2-3B using the Unsloth library.

Fine-tune a GPT OSS 20B model

Fine-tune OpenAI's gpt-oss-20b model on an H100 GPU using LoRA for parameter-efficient fine-tuning.

Supervised fine-tuning using DeepSpeed and TRL

Use the Serverless GPU Python API to run supervised fine-tuning (SFT) using the TRL library with DeepSpeed ZeRO Stage 3 optimization.

LoRA fine-tuning using Axolotl

Use the Serverless GPU Python API to LoRA fine-tune an Olmo3 7B model using the Axolotl library.
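For the DeepSpeed example, ZeRO Stage 3 is selected through a DeepSpeed configuration. The sketch below builds a minimal configuration as a Python dictionary. It is an assumption about what such a configuration could look like, not the one used in the notebook. The `"auto"` values rely on the Hugging Face Trainer integration, which TRL's trainers build on, to fill them in from the training arguments.

```python
import json


def build_zero3_config() -> dict:
    """Return a minimal DeepSpeed config that enables ZeRO Stage 3.

    Stage 3 partitions optimizer state, gradients, and model parameters
    across the participating GPUs.
    """
    return {
        "bf16": {"enabled": "auto"},
        "zero_optimization": {
            "stage": 3,
            # Overlap gradient communication with the backward pass.
            "overlap_comm": True,
            "contiguous_gradients": True,
            # Gather the sharded weights so a full checkpoint can be saved.
            "stage3_gather_16bit_weights_on_model_save": True,
        },
        # "auto" defers to the Hugging Face training arguments.
        "gradient_accumulation_steps": "auto",
        "gradient_clipping": "auto",
        "train_batch_size": "auto",
        "train_micro_batch_size_per_gpu": "auto",
    }


config = build_zero3_config()
print(json.dumps(config, indent=2))
```

With the Hugging Face integration, a dictionary like this (or the path to a JSON file containing it) is typically passed through the `deepspeed` field of the training arguments. How the notebook wires this into the Serverless GPU Python API is not shown here.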

Video demo

This video walks through the Fine-tune Llama-3.2-3B with Unsloth example notebook in detail (12 minutes).