Large language models (LLMs)
This feature is in Beta.
This page provides notebook examples for fine-tuning large language models (LLMs) using Serverless GPU compute. These examples demonstrate various approaches to fine-tuning, including parameter-efficient methods like Low-Rank Adaptation (LoRA) and full supervised fine-tuning.
Before running these notebooks, see the Best practices checklist.
Fine-tune Qwen2-0.5B model
The following notebook provides an example of how to efficiently fine-tune the Qwen2-0.5B model using:
- Transformer Reinforcement Learning (TRL) for supervised fine-tuning
- Liger Kernel for memory-efficient training with optimized Triton kernels
- LoRA for parameter-efficient fine-tuning
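The core idea behind LoRA can be shown in a few lines of plain Python. This is a minimal sketch of the math, not the TRL or PEFT API: instead of updating the full weight matrix W, LoRA trains two small low-rank matrices A and B so that the effective weight is W + (alpha / r) * (B @ A). All dimensions below are hypothetical and chosen for illustration.

```python
def matmul(X, Y):
    """Multiply two matrices represented as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

d, k, r, alpha = 4, 4, 2, 4  # hypothetical weight shape (d x k), LoRA rank r, scaling alpha

# Frozen base weight (identity here, purely for illustration).
W = [[1.0 if i == j else 0.0 for j in range(k)] for i in range(d)]

B = [[0.0] * r for _ in range(d)]  # B is initialized to zero, so the adapter
A = [[0.1] * k for _ in range(r)]  # contributes nothing at the start of training

# Effective weight: W + (alpha / r) * (B @ A)
delta = matmul(B, A)
scale = alpha / r
W_eff = [[w + scale * dl for w, dl in zip(w_row, d_row)] for w_row, d_row in zip(W, delta)]

# Only A and B (d*r + r*k parameters) are trained; W stays frozen.
# Because B starts at zero, W_eff equals W before any training step.
```

This is why LoRA is parameter-efficient: the number of trainable values grows with the rank r rather than with the full d * k size of the weight matrix.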
Notebook
Fine-tune Llama-3.2-3B with Unsloth
This notebook demonstrates how to fine-tune Llama-3.2-3B using the Unsloth library.
Notebook
Fine-tune a GPT OSS 20B model
This notebook demonstrates how to fine-tune OpenAI's gpt-oss-20b model on an H100 GPU using LoRA for parameter-efficient fine-tuning.
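A quick back-of-the-envelope calculation shows why LoRA makes a 20B-parameter model practical to fine-tune on a single H100 (80 GB). The hidden size, layer count, rank, and adapted projections below are assumptions for illustration only, not the notebook's actual configuration:

```python
hidden = 2880      # assumed hidden size, for illustration
n_layers = 24      # assumed number of transformer layers, for illustration
r = 8              # assumed LoRA rank
n_adapted = 2      # assume LoRA is applied to two projections per layer (e.g. q and v)

# Each adapted projection adds A (r x hidden) and B (hidden x r).
per_matrix = 2 * hidden * r
trainable = n_layers * n_adapted * per_matrix

total = 20_000_000_000
print(f"trainable params: {trainable:,} ({100 * trainable / total:.4f}% of 20B)")
```

Only these adapter weights (and their optimizer state) need gradients, while the frozen base model can be held in lower precision, which is what keeps memory use within a single GPU.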
Notebook
Supervised fine-tuning using DeepSpeed and TRL
This notebook demonstrates how to use the Serverless GPU Python API to run supervised fine-tuning (SFT) using the Transformer Reinforcement Learning (TRL) library with DeepSpeed ZeRO Stage 3 optimization.
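For reference, a minimal ZeRO Stage 3 configuration can be expressed as a Python dict (Hugging Face's `TrainingArguments` accepts a dict for its `deepspeed` argument as well as a JSON file path). This is a hedged sketch of a typical configuration, not the exact settings used in the notebook:

```python
# Minimal DeepSpeed ZeRO Stage 3 config sketch. "auto" values are filled in
# by the Hugging Face integration from the trainer's own arguments.
zero3_config = {
    "zero_optimization": {
        "stage": 3,                 # partition optimizer state, gradients, AND parameters
        "overlap_comm": True,       # overlap communication with computation
        "contiguous_gradients": True,
        # Gather full 16-bit weights on save so the checkpoint is usable standalone.
        "stage3_gather_16bit_weights_on_model_save": True,
    },
    "bf16": {"enabled": True},
    "train_micro_batch_size_per_gpu": "auto",
    "gradient_accumulation_steps": "auto",
}
```

Stage 3 shards model parameters across workers in addition to the optimizer state and gradients sharded by Stages 1 and 2, which is what allows models larger than a single GPU's memory to be fine-tuned.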