
Large language models (LLMs)


This feature is in Beta.

This page provides notebook examples for fine-tuning large language models (LLMs) using Serverless GPU compute. The examples demonstrate various approaches to fine-tuning, including parameter-efficient methods like Low-Rank Adaptation (LoRA) as well as full supervised fine-tuning.

Fine-tune Qwen2-0.5B model

The following notebook provides an example of how to efficiently fine-tune the Qwen2-0.5B model using the following (a brief code sketch appears after this list):

  • Transformer reinforcement learning (TRL) for supervised fine-tuning
  • Liger Kernels for memory-efficient training with optimized Triton kernels
  • LoRA for parameter-efficient fine-tuning
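
The sketch below shows how these pieces typically fit together; it is not the notebook's code, and the dataset, LoRA targets, and hyperparameters are illustrative assumptions.

```python
# Minimal sketch: supervised fine-tuning of Qwen2-0.5B with TRL's SFTTrainer,
# a PEFT LoRA config, and the Liger kernel flag from transformers.
from datasets import Dataset
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer

# Tiny in-memory dataset with a "text" column; replace with your own data.
train_dataset = Dataset.from_dict(
    {"text": ["### Question: What is LoRA?\n### Answer: A parameter-efficient fine-tuning method."]}
)

# LoRA adapters on the attention projections only (illustrative choice).
peft_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)

training_args = SFTConfig(
    output_dir="qwen2-0.5b-sft-lora",
    per_device_train_batch_size=2,
    num_train_epochs=1,
    use_liger_kernel=True,  # requires the liger-kernel package and a recent transformers release
)

trainer = SFTTrainer(
    model="Qwen/Qwen2-0.5B",  # SFTTrainer accepts a Hub model ID
    args=training_args,
    train_dataset=train_dataset,
    peft_config=peft_config,
)
trainer.train()
```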

Notebook

Open notebook in new tab

Fine-tune Llama-3.2-3B with Unsloth

This notebook demonstrates how to fine-tune Llama-3.2-3B using the Unsloth library.
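
As a rough orientation, the sketch below shows the usual Unsloth loading and LoRA-attachment flow; the checkpoint name, sequence length, and adapter settings are assumptions rather than the notebook's exact configuration.

```python
# Minimal sketch of the Unsloth flow (assumes the unsloth package is installed).
from unsloth import FastLanguageModel

# Load the model in 4-bit and patch it with Unsloth's optimized kernels.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Llama-3.2-3B-Instruct",
    max_seq_length=2048,
    load_in_4bit=True,
)

# Attach LoRA adapters via Unsloth's PEFT wrapper.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    lora_dropout=0.0,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)
# From here the model can be passed to a TRL SFTTrainer as usual.
```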

Notebook

Open notebook in new tab

Fine-tune an embedding model

This notebook demonstrates how to fine-tune the gte-large-en-v1.5 embedding model on a single A10G GPU using Mosaic LLM Foundry.
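
The notebook drives training through LLM Foundry; as a rough illustration of what contrastive fine-tuning of the same checkpoint looks like, here is a sketch that instead uses the sentence-transformers library. The training pairs and loss choice are assumptions for illustration only.

```python
# Not the LLM Foundry recipe: an illustrative contrastive fine-tuning sketch
# for gte-large-en-v1.5 with sentence-transformers.
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses

model = SentenceTransformer("Alibaba-NLP/gte-large-en-v1.5", trust_remote_code=True)

# (query, relevant passage) pairs; in-batch negatives come from the other pairs.
train_examples = [
    InputExample(texts=["What GPU does this run on?", "The example targets a single A10G GPU."]),
    InputExample(texts=["What is being fine-tuned?", "An embedding model for retrieval."]),
]
train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=2)
train_loss = losses.MultipleNegativesRankingLoss(model)

model.fit(train_objectives=[(train_dataloader, train_loss)], epochs=1, warmup_steps=10)
```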

Notebook

Open notebook in new tab

Fine-tune a GPT OSS 20B model

This notebook demonstrates how to fine-tune OpenAI's gpt-oss-20b model on an H100 GPU using LoRA for parameter-efficient fine-tuning.
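
The sketch below shows only the LoRA setup step; the loading kwargs, target modules, and hyperparameters are illustrative assumptions, not the notebook's exact configuration.

```python
# Minimal sketch: wrap gpt-oss-20b with LoRA adapters so that only a small
# fraction of parameters is trained.
import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained(
    "openai/gpt-oss-20b",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules="all-linear",  # adapt every linear layer; a common default
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the LoRA adapters are trainable
```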

Notebook

Open notebook in new tab