Fully Sharded Data Parallel (FSDP) training

This feature is in Beta.

This page has notebook examples for using Fully Sharded Data Parallel (FSDP) training on Serverless GPU compute. These examples demonstrate how to scale training across multiple GPUs and nodes for improved performance.

Training a Transformer model with 10 million parameters using FSDP2

The following notebook demonstrates distributed training of a 10-million parameter Transformer model using the FSDP2 library.

Notebook

Open notebook in new tab
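
For orientation before opening the notebook, here is a minimal, hedged sketch of the FSDP2 pattern the example relies on: shard each Transformer layer and then the root module with `fully_shard`, then run a training step in each process. This is not the notebook's exact code; it assumes PyTorch 2.6 or later (where `fully_shard` is exported from `torch.distributed.fsdp`), a `torchrun` launch, and NCCL-capable GPUs, and the model dimensions are illustrative.

```python
# Minimal FSDP2 sketch (illustrative, not the notebook's code):
# shard a small Transformer with fully_shard and run one training step per rank.
import os
import torch
import torch.distributed as dist
from torch.distributed.fsdp import fully_shard  # PyTorch 2.6+

def main():
    dist.init_process_group("nccl")
    local_rank = int(os.environ["LOCAL_RANK"])  # set by torchrun
    torch.cuda.set_device(local_rank)

    # Toy Transformer roughly at the notebook's 10-million-parameter scale.
    model = torch.nn.TransformerEncoder(
        torch.nn.TransformerEncoderLayer(d_model=256, nhead=8, batch_first=True),
        num_layers=8,
    ).cuda()

    # FSDP2 per-module sharding: shard each layer, then the root module.
    for layer in model.layers:
        fully_shard(layer)
    fully_shard(model)

    # Create the optimizer after sharding so it sees the sharded parameters.
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)

    # One illustrative step on random data.
    inputs = torch.randn(8, 128, 256, device="cuda")
    loss = model(inputs).mean()
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```

Launched with `torchrun --nproc_per_node=<gpus_per_node> train_fsdp2.py` (file name is illustrative), `torchrun` provides the `LOCAL_RANK` and rendezvous environment variables that `init_process_group` relies on.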

Training the OpenAI GPT OSS 120B model using TRL and FSDP

This notebook demonstrates how to run supervised fine-tuning (SFT) on the GPT OSS 120B model using TRL, FSDP2, and the distributed Serverless GPU library.

Notebook

Open notebook in new tab
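
As rough orientation for the TRL workflow, the sketch below shows how supervised fine-tuning can be wired up with `SFTTrainer` while FSDP sharding is requested through the trainer arguments. The model ID, dataset, and FSDP layer-wrapping class are illustrative assumptions rather than values taken from the notebook, and a 120B-parameter model additionally requires a multi-node launch (for example via `accelerate launch` or the Serverless GPU distributed library) rather than a single process.

```python
# Hedged sketch of TRL supervised fine-tuning with FSDP (not the notebook's
# exact code). Model id, dataset, and FSDP settings are illustrative assumptions.
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# Any prompt/completion-style dataset works; this public example is an assumption.
dataset = load_dataset("trl-lib/Capybara", split="train")

training_args = SFTConfig(
    output_dir="gpt-oss-120b-sft",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    bf16=True,
    # Shard parameters, gradients, and optimizer state across all GPUs.
    fsdp="full_shard auto_wrap",
    # The decoder-layer class name to wrap is an assumption; check the model's
    # implementation for the actual class name.
    fsdp_config={"transformer_layer_cls_to_wrap": ["GptOssDecoderLayer"]},
)

trainer = SFTTrainer(
    model="openai/gpt-oss-120b",  # assumed Hugging Face model id
    args=training_args,
    train_dataset=dataset,
)
trainer.train()
```

Configuring FSDP through the trainer arguments keeps the training script single-purpose; the sharding strategy can then be changed per run without touching the model or data code.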