Fully Sharded Data Parallel (FSDP) training
This feature is in Beta.
This page provides notebook examples of Fully Sharded Data Parallel (FSDP) training on Serverless GPU compute. These examples demonstrate how to scale training across multiple GPUs and nodes for improved performance.
Training a 10-million-parameter Transformer model using FSDP2
The following notebook demonstrates distributed training of a 10-million-parameter Transformer model using the FSDP2 library.
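The core FSDP2 pattern is to apply `fully_shard` to each Transformer block and then to the root module, so that each rank stores only a shard of the parameters and gathers them on demand during forward and backward passes. The following is a minimal sketch of that pattern, assuming a multi-GPU job launched with `torchrun` and a recent PyTorch release that exposes `torch.distributed.fsdp.fully_shard`; the model dimensions and training step are illustrative and not taken from the notebook.

```python
# Minimal FSDP2 sketch (illustrative; not the notebook's exact code).
# Assumes launch via `torchrun --nproc_per_node=<gpus> train.py`.
import os
import torch
import torch.nn as nn
import torch.distributed as dist
from torch.distributed.fsdp import fully_shard  # FSDP2 API in recent PyTorch releases

def main():
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ.get("LOCAL_RANK", 0))
    torch.cuda.set_device(local_rank)

    # Toy Transformer encoder on the order of ~10M parameters (sizes are illustrative).
    model = nn.TransformerEncoder(
        nn.TransformerEncoderLayer(
            d_model=512, nhead=8, dim_feedforward=1024, batch_first=True
        ),
        num_layers=4,
    ).cuda()

    # Shard each block, then the root module, so parameters are gathered
    # per layer during forward/backward and resharded afterwards.
    for layer in model.layers:
        fully_shard(layer)
    fully_shard(model)

    optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)

    # One illustrative training step on random data with a stand-in loss.
    x = torch.randn(8, 128, 512, device="cuda")
    loss = model(x).sum()
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```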
Notebook
Training the OpenAI GPT OSS 120B model using TRL and FSDP
This notebook demonstrates how to run supervised fine-tuning (SFT) on the GPT OSS 120B model using FSDP2 and the distributed Serverless GPU library.
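For orientation, the sketch below shows the general shape of an SFT script built on TRL's `SFTTrainer`. It is a minimal illustration, not the notebook's code: the dataset is a small public example, the model ID is assumed, and FSDP sharding itself is typically enabled through an Accelerate FSDP configuration supplied at launch time rather than inside the script.

```python
# Minimal TRL SFT sketch (illustrative; model ID and dataset are assumptions).
# FSDP is usually configured via an Accelerate config file and
# `accelerate launch train_sft.py`, not in this script.
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# Small public chat dataset used here purely as an example.
dataset = load_dataset("trl-lib/Capybara", split="train")

config = SFTConfig(
    output_dir="gpt-oss-120b-sft",
    per_device_train_batch_size=1,
    gradient_checkpointing=True,
    bf16=True,
)

trainer = SFTTrainer(
    model="openai/gpt-oss-120b",  # assumed Hugging Face model ID for illustration
    args=config,
    train_dataset=dataset,
)
trainer.train()
```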