Distributed Data Parallel (DDP) training
This feature is in Beta.
This page provides notebook examples for using Distributed Data Parallel (DDP) training on Serverless GPU compute. The examples demonstrate how to scale training across multiple GPUs and nodes to improve training throughput.
Training a simple multilayer perceptron (MLP) neural network on a synthetic dataset using DDP
The following notebook demonstrates distributed training of a simple multilayer perceptron (MLP) neural network using PyTorch's DDP module on Databricks with serverless GPU resources.
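For orientation, the following is a minimal sketch of single-process-per-GPU DDP training of an MLP on synthetic data, not the notebook's exact code. The dataset shape, model size, and hyperparameters are illustrative assumptions, and the sketch assumes one process per GPU is launched with `RANK`, `LOCAL_RANK`, and `WORLD_SIZE` set (for example by `torchrun` or by the Serverless GPU launcher).

```python
# Minimal DDP training sketch on synthetic data (illustrative values throughout).
import os
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, TensorDataset, DistributedSampler

def main():
    # One process per GPU; the launcher provides rank/world-size environment variables.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # Synthetic regression data: 10 features -> 1 target (illustrative sizes).
    X = torch.randn(10_000, 10)
    y = X.sum(dim=1, keepdim=True) + 0.1 * torch.randn(10_000, 1)
    dataset = TensorDataset(X, y)
    sampler = DistributedSampler(dataset)  # shards the data across ranks
    loader = DataLoader(dataset, batch_size=64, sampler=sampler)

    # Simple MLP wrapped in DDP so gradients are synchronized across GPUs.
    model = nn.Sequential(nn.Linear(10, 64), nn.ReLU(), nn.Linear(64, 1)).cuda(local_rank)
    model = DDP(model, device_ids=[local_rank])
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.MSELoss()

    for epoch in range(5):
        sampler.set_epoch(epoch)  # reshuffle shards each epoch
        for xb, yb in loader:
            xb, yb = xb.cuda(local_rank), yb.cuda(local_rank)
            optimizer.zero_grad()
            loss = loss_fn(model(xb), yb)
            loss.backward()  # gradients are all-reduced across ranks here
            optimizer.step()
        if dist.get_rank() == 0:
            print(f"epoch {epoch}: loss {loss.item():.4f}")

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```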
Notebook
Training the OpenAI GPT OSS 20B model on 8 H100 GPUs using DDP and Hugging Face
The following notebook demonstrates how to use the Serverless GPU Python API to run supervised fine-tuning (SFT) on the GPT OSS 20B model from Hugging Face using the Transformer Reinforcement Learning (TRL) library.
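As a rough illustration of the TRL side of this workflow, the sketch below runs SFT with TRL's `SFTTrainer`. The dataset, output path, and hyperparameters are assumptions for illustration rather than the notebook's actual configuration, and it assumes the script is launched as one process per GPU across the 8 H100s (for example via the Serverless GPU Python API) so training proceeds under DDP.

```python
# Minimal SFT sketch with TRL; configuration values are illustrative only.
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

def main():
    # Any conversational or text dataset works; this one is a common public example.
    train_dataset = load_dataset("trl-lib/Capybara", split="train")

    args = SFTConfig(
        output_dir="gpt-oss-20b-sft",        # illustrative output path
        per_device_train_batch_size=1,        # small per-GPU batch for a 20B model
        gradient_accumulation_steps=8,
        learning_rate=2e-5,
        num_train_epochs=1,
        bf16=True,
        logging_steps=10,
    )

    trainer = SFTTrainer(
        model="openai/gpt-oss-20b",  # Hugging Face model id for GPT OSS 20B
        args=args,
        train_dataset=train_dataset,
    )
    trainer.train()

if __name__ == "__main__":
    main()
```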