Distributed Data Parallel (DDP) training

This feature is in Beta.

This page provides notebook examples that use Distributed Data Parallel (DDP) training on Serverless GPU compute. The examples demonstrate how to scale training across multiple GPUs and nodes to improve performance.

Training a simple multilayer perceptron (MLP) neural network on a synthetic dataset using DDP

The following notebook demonstrates distributed training of a simple multilayer perceptron (MLP) neural network using PyTorch's DDP module on Databricks with serverless GPU resources.

Notebook

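As a rough illustration of what the notebook covers (not its exact code), the following minimal sketch trains a small MLP on random tensors with PyTorch's DDP module. It assumes one process per GPU with the standard distributed environment variables (for example, as set by torchrun); on Serverless GPU compute the launch mechanism differs, but the DDP mechanics of process group setup, data sharding, and gradient all-reduce are the same.

```python
# Minimal DDP sketch; assumes RANK/WORLD_SIZE/LOCAL_RANK are set by the launcher.
import os
import torch
import torch.nn as nn
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, TensorDataset, DistributedSampler

def main():
    dist.init_process_group(backend="nccl")           # one process per GPU
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # Synthetic regression data: 10 features -> 1 target.
    X = torch.randn(10_000, 10)
    y = X.sum(dim=1, keepdim=True)
    dataset = TensorDataset(X, y)
    sampler = DistributedSampler(dataset)             # shards the data across ranks
    loader = DataLoader(dataset, batch_size=64, sampler=sampler)

    model = nn.Sequential(nn.Linear(10, 64), nn.ReLU(), nn.Linear(64, 1)).cuda(local_rank)
    model = DDP(model, device_ids=[local_rank])        # wraps the model for gradient sync
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.MSELoss()

    for epoch in range(3):
        sampler.set_epoch(epoch)                       # reshuffle shards each epoch
        for xb, yb in loader:
            xb, yb = xb.cuda(local_rank), yb.cuda(local_rank)
            optimizer.zero_grad()
            loss = loss_fn(model(xb), yb)
            loss.backward()                            # DDP all-reduces gradients here
            optimizer.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```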

Training the OpenAI GPT OSS 20B model on 8 H100 GPUs using DDP and Hugging Face

This notebook demonstrates how to use the Serverless GPU Python API to run supervised fine-tuning (SFT) on the GPT OSS 20B model from Hugging Face using the Transformer Reinforcement Learning (TRL) library.

Notebook

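For orientation, the snippet below sketches the TRL side of such a run: an SFTTrainer fine-tuning a Hugging Face causal language model. It is not the notebook's exact code; the openai/gpt-oss-20b model ID and the trl-lib/Capybara dataset are illustrative assumptions, and the Serverless GPU Python API call that distributes the training function across the 8 H100 GPUs is not shown.

```python
# Illustrative TRL SFT sketch; model ID, dataset, and output path are assumptions,
# and the Serverless GPU launch wrapper is omitted.
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

def run_sft():
    # Small public chat dataset used here only as a placeholder.
    train_dataset = load_dataset("trl-lib/Capybara", split="train")

    args = SFTConfig(
        output_dir="/tmp/gpt-oss-20b-sft",    # hypothetical output path
        per_device_train_batch_size=1,
        gradient_accumulation_steps=8,
        num_train_epochs=1,
        logging_steps=10,
        bf16=True,
    )

    trainer = SFTTrainer(
        model="openai/gpt-oss-20b",            # Hugging Face model ID (assumed)
        args=args,
        train_dataset=train_dataset,
    )
    trainer.train()

if __name__ == "__main__":
    run_sft()
```

When this function is launched with one process per GPU, the underlying Hugging Face Trainer wraps the model in DDP automatically, so no explicit DistributedDataParallel calls are needed in the training code.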