Multi-node distributed training
This feature is in Beta.
This page provides notebook examples for multi-node distributed training using Serverless GPU compute. These examples demonstrate how to scale training across multiple GPUs and nodes for improved performance.
Serverless GPU API: A10 starter
The following notebook provides a basic example of how to use the Serverless GPU Python API to launch multiple A10 GPUs for distributed training.
Notebook
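The launch mechanics are covered in the notebook itself. For orientation, the sketch below shows the kind of PyTorch DistributedDataParallel training function that a distributed launcher typically runs on each GPU process. It uses only standard `torch.distributed` APIs; the toy model, synthetic data, and environment-variable handling are illustrative assumptions, not the notebook's exact code.

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, TensorDataset
from torch.utils.data.distributed import DistributedSampler

def train():
    # Assumes the launcher sets RANK, WORLD_SIZE, and LOCAL_RANK for each
    # process (illustrative assumption; see the notebook for the actual setup).
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # Toy model and synthetic data stand in for a real workload.
    model = torch.nn.Linear(128, 1).cuda(local_rank)
    model = DDP(model, device_ids=[local_rank])
    dataset = TensorDataset(torch.randn(1024, 128), torch.randn(1024, 1))
    sampler = DistributedSampler(dataset)  # shards the data across ranks
    loader = DataLoader(dataset, batch_size=32, sampler=sampler)

    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
    loss_fn = torch.nn.MSELoss()

    for epoch in range(2):
        sampler.set_epoch(epoch)  # reshuffle consistently across ranks
        for x, y in loader:
            x, y = x.cuda(local_rank), y.cuda(local_rank)
            optimizer.zero_grad()
            loss = loss_fn(model(x), y)
            loss.backward()   # gradients are all-reduced across GPUs
            optimizer.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    train()
```

The same function scales from a single A10 to multiple nodes because data sharding and gradient synchronization are driven entirely by the process group, not by the function body.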
Distributed training and hyperparameter sweeps
The following notebook provides an example of distributed fine-tuning with hyperparameter sweeps using the Serverless GPU Python API.
Notebook
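As a rough illustration of the sweep pattern, the sketch below runs a grid of hyperparameter combinations and records each one as an MLflow run. The `launch_distributed_run` helper is hypothetical: in the notebook, each configuration would be launched as a distributed fine-tuning job through the Serverless GPU Python API, while here it is a stub so the loop stays runnable.

```python
import itertools
import mlflow

def launch_distributed_run(learning_rate: float, batch_size: int) -> float:
    """Placeholder for launching one distributed fine-tuning job.

    In the notebook this would invoke the Serverless GPU Python API; here it
    returns a dummy value so the sweep loop can execute end to end.
    """
    return learning_rate * batch_size  # dummy value, not a real metric

learning_rates = [1e-5, 3e-5, 1e-4]
batch_sizes = [16, 32]

# Grid sweep: one MLflow run per hyperparameter combination.
for lr, bs in itertools.product(learning_rates, batch_sizes):
    with mlflow.start_run(run_name=f"sft-lr{lr}-bs{bs}"):
        mlflow.log_params({"learning_rate": lr, "batch_size": bs})
        val_loss = launch_distributed_run(lr, bs)
        mlflow.log_metric("val_loss", val_loss)
```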
Distributed supervised fine-tuning using TRL
This notebook demonstrates how to use Databricks Serverless GPU compute to run supervised fine-tuning (SFT) with the TRL library and DeepSpeed ZeRO Stage 3 optimization on a single A10 GPU node. The approach can be extended to multi-node setups.
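The sketch below shows the general shape of an SFT job with TRL and DeepSpeed ZeRO Stage 3, not the notebook's exact code. The model id, dataset, and DeepSpeed config path are illustrative assumptions; `SFTConfig` inherits from `transformers.TrainingArguments`, whose `deepspeed` argument wires in the ZeRO-3 configuration. When launched under a distributed launcher, ZeRO-3 shards optimizer state, gradients, and parameters across the node's GPUs.

```python
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# Small public model and dataset chosen for illustration; swap in your own.
dataset = load_dataset("trl-lib/Capybara", split="train")

training_args = SFTConfig(
    output_dir="/tmp/sft-zero3",
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,
    bf16=True,                          # A10 (Ampere) supports bfloat16
    deepspeed="ds_zero3_config.json",   # path to a ZeRO Stage 3 JSON config (assumed)
)

trainer = SFTTrainer(
    model="Qwen/Qwen2.5-0.5B",  # model id is illustrative
    train_dataset=dataset,
    args=training_args,
)
trainer.train()
```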