Distributed training with DeepSpeed distributor

This article describes how to perform distributed training on PyTorch ML models using the DeepSpeed distributor.

The DeepSpeed distributor is built on top of TorchDistributor. It is the recommended option for models that need more compute power but are limited by GPU memory.
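Because the distributor extends TorchDistributor, launching a job follows the same pattern: construct a distributor for a given cluster shape, then call run() with a training function. The sketch below assumes the PySpark class DeepspeedTorchDistributor (module pyspark.ml.deepspeed.deepspeed_distributor) and its numGpus, nnodes, localMode, useGpu, and deepspeedConfig arguments. The wrapper launch_deepspeed_training and its validation checks are hypothetical helpers for illustration, not part of the library.

```python
from typing import Any, Callable, Dict, Optional


def launch_deepspeed_training(
    train_fn: Callable[[], Any],
    num_gpus_per_node: int,
    num_nodes: int,
    deepspeed_config: Optional[Dict[str, Any]] = None,
    local_mode: bool = False,
) -> Any:
    """Hypothetical wrapper that runs train_fn under DeepspeedTorchDistributor."""
    # Validate the cluster shape before touching Spark so mistakes fail fast.
    if num_gpus_per_node < 1:
        raise ValueError("num_gpus_per_node must be at least 1")
    if num_nodes < 1:
        raise ValueError("num_nodes must be at least 1")

    # Imported lazily: the class is only importable where PySpark ships the
    # DeepSpeed distributor (for example, Databricks Runtime 14.0 ML and above).
    from pyspark.ml.deepspeed.deepspeed_distributor import DeepspeedTorchDistributor

    distributor = DeepspeedTorchDistributor(
        numGpus=num_gpus_per_node,  # GPUs used on each node
        nnodes=num_nodes,  # number of nodes taking part in training
        localMode=local_mode,  # True runs only on the driver node
        useGpu=True,
        deepspeedConfig=deepspeed_config,  # dict or path to a DeepSpeed JSON config
    )
    # run() launches train_fn across the configured GPUs.
    return distributor.run(train_fn)
```

On a GPU cluster, a call such as launch_deepspeed_training(train, num_gpus_per_node=8, num_nodes=2) would request 16 GPUs in total; the exact values depend on your cluster.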

DeepSpeed is an open-source library developed by Microsoft and is available in Databricks Runtime 14.0 ML and above. It reduces memory usage (for example, by partitioning optimizer states, gradients, and parameters across GPUs with its ZeRO optimizer), lowers communication overhead, and supports pipeline parallelism. Together, these features make it possible to train models that would not otherwise fit on standard hardware.
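Most of DeepSpeed's memory savings are switched on through its configuration rather than through code changes. The sketch below builds a DeepSpeed config dictionary that enables ZeRO partitioning and, optionally, CPU offload. The keys follow DeepSpeed's JSON configuration format. The helper name build_zero_config, its defaults (ZeRO stage 3, bf16 enabled), and the chosen batch values are illustrative assumptions; bf16 in particular requires GPUs that support it.

```python
from typing import Any, Dict


def build_zero_config(
    micro_batch_per_gpu: int,
    grad_accum_steps: int,
    world_size: int,
    zero_stage: int = 3,
    offload_to_cpu: bool = False,
    bf16: bool = True,
) -> Dict[str, Any]:
    """Hypothetical helper: build a DeepSpeed config dict with ZeRO enabled."""
    # Stage 1 partitions optimizer states, stage 2 adds gradients,
    # stage 3 adds the model parameters themselves.
    if zero_stage not in (0, 1, 2, 3):
        raise ValueError("zero_stage must be 0, 1, 2, or 3")

    zero: Dict[str, Any] = {"stage": zero_stage}
    if offload_to_cpu:
        if zero_stage == 0:
            raise ValueError("CPU offload requires ZeRO stage 1 or higher")
        # Optimizer-state offload is valid for ZeRO stages 1-3.
        zero["offload_optimizer"] = {"device": "cpu"}
        # Parameter offload is only valid for ZeRO stage 3.
        if zero_stage == 3:
            zero["offload_param"] = {"device": "cpu"}

    return {
        # DeepSpeed requires: train_batch_size ==
        # micro batch per GPU * gradient accumulation steps * number of GPUs.
        "train_batch_size": micro_batch_per_gpu * grad_accum_steps * world_size,
        "train_micro_batch_size_per_gpu": micro_batch_per_gpu,
        "gradient_accumulation_steps": grad_accum_steps,
        "bf16": {"enabled": bf16},
        "zero_optimization": zero,
    }
```

For example, build_zero_config(4, 8, world_size=16, offload_to_cpu=True) gives a global batch of 512 with both optimizer states and parameters offloaded to CPU memory. The resulting dictionary can be passed to DeepspeedTorchDistributor through its deepspeedConfig argument. Offloading trades GPU memory for extra host-device transfers, so it is mainly worth enabling in the low-GPU-memory scenario below.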

The following are example scenarios where the DeepSpeed distributor is beneficial:

Low GPU memory.
Large model training.
Large input data, like during batch inference.

Example notebook for distributed training with DeepSpeed

The following example notebook demonstrates how to perform distributed training with the DeepSpeed distributor.

Fine-tune Llama 2 7B Chat with DeepspeedTorchDistributor notebook
