PyTorch is a Python package that provides GPU-accelerated tensor computation and high-level functionality for building deep learning networks. For licensing details, see the PyTorch license doc on GitHub.
In the sections below, we provide guidance on installing PyTorch on Databricks and give an example of running PyTorch programs. See Integrating Deep Learning Libraries with Apache Spark for an example of integrating a deep learning library with Spark.
This is not a comprehensive guide to PyTorch. For more information, see the PyTorch website.
Install PyTorch using an init script
Databricks recommends using init scripts to install PyTorch so that it is available on all cluster nodes. The example notebook below creates an init script that installs PyTorch following the instructions on the official PyTorch website.
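As a rough illustration of what such an init script contains, the sketch below generates the text of a bash script that pip-installs PyTorch and writes it to a file. The pip path `/databricks/python/bin/pip`, the default package list, and the helper names `build_pytorch_init_script` and `write_init_script` are assumptions for illustration. Check the PyTorch website and your Databricks Runtime documentation for the packages, versions, and CUDA build that match your cluster.

```python
import shlex
from pathlib import Path

# Assumed defaults for illustration; pin versions that match your runtime and CUDA setup.
DEFAULT_PACKAGES = ("torch", "torchvision")
ASSUMED_PIP_PATH = "/databricks/python/bin/pip"  # assumption about the cluster image


def build_pytorch_init_script(packages=DEFAULT_PACKAGES, pip_path=ASSUMED_PIP_PATH):
    """Return the text of a bash init script that pip-installs the given packages."""
    if not packages:
        raise ValueError("at least one package is required")
    # Quote each requirement so specifiers such as 'torch>=2.0' are not
    # interpreted by the shell (">" would otherwise be a redirection).
    requirements = " ".join(shlex.quote(p) for p in packages)
    lines = [
        "#!/bin/bash",
        "set -euo pipefail",  # abort the init script on the first failing command
        f"{pip_path} install {requirements}",
    ]
    return "\n".join(lines) + "\n"


def write_init_script(path, script):
    """Write the script to a local path, creating parent directories as needed."""
    target = Path(path)
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(script)
    return target
```

In a Databricks notebook you could instead store the generated text with `dbutils.fs.put(path, script, True)` at a location your cluster can read, and then reference that path in the cluster's init script settings. The exact location is up to you and is not specified here.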
Use PyTorch on a single node
To test and migrate single-machine PyTorch workflows, you can start with a driver-only cluster on Databricks by setting the number of workers to zero. Although Apache Spark is not functional in this configuration, it is a cost-effective way to run single-machine PyTorch workflows.
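The following is a minimal sketch of a single-machine workflow that could run on a driver-only cluster, assuming PyTorch is already installed. It does not use Spark. It trains a one-feature linear regression on synthetic data and uses a GPU if the driver has one, falling back to the CPU otherwise. The model, data, hyperparameters, and the function name `train_linear_regression` are illustrative assumptions, not part of any Databricks API.

```python
import torch
from torch import nn


def train_linear_regression(epochs=200, lr=0.1, seed=0):
    """Fit y = w*x + b on synthetic data with plain PyTorch on one machine."""
    torch.manual_seed(seed)
    # Use the driver's GPU when available; otherwise run on the CPU.
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

    # Synthetic data (illustrative only): y = 3x + 2 plus small Gaussian noise.
    x = torch.randn(256, 1, device=device)
    y = 3 * x + 2 + 0.01 * torch.randn(256, 1, device=device)

    model = nn.Linear(1, 1).to(device)
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()

    for _ in range(epochs):
        optimizer.zero_grad()          # clear gradients from the previous step
        loss = loss_fn(model(x), y)    # full-batch mean squared error
        loss.backward()                # compute gradients
        optimizer.step()               # gradient-descent update

    return model.weight.item(), model.bias.item(), loss.item()


if __name__ == "__main__":
    w, b, final_loss = train_linear_regression()
    print(f"w={w:.3f} b={b:.3f} loss={final_loss:.6f}")
```

Because everything runs in a single process on the driver, the same code should work unchanged on a laptop, which makes it a reasonable starting point before moving a workflow to a multi-node setup.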