PyTorch

The PyTorch project is a Python package that provides GPU-accelerated tensor computation and high-level functionality for building deep learning networks. For licensing details, see the PyTorch license doc on GitHub.

In the sections below, we provide guidance on installing PyTorch on Databricks and give an example of running PyTorch programs.

Note

This is not a comprehensive guide to PyTorch. For full documentation, refer to the PyTorch website.

Install PyTorch

Databricks Runtime for ML

PyTorch is included in Databricks Runtime 5.1 ML Beta and above. You can create a cluster using Databricks Runtime ML and start using PyTorch. See Databricks Runtime for Machine Learning.

Databricks Runtime

We recommend using the PyTorch version included in Databricks Runtime for ML. However, if you must use Databricks Runtime, you can install PyTorch as a Databricks PyPI library. The following steps install PyTorch 1.1.0:

  • On GPU clusters, install the torch and torchvision PyPI packages by specifying the following versions:
    • torch==1.1.0
    • torchvision==0.3.0
  • On CPU clusters, install torch and torchvision from the following wheel files:
    • Python 3:
      • https://download.pytorch.org/whl/cpu/torch-1.1.0-cp35-cp35m-linux_x86_64.whl
      • https://download.pytorch.org/whl/cpu/torchvision-0.3.0-cp35-cp35m-linux_x86_64.whl
    • Python 2:
      • https://download.pytorch.org/whl/cpu/torch-1.1.0-cp27-cp27mu-linux_x86_64.whl
      • https://download.pytorch.org/whl/cpu/torchvision-0.3.0-cp27-cp27mu-linux_x86_64.whl
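After the libraries are attached, you may want to confirm from a notebook cell that the expected versions were picked up. The following is a minimal sketch, not an official Databricks check: the helper names installed_version and check_pins are hypothetical, and the pinned versions are simply the ones listed above.

```python
import importlib

# Pins taken from the install instructions above.
EXPECTED = {"torch": "1.1.0", "torchvision": "0.3.0"}


def installed_version(module_name):
    """Return the module's __version__ string, or None if it cannot be imported."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return getattr(module, "__version__", None)


def check_pins(expected=EXPECTED):
    """Map each module name to a (found_version, matches_pin) tuple."""
    report = {}
    for name, pin in expected.items():
        found = installed_version(name)
        # Drop any local build suffix such as "+cpu" before comparing.
        base = found.split("+")[0] if found else None
        report[name] = (found, base == pin)
    return report


if __name__ == "__main__":
    for name, (found, ok) in check_pins().items():
        print(name, found, "OK" if ok else "MISMATCH or missing")
    if installed_version("torch") is not None:
        import torch
        # True only when a CUDA-enabled build runs on a machine with a GPU.
        print("CUDA available:", torch.cuda.is_available())
```

On a CPU cluster, the CUDA check is expected to report False.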

Use PyTorch on a single node

To test and migrate single-machine PyTorch workflows, you can start with a driver-only cluster on Databricks by setting the number of workers to zero. Although Apache Spark is not functional in this configuration, it is a cost-effective way to run single-machine PyTorch workflows.
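As an illustration of a single-machine workflow, the sketch below fits a one-layer linear model to synthetic data on the driver. It is a toy example under stated assumptions rather than a recommended training setup: the function name train_linear, the data, and the hyperparameters are all made up for illustration, and the code uses the GPU only if torch.cuda.is_available() reports one.

```python
import torch
import torch.nn as nn


def train_linear(steps=200, lr=0.1, seed=0):
    """Fit y = 2x + 1 with one linear layer; return (first_loss, last_loss)."""
    torch.manual_seed(seed)
    # Use the GPU when the driver has one, otherwise fall back to the CPU.
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

    # Synthetic data (hypothetical), created directly on the chosen device.
    x = torch.randn(256, 1, device=device)
    y = 2.0 * x + 1.0

    model = nn.Linear(1, 1).to(device)
    loss_fn = nn.MSELoss()
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)

    losses = []
    for _ in range(steps):
        optimizer.zero_grad()          # clear gradients from the previous step
        loss = loss_fn(model(x), y)    # full-batch forward pass
        loss.backward()                # backpropagate
        optimizer.step()               # update weight and bias
        losses.append(loss.item())
    return losses[0], losses[-1]


if __name__ == "__main__":
    first, last = train_linear()
    print("first loss:", first, "last loss:", last)
```

Because everything runs in the driver's Python process, the same code should behave as it does on a local workstation.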