GPU-enabled Clusters

Note

Some GPU-enabled instance types are in Beta and are marked as such in the drop-down list when you select the driver and worker types during cluster creation.

Overview

Databricks supports clusters accelerated with graphics processing units (GPUs). This topic describes how to create clusters with GPU-enabled instances and which GPU drivers and libraries are installed on those instances.

To learn more about deep learning on GPU-enabled clusters, see Deep Learning.

Create a GPU cluster

Creating a GPU cluster is similar to creating any Spark cluster (see Clusters). Keep the following in mind:

  • The Databricks Runtime Version must be a GPU-enabled version, such as 4.1 (includes Apache Spark 2.3.0, GPU, Scala 2.11).
  • The Worker Type and Driver Type must be GPU instance types.
  • For single-machine workflows without Spark, you can set the number of workers to zero.
  • To avoid conflicts among multiple Spark tasks trying to use the same GPU, Databricks automatically configures Spark to use one executor thread per worker machine. This is generally optimal for libraries written for GPUs.
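The settings above can be sketched as a cluster specification, which may be useful if you create clusters programmatically rather than through the UI. This is a minimal sketch under stated assumptions, not a definitive request body: the field names are assumed to match the Clusters REST API, the runtime key `4.1.x-gpu-scala2.11` is an assumed identifier for the 4.1 GPU runtime, and `build_gpu_cluster_spec` and `GPU_INSTANCE_TYPES` are hypothetical names introduced for illustration.

```python
import json

# GPU instance types named in this topic; see Supported Instance Types
# for the authoritative list.
GPU_INSTANCE_TYPES = {
    "p2.xlarge", "p2.8xlarge", "p2.16xlarge",
    "p3.2xlarge", "p3.8xlarge", "p3.16xlarge",
}


def build_gpu_cluster_spec(name, node_type, num_workers=0,
                           runtime="4.1.x-gpu-scala2.11"):
    """Return a cluster spec that uses one GPU type for driver and workers."""
    if node_type not in GPU_INSTANCE_TYPES:
        raise ValueError("%s is not a GPU instance type" % node_type)
    if "gpu" not in runtime:
        raise ValueError("%s is not a GPU-enabled runtime" % runtime)
    if num_workers < 0:
        raise ValueError("num_workers must be zero or greater")
    return {
        "cluster_name": name,
        "spark_version": runtime,          # assumed key for the 4.1 GPU runtime
        "node_type_id": node_type,         # Worker Type
        "driver_node_type_id": node_type,  # Driver Type
        "num_workers": num_workers,        # zero for single-machine workflows
    }


if __name__ == "__main__":
    # Single-machine workflow without Spark workers.
    print(json.dumps(build_gpu_cluster_spec("gpu-demo", "p2.xlarge"), indent=2))
```

The helper rejects non-GPU instance types and non-GPU runtimes before any request is sent, which mirrors the first two requirements in the list.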

Databricks supports the P2 instance type series (p2.xlarge, p2.8xlarge, and p2.16xlarge) and the P3 instance type series (p3.2xlarge, p3.8xlarge, and p3.16xlarge). See Supported Instance Types for a list of supported GPU instance types and their attributes. For these instance types, keep the following in mind:

  • P2 and P3 instances are available only in select AWS regions. For region availability, see Amazon EC2 Pricing. Your Databricks deployment must reside in a supported region to launch GPU-enabled clusters.
  • Because of Amazon spot instance price surges, GPU spot instances are difficult to retain. Use on-demand instances if you need to keep them.
  • The default on-demand limit for P2 instances is one. You might need to request a limit increase to create GPU-enabled clusters.
  • Amazon EC2 P2 instance types require EBS volumes for storage.
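The spot and EBS considerations above could be expressed as the AWS-specific part of a cluster specification. The following is a sketch only: the field names and enum values (`availability`, `ON_DEMAND`, `ebs_volume_type`, `GENERAL_PURPOSE_SSD`, `ebs_volume_count`, `ebs_volume_size`) are assumed to match the Clusters REST API, the 100 GB volume size is an arbitrary illustrative value, and `on_demand_gpu_aws_attributes` is a hypothetical helper name.

```python
def on_demand_gpu_aws_attributes(ebs_volume_count=1, ebs_volume_size_gb=100):
    """Return AWS attributes requesting on-demand instances with EBS storage."""
    if ebs_volume_count < 1:
        # P2 instance types require EBS volumes for storage.
        raise ValueError("at least one EBS volume is required")
    if ebs_volume_size_gb <= 0:
        raise ValueError("EBS volume size must be positive")
    return {
        "availability": "ON_DEMAND",               # avoid losing spot instances
        "ebs_volume_type": "GENERAL_PURPOSE_SSD",  # assumed enum value
        "ebs_volume_count": ebs_volume_count,
        "ebs_volume_size": ebs_volume_size_gb,     # in GB, illustrative value
    }


if __name__ == "__main__":
    print(on_demand_gpu_aws_attributes())
```

A dictionary like this would typically be nested under an `aws_attributes` key in the cluster specification. Confirm the exact field names against the API reference for your deployment.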

NVIDIA GPU driver, CUDA, and cuDNN

Databricks installs the NVIDIA software required to use GPUs on Spark driver and worker instances. This software includes:

  • Tesla driver for Linux x64.
  • CUDA Toolkit, installed under /usr/local/cuda.
  • cuDNN: NVIDIA CUDA Deep Neural Network Library.

For the versions of the software included, see the release notes for the Databricks Runtime version you are using.
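To see what is installed on a given machine, you can inspect it directly, for example from a notebook cell on the driver. The sketch below is illustrative rather than a Databricks-provided tool: `describe_gpu_software` is a hypothetical helper. It assumes the usual CUDA Toolkit layout (`bin/nvcc` under the install directory) and that the NVIDIA driver's `nvidia-smi` utility is on the `PATH`, neither of which this topic confirms.

```python
import os
import shutil
import subprocess

CUDA_HOME = "/usr/local/cuda"  # install location stated in this topic


def describe_gpu_software(cuda_home=CUDA_HOME):
    """Report which NVIDIA components are visible on the current machine."""
    nvcc = os.path.join(cuda_home, "bin", "nvcc")  # assumed toolkit layout
    smi = shutil.which("nvidia-smi")               # None if not on PATH
    report = {
        "cuda_toolkit_present": os.path.isdir(cuda_home),
        "nvcc_present": os.path.isfile(nvcc),
        "nvidia_smi_path": smi,
        "nvidia_smi_output": None,
    }
    if smi is not None:
        try:
            result = subprocess.run(
                [smi],
                stdout=subprocess.PIPE,
                stderr=subprocess.STDOUT,
                universal_newlines=True,
                timeout=30,
            )
            report["nvidia_smi_output"] = result.stdout
        except (OSError, subprocess.SubprocessError) as err:
            report["nvidia_smi_output"] = "nvidia-smi failed: %s" % err
    return report


if __name__ == "__main__":
    for key, value in describe_gpu_software().items():
        print("%s: %s" % (key, value))
```

The function only inspects the machine it runs on. It does not check worker instances, and it does not replace the release notes as the source for exact version numbers.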

Note

This software contains source code provided by NVIDIA Corporation. Specifically, to support GPUs, Databricks includes code from CUDA Samples.

NVIDIA End User License Agreement (EULA)

When you select a GPU-enabled “Databricks Runtime Version” in Databricks, you implicitly agree to the NVIDIA EULA.