Integrating Deep Learning Libraries with Apache Spark

Using a deep learning library with Spark is similar to using any other third-party library within Spark tasks.


Some deep learning libraries must be built from source or installed in other non-standard ways. Check the installation instructions on the Deep Learning page for each library you plan to use.

Deep Learning in Spark Jobs

Deep learning libraries are best used within Spark jobs for tasks such as:

  • Distributed inference: Each worker task makes predictions for a subset of instances.
  • Distributed model selection: Each worker task fits a model on a local dataset, using a different set of training parameters.
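
The two patterns above can be sketched as ordinary Python functions that Spark runs inside worker tasks. This is a minimal illustration under stated assumptions, not a reference implementation: `load_model`, `predict_partition`, `fit_and_score`, and `run_on_spark` are hypothetical names, the "model" is a tiny linear stand-in so the sketch runs without any deep learning library installed, and the dataset and parameter grid are made up. In a real job, `load_model` would call the chosen library's own loader and `fit_and_score` would call its training routine.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

Features = List[float]


def load_model() -> Callable[[Features], float]:
    """Hypothetical stand-in for loading a trained model on a worker."""
    weights = [0.5, -1.0, 2.0]  # assumed weights, for illustration only
    return lambda x: sum(w * v for w, v in zip(weights, x))


def predict_partition(rows: Iterable[Features]) -> Iterator[float]:
    """Distributed inference: load the model once per partition, then score each row."""
    model = load_model()
    for features in rows:
        yield model(features)


def fit_and_score(params: Dict[str, float]) -> Tuple[Dict[str, float], float]:
    """Distributed model selection: fit one model per parameter set on a local dataset."""
    # Tiny made-up local dataset following y = 2x.
    xs, ys = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]
    w = 0.0
    # Plain gradient descent on mean squared error for a single weight.
    for _ in range(int(params["epochs"])):
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
        w -= params["lr"] * grad
    loss = sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)
    return params, loss


def run_on_spark() -> None:
    """Wire both functions into Spark. Requires pyspark; not called on import."""
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("dl-sketch").getOrCreate()
    sc = spark.sparkContext

    # Inference: each task scores the rows of one partition.
    data = sc.parallelize(
        [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]], numSlices=2
    )
    print(data.mapPartitions(predict_partition).collect())

    # Model selection: one task per parameter set, then pick the lowest loss.
    grid = [{"lr": lr, "epochs": 50} for lr in (0.001, 0.01, 0.1)]
    results = sc.parallelize(grid, numSlices=len(grid)).map(fit_and_score).collect()
    print(min(results, key=lambda r: r[1]))

    spark.stop()
```

Loading the model inside `predict_partition` rather than once per row keeps the loading cost proportional to the number of partitions, which matters when the model is large. The per-task functions have no Spark dependency, so they can be checked locally before calling `run_on_spark()` on a cluster.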

The following notebook demonstrates distributed inference using TensorFlow. Other deep learning libraries may use similar workflows.