Deep Learning Pipelines

Deep Learning Pipelines is a high-level deep learning framework that supports common deep learning workflows through the Spark MLlib Pipelines API and uses Spark to scale deep learning out to large datasets. It is an open source project released under the Apache 2.0 License.

Deep Learning Pipelines is a high-level API that calls into lower-level deep learning libraries. It currently supports TensorFlow and Keras with the TensorFlow backend.

The sections below explain how to install Deep Learning Pipelines on Databricks and give examples of the deep learning workflows it supports. For workflows or deep learning frameworks that Deep Learning Pipelines does not currently support, see Integrating Deep Learning Libraries with Apache Spark for an example of integrating a deep learning library with Spark.


This is not a comprehensive guide to Deep Learning Pipelines. Please also refer to the Deep Learning Pipelines GitHub page.

Install Deep Learning Pipelines

Install Deep Learning Pipelines as a regular Databricks Library from Spark Packages, using the spark-deep-learning Spark Package. See Libraries for more information on Databricks Libraries.

Running Deep Learning Pipelines requires additional Databricks Libraries: install tensorflow, keras, and h5py from PyPI, and install tensorframes from Spark Packages.
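
Once the libraries are attached to the cluster, a quick check from a Python notebook cell can confirm that they are importable. The following is a minimal sketch that uses only the Python standard library. The helper name missing_modules is hypothetical, the import names are assumed to match the library names listed above, and sparkdl is assumed to be the import name of the spark-deep-learning package.

```python
import importlib.util

# Import names assumed to correspond to the libraries listed above;
# "sparkdl" is assumed to be the import name of spark-deep-learning.
REQUIRED_MODULES = ["sparkdl", "tensorflow", "keras", "h5py", "tensorframes"]


def missing_modules(names):
    """Return the top-level module names that cannot be found (hypothetical helper)."""
    # find_spec locates a module without importing it and returns None if it is absent.
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(REQUIRED_MODULES)
if missing:
    print("Not importable:", ", ".join(missing))
else:
    print("All required modules are importable.")
```

The check only looks the modules up and does not import them, so it stays fast and does not initialize TensorFlow. It runs on the driver, so it does not verify that the libraries are also available on the workers.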

Use Deep Learning Pipelines

Deep Learning Pipelines can be used on either CPU or GPU clusters.