Deep Learning Pipelines

Deep Learning Pipelines is a high-level deep learning framework that facilitates common deep learning workflows via the Spark MLlib Pipelines API and scales out deep learning on big data using Spark. It is an open-source project released under the Apache 2.0 License. For details about the library, see the Deep Learning Pipelines GitHub page.

Deep Learning Pipelines is a high-level API that calls into lower-level deep learning libraries. It currently supports TensorFlow, and Keras with the TensorFlow backend.
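The core idea described above is that a pretrained network becomes one stage of an MLlib-style pipeline: a featurizer stage turns raw inputs into feature vectors, and an ordinary estimator stage is trained on those features. The sketch below illustrates that composition in plain Python only. It is not the Spark or Deep Learning Pipelines API: every class and function in it (`Featurizer`, `NearestCentroid`, `Pipeline`, `toy_embed`) is hypothetical, and `toy_embed` merely stands in for a frozen pretrained model, roughly the role a featurizer such as `DeepImageFeaturizer` plays in the real library.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence

Row = Dict[str, object]  # one record, loosely like a row of a Spark DataFrame


@dataclass
class Featurizer:
    """Transformer stage (hypothetical): applies a fixed, pretrained model to one column."""
    model: Callable[[object], List[float]]  # stand-in for a frozen network
    input_col: str
    output_col: str

    def transform(self, rows: Sequence[Row]) -> List[Row]:
        return [{**r, self.output_col: self.model(r[self.input_col])} for r in rows]


@dataclass
class CentroidModel:
    """Fitted model (hypothetical): predicts the label whose centroid is closest."""
    centroids: Dict[object, List[float]]
    features_col: str

    def transform(self, rows: Sequence[Row]) -> List[Row]:
        def dist2(a, b):
            return sum((x - y) ** 2 for x, y in zip(a, b))
        out = []
        for r in rows:
            f = r[self.features_col]
            pred = min(self.centroids, key=lambda lbl: dist2(f, self.centroids[lbl]))
            out.append({**r, "prediction": pred})
        return out


@dataclass
class NearestCentroid:
    """Estimator stage (hypothetical): learns the mean feature vector for each label."""
    features_col: str
    label_col: str

    def fit(self, rows: Sequence[Row]) -> CentroidModel:
        sums: Dict[object, List[float]] = {}
        counts: Dict[object, int] = {}
        for r in rows:
            f, lbl = r[self.features_col], r[self.label_col]
            acc = sums.setdefault(lbl, [0.0] * len(f))
            for i, x in enumerate(f):
                acc[i] += x
            counts[lbl] = counts.get(lbl, 0) + 1
        cents = {lbl: [x / counts[lbl] for x in s] for lbl, s in sums.items()}
        return CentroidModel(cents, self.features_col)


class Pipeline:
    """Runs stages in order; estimators are fit on the output of earlier stages."""
    def __init__(self, stages):
        self.stages = stages

    def fit(self, rows):
        fitted = []
        for s in self.stages:
            stage = s.fit(rows) if hasattr(s, "fit") else s
            rows = stage.transform(rows)
            fitted.append(stage)
        return Pipeline(fitted)  # every stage is now a transformer

    def transform(self, rows):
        for s in self.stages:
            rows = s.transform(rows)
        return rows


# Toy "pretrained model": maps a string to [length, number of vowels].
def toy_embed(text):
    return [float(len(text)), float(sum(c in "aeiou" for c in text))]


train = [{"text": "aaaa", "label": 1}, {"text": "aeiouae", "label": 1},
         {"text": "xyz", "label": 0}, {"text": "bcd", "label": 0}]
model = Pipeline([Featurizer(toy_embed, "text", "features"),
                  NearestCentroid("features", "label")]).fit(train)
preds = model.transform([{"text": "aeaeae"}, {"text": "qwr"}])
print([p["prediction"] for p in preds])  # [1, 0]
```

Only the final estimator is trained; the featurizer's model is never updated. This mirrors the transfer-learning workflow that pipelines of this kind make convenient, where the expensive network is reused as-is and a lightweight classifier is fit on its outputs.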

To use Spark with workflows or deep learning frameworks that Deep Learning Pipelines does not currently support, see Integrating Deep Learning Libraries with Apache Spark for an example of integrating a deep learning library with Spark.

This notebook shows how to install Deep Learning Pipelines on Databricks and gives examples of the deep learning workflows it supports.


The Deep Learning Pipelines library is included in Databricks Runtime ML (Beta), a machine learning runtime that provides a ready-to-go environment for machine learning and data science. Instead of installing Deep Learning Pipelines by following the “Cluster setup” section of the notebook below, you can create a cluster using Databricks Runtime ML. See Overview of Databricks Runtime for Machine Learning.

This notebook requires Databricks Runtime 4.0 or above.

Deep Learning Pipelines notebook