Deep Learning Pipelines
Note
This page describes the open source Deep Learning Pipelines package included in Databricks Runtime 6.6 ML and below. This page is not intended as a resource for general information about deep learning pipelines on Databricks.
The Deep Learning Pipelines package is a high-level deep learning framework that facilitates common deep learning workflows via the Apache Spark MLlib Pipelines API and scales out deep learning on big data using Spark. It is an open source project released under the Apache License 2.0.
The Deep Learning Pipelines package calls into lower-level deep learning libraries. It supports TensorFlow and Keras with the TensorFlow backend.
Migration guide to Databricks Runtime 7.0 ML and above
Important
Parts of the Deep Learning Pipelines library sparkdl have been removed in Databricks Runtime 7.0 ML (Unsupported), specifically, the Transformers and Estimators used in Apache Spark ML pipelines. See the following sections for migration tips and workarounds.
Read images
The Deep Learning Pipelines package includes an image reader sparkdl.image.imageIO, which was removed in Databricks Runtime 7.0 ML (Unsupported).
Instead, use the image data source or binary file data source from Apache Spark. Many of the example notebooks in Load data show use cases of these two data sources.
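As a rough illustration of the two replacements, the sketch below wraps both data sources in one helper. The function name load_images, the *.jpg filter, and the directory path are assumptions made for this example; the format names and reader options are the ones documented for Apache Spark.

```python
def load_images(spark, path, as_binary=False):
    """Load image files with a built-in Spark data source.

    `spark` is an active SparkSession (predefined in Databricks notebooks).
    `path` is a hypothetical directory of image files.
    """
    if as_binary:
        # binaryFile keeps the raw bytes: columns are
        # path, modificationTime, length, content.
        reader = (
            spark.read.format("binaryFile")
            .option("pathGlobFilter", "*.jpg")       # assumed file extension
            .option("recursiveFileLookup", "true")   # descend into subdirectories
        )
    else:
        # image decodes each file into a struct column named `image`
        # (origin, height, width, nChannels, mode, data).
        reader = spark.read.format("image").option("dropInvalid", "true")
    return reader.load(path)
```

For example, load_images(spark, "/path/to/images") would return a DataFrame of decoded images, while passing as_binary=True returns the undecoded file contents, which is usually the more convenient input when decoding happens later inside a pandas UDF.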
Transfer learning
The Deep Learning Pipelines package includes a Spark ML Transformer sparkdl.DeepImageFeaturizer for facilitating transfer learning with deep learning models. DeepImageFeaturizer was removed in Databricks Runtime 7.0 ML (Unsupported).
Instead, use pandas UDFs to perform featurization with deep learning models. pandas UDFs, and their newer variant Scalar Iterator pandas UDFs, offer more flexible APIs, support more deep learning libraries, and give higher performance.
See Featurization for transfer learning for examples of transfer learning with pandas UDFs.
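The shape of a Scalar Iterator pandas UDF for featurization can be sketched as follows. This is a minimal outline, not the approach from the linked examples: load_model and preprocess_batch are hypothetical callables supplied by the caller, and the model is assumed to expose a Keras-style predict method that returns one flat feature vector per input row.

```python
from typing import Callable, Iterable, Iterator, List


def featurize_batches(
    batches: Iterable, load_model: Callable, preprocess_batch: Callable
) -> Iterator[List[List[float]]]:
    """Core loop: load the model once, then featurize batch by batch."""
    model = load_model()  # hypothetical loader; called once, not once per batch
    for batch in batches:
        # preprocess_batch (hypothetical) turns raw values into model input;
        # predict is assumed to return a 2-D result (rows x features).
        features = model.predict(preprocess_batch(batch))
        yield [[float(v) for v in row] for row in features]


def make_featurize_udf(load_model: Callable, preprocess_batch: Callable):
    """Wrap the core loop in a Scalar Iterator pandas UDF (Spark 3.0+)."""
    # Imported here so the core loop above stays usable without Spark.
    import pandas as pd
    from pyspark.sql.functions import pandas_udf

    @pandas_udf("array<float>")
    def featurize(batches: Iterator[pd.Series]) -> Iterator[pd.Series]:
        for rows in featurize_batches(batches, load_model, preprocess_batch):
            yield pd.Series(rows)

    return featurize
```

Given a DataFrame df read with the binary file data source, the UDF would be applied as df.select(featurize_udf("content")), where featurize_udf is the value returned by make_featurize_udf. Loading the model once per iterator rather than once per batch is the main reason to prefer the iterator variant for deep learning models.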
Distributed hyperparameter tuning
The Deep Learning Pipelines package includes a Spark ML Estimator sparkdl.KerasImageFileEstimator for tuning hyperparameters using Spark ML tuning utilities. KerasImageFileEstimator was removed in Databricks Runtime 7.0 ML (Unsupported).
Instead, use Hyperparameter tuning with Hyperopt to distribute hyperparameter tuning for deep learning models.
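A distributed search with Hyperopt might look like the sketch below. The search space, the parameter names lr and batch_size, and the train_and_evaluate callable are assumptions chosen for illustration; train_and_evaluate stands in for whatever code trains a model with the given hyperparameters and returns a validation loss.

```python
import math


def objective(params, train_and_evaluate):
    """Hyperopt objective: train once and report the loss to minimize."""
    loss = train_and_evaluate(
        lr=params["lr"],
        batch_size=int(params["batch_size"]),  # quniform samples floats
    )
    return {"loss": loss, "status": "ok"}  # "ok" is the value of hyperopt.STATUS_OK


def tune(train_and_evaluate, max_evals=16, parallelism=4):
    """Run the search, with each trial evaluated on a Spark worker."""
    # Imported here so `objective` can be exercised without Hyperopt or Spark.
    from hyperopt import SparkTrials, fmin, hp, tpe

    # Hypothetical search space; loguniform takes the log of its bounds.
    space = {
        "lr": hp.loguniform("lr", math.log(1e-4), math.log(1e-1)),
        "batch_size": hp.quniform("batch_size", 16, 128, 16),
    }
    return fmin(
        fn=lambda params: objective(params, train_and_evaluate),
        space=space,
        algo=tpe.suggest,
        max_evals=max_evals,
        trials=SparkTrials(parallelism=parallelism),
    )
```

With SparkTrials, each trial trains a complete model on a single worker, so this pattern suits models that fit on one machine; the parallelism value trades search speed against how much the adaptive TPE algorithm can learn from earlier trials.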
Distributed inference
The Deep Learning Pipelines package includes several Spark ML Transformers for distributing inference, all of which were removed in Databricks Runtime 7.0 ML (Unsupported):
DeepImagePredictor
TFImageTransformer
KerasImageFileTransformer
TFTransformer
KerasTransformer
Instead, use pandas UDFs to run inference on Spark DataFrames, following the examples in Deploy models for inference and prediction.
Deploy models as SQL UDFs
The Deep Learning Pipelines package includes a utility sparkdl.udf.keras_image_model.registerKerasImageUDF for deploying a deep learning model as a UDF callable from Spark SQL. registerKerasImageUDF was removed in Databricks Runtime 7.0 ML (Unsupported).
Instead, use MLflow to export the model as a UDF, following the example in Model inference.
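One way this can look in practice is sketched below, assuming the model has already been logged or registered with MLflow. The helper name register_model_udf, the UDF name, and the "double" result type are assumptions for illustration; mlflow.pyfunc.spark_udf and spark.udf.register are the MLflow and Spark calls doing the work.

```python
def register_model_udf(spark, model_uri, name="predict_udf", result_type="double"):
    """Wrap an MLflow model as a Spark UDF and expose it to Spark SQL.

    `model_uri` is a placeholder, for example a runs:/ or models:/ URI.
    `result_type` is an assumption; it must match what the model returns.
    """
    # Imported here so the helper can be defined without MLflow installed.
    import mlflow.pyfunc

    predict = mlflow.pyfunc.spark_udf(
        spark, model_uri=model_uri, result_type=result_type
    )
    spark.udf.register(name, predict)  # makes the UDF callable from SQL
    return predict
```

After registration, the model can be called from SQL by the registered name, passing the model's input columns as arguments, and the returned object can also be used directly in DataFrame expressions.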