Deep Learning Pipelines migration guide
This documentation has been retired and might not be updated. The products, services, or technologies mentioned in this content are no longer supported. See <new_article>.
This page includes tips for migrating from the open source Deep Learning Pipelines package that was included in Databricks Runtime 6.6 ML and below. Parts of the Deep Learning Pipelines library sparkdl were removed in Databricks Runtime 7.0 ML (Unsupported), specifically the Transformers and Estimators used in Apache Spark ML pipelines.
This page is not a resource for general information about deep learning pipelines on Databricks.
The Deep Learning Pipelines package includes an image reader, sparkdl.image.imageIO, which was removed in Databricks Runtime 7.0 ML (Unsupported).
Instead, use the image data source or binary file data source from Apache Spark. Many of the example notebooks in Load data for machine learning and deep learning show use cases of these two data sources.
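As a rough illustration of the two replacements, the sketch below wraps each data source in a small helper. The function names, the default glob, and the choice of options are assumptions made for this example, not part of the original package or guide.

```python
def load_images(spark, path):
    """Load decoded images with Spark's built-in image data source.

    The result has a single `image` struct column
    (origin, height, width, nChannels, mode, data).
    """
    # dropInvalid skips files that cannot be decoded as images.
    return spark.read.format("image").option("dropInvalid", True).load(path)


def load_binary_files(spark, path, glob="*.jpg"):
    """Load raw file bytes with the binary file data source.

    The result has the columns path, modificationTime, length, content.
    """
    return (
        spark.read.format("binaryFile")
        .option("pathGlobFilter", glob)          # keep only matching file names
        .option("recursiveFileLookup", "true")   # descend into subdirectories
        .load(path)
    )
```

The image data source decodes files for you, while the binary file data source leaves decoding to your own code, which can be preferable when a deep learning library expects its own preprocessing.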
The Deep Learning Pipelines package includes a Spark ML Transformer, sparkdl.DeepImageFeaturizer, for facilitating transfer learning with deep learning models.
DeepImageFeaturizer was removed in Databricks Runtime 7.0 ML (Unsupported).
Instead, use pandas UDFs to perform featurization with deep learning models. pandas UDFs, and their newer variant Scalar Iterator pandas UDFs, offer more flexible APIs, support more deep learning libraries, and provide better performance.
See Featurization for transfer learning for examples of transfer learning with pandas UDFs.
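A minimal sketch of the Scalar Iterator pattern follows. It assumes each input row already holds a fixed-length numeric array, and load_model is a hypothetical stand-in that computes two summary statistics so the example runs without a deep learning library. In practice you would load a pretrained network there instead.

```python
from typing import Callable, Iterator

import numpy as np
import pandas as pd


def load_model() -> Callable[[np.ndarray], np.ndarray]:
    """Hypothetical stand-in for a pretrained network with its head removed.

    Maps a batch of shape (n, d) to features of shape (n, 2).
    """
    return lambda batch: np.stack([batch.mean(axis=1), batch.std(axis=1)], axis=1)


def featurize_batches(batches: Iterator[pd.Series]) -> Iterator[pd.Series]:
    """Body of a Scalar Iterator pandas UDF."""
    model = load_model()  # loaded once per task, then reused for every batch
    for batch in batches:
        inputs = np.stack(batch.tolist()).astype("float32")
        features = model(inputs)
        yield pd.Series(list(features))  # one feature vector per input row


def make_featurize_udf():
    """Wrap featurize_batches as a pandas UDF returning array<float>."""
    # Imported here so the pandas logic above can be tried without Spark.
    from pyspark.sql.functions import pandas_udf

    return pandas_udf(featurize_batches, "array<float>")
```

On a cluster, this could be applied as df.withColumn("features", make_featurize_udf()("pixels")), where pixels is an assumed column name. The iterator form is what allows the model to be loaded once rather than once per batch.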
Distributed hyperparameter tuning
The Deep Learning Pipelines package includes a Spark ML Estimator, sparkdl.KerasImageFileEstimator, for tuning hyperparameters using Spark ML tuning utilities.
KerasImageFileEstimator was removed in Databricks Runtime 7.0 ML (Unsupported).
Instead, use Hyperparameter tuning with Hyperopt to distribute hyperparameter tuning for deep learning models.
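The following sketch shows the general shape of a Hyperopt search distributed with SparkTrials. The objective is a placeholder quadratic rather than real model training, and the search range, max_evals, and parallelism values are assumptions chosen for illustration.

```python
def objective(params):
    """Return the loss for one hyperparameter setting.

    Placeholder: a real objective would train a model with `params` and
    return its validation loss. This quadratic is smallest at lr = 0.01.
    """
    lr = params["lr"]
    # "ok" is the value of hyperopt.STATUS_OK.
    return {"loss": (lr - 0.01) ** 2, "status": "ok"}


def run_search(max_evals=32, parallelism=4):
    """Run the search with trials evaluated on Spark workers."""
    # Imported here because SparkTrials needs hyperopt and an active Spark session.
    from hyperopt import SparkTrials, fmin, hp, tpe

    # Assumed range: learning rates between e^-7 and 1 on a log scale.
    space = {"lr": hp.loguniform("lr", -7, 0)}
    return fmin(
        fn=objective,
        space=space,
        algo=tpe.suggest,
        max_evals=max_evals,
        trials=SparkTrials(parallelism=parallelism),
    )
```

With SparkTrials, each trial runs as a Spark task, so the training inside the objective should be single-node code.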
The Deep Learning Pipelines package includes several Spark ML Transformers for distributing inference, all of which were removed in Databricks Runtime 7.0 ML (Unsupported).
Instead, use pandas UDFs to run inference on Spark DataFrames, following the examples in Deploy models for inference and prediction.
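One possible shape for such a UDF is sketched below. It assumes the model is any picklable callable that maps a batch of feature vectors to per-class scores. The names predict_batch and make_predict_udf are illustrative only.

```python
import numpy as np
import pandas as pd


def predict_batch(model, features: pd.Series) -> pd.Series:
    """Score one batch of feature vectors and return predicted class indices."""
    scores = model(np.stack(features.tolist()))  # shape (n, num_classes)
    return pd.Series(np.argmax(scores, axis=1))


def make_predict_udf(model):
    """Build a Series-to-Series pandas UDF around `model`."""
    # Imported here so predict_batch can be tried without Spark.
    from pyspark.sql.functions import pandas_udf

    @pandas_udf("long")
    def predict(features: pd.Series) -> pd.Series:
        # `model` is captured by the closure and shipped to the executors.
        return predict_batch(model, features)

    return predict
```

Capturing the model in a closure is the simplest option for small models. For large models, loading the weights inside the UDF on each executor is usually the better choice.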
Deploy models as SQL UDFs
The Deep Learning Pipelines package includes a utility, sparkdl.udf.keras_image_model.registerKerasImageUDF, for deploying a deep learning model as a UDF callable from Spark SQL.
registerKerasImageUDF was removed in Databricks Runtime 7.0 ML (Unsupported).
Instead, use MLflow to export the model as a UDF, following the example in Model inference.
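As a hedged sketch of that approach, the helper below loads a logged model as a Spark UDF with mlflow.pyfunc.spark_udf and registers it for SQL. The helper name, the default UDF name, and the double result type are assumptions for this example. The model URI must point to a model you have already logged.

```python
def register_model_udf(spark, model_uri, name="predict"):
    """Load an MLflow model as a Spark UDF and register it for Spark SQL."""
    import mlflow.pyfunc  # requires the mlflow package

    # result_type is an assumption; choose the type your model actually returns.
    predict = mlflow.pyfunc.spark_udf(spark, model_uri=model_uri, result_type="double")
    spark.udf.register(name, predict)
    return predict
```

After registration, the model can be called from SQL, for example SELECT predict(features) FROM my_table, where the table and column names are placeholders.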