import io

import numpy as np
import pandas as pd
from PIL import Image
from tensorflow.keras.preprocessing.image import img_to_array
# preprocess_input must match the pretrained model used by the featurization UDF below;
# ResNet50 is assumed here, consistent with the 224x224 input size.
from tensorflow.keras.applications.resnet50 import preprocess_input

def preprocess(content):
    """
    Preprocesses raw image bytes for prediction.
    """
    img = Image.open(io.BytesIO(content)).resize([224, 224])
    arr = img_to_array(img)
    return preprocess_input(arr)

def featurize_series(model, content_series):
    """
    Featurize a pd.Series of raw images using the input model.
    :return: a pd.Series of image features
    """
    model_input = np.stack(content_series.map(preprocess))
    preds = model.predict(model_input)
    # For some layers, output features will be multi-dimensional tensors.
    # We flatten the feature tensors to vectors for easier storage in Spark DataFrames.
    output = [p.flatten() for p in preds]
    return pd.Series(output)
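These helpers can be sanity-checked locally on a small batch of raw image bytes before they are wrapped in a Spark UDF. This is a minimal sketch: the file paths are hypothetical, and model_fn() is the notebook's model-loading helper, assumed to be defined in an earlier cell.

# Minimal local check (hypothetical paths; not part of the distributed pipeline).
sample_paths = ["/dbfs/tmp/sample_flower_1.jpg", "/dbfs/tmp/sample_flower_2.jpg"]  # hypothetical files
sample_bytes = pd.Series([open(p, "rb").read() for p in sample_paths])

model = model_fn()  # assumed helper from an earlier cell that returns the pretrained Keras model
sample_features = featurize_series(model, sample_bytes)
print(len(sample_features), sample_features.iloc[0].shape)  # two flattened feature vectors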
from pyspark.sql.functions import pandas_udf, PandasUDFType

@pandas_udf('array<float>', PandasUDFType.SCALAR_ITER)
def featurize_udf(content_series_iter):
    '''
    This method is a Scalar Iterator pandas UDF wrapping our featurization function.
    The decorator specifies that this returns a Spark DataFrame column of type ArrayType(FloatType).

    :param content_series_iter: This argument is an iterator over batches of data, where each batch
                                is a pandas Series of image data.
    '''
    # With Scalar Iterator pandas UDFs, we can load the model once and then re-use it
    # for multiple data batches. This amortizes the overhead of loading big models.
    model = model_fn()
    for content_series in content_series_iter:
        yield featurize_series(model, content_series)
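The UDF above calls model_fn(), which is not defined in this section; it is assumed to be a helper from an earlier cell that returns the pretrained Keras model. A minimal sketch, assuming ResNet50 with the classification head removed (consistent with the 224x224 preprocessing above):

from tensorflow.keras.applications.resnet50 import ResNet50

def model_fn():
    """
    Sketch of a model-loading helper (an assumption, not the notebook's own definition):
    returns a pretrained Keras model without its classification head, so that
    model.predict() yields feature tensors rather than class probabilities.
    """
    return ResNet50(weights='imagenet', include_top=False)

In practice, the model weights are often broadcast to the workers so each task does not download them separately; that optimization is omitted from this sketch.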
from pyspark.sql.functions import col

# We can now run featurization on our entire Spark DataFrame.
# NOTE: This can take a long time (about 10 minutes) since it applies a large model to the full dataset.
# `images` is the Spark DataFrame of raw image files loaded earlier in the notebook.
features_df = images.repartition(16).select(col("path"), featurize_udf("content").alias("features"))

features_df.write.mode("overwrite").parquet("dbfs:/ml/tmp/flower_photos_features")
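The saved features can be read back later without re-running the featurization. A quick check of the written output (spark is the active SparkSession, available by default on Databricks):

# Read the saved features back and confirm the schema: path (string) and features (array<float>).
saved_features_df = spark.read.parquet("dbfs:/ml/tmp/flower_photos_features")
saved_features_df.printSchema()
print(saved_features_df.count(), "feature rows written")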
Featurization using a pretrained model for transfer learning
This notebook demonstrates how to take a pre-trained deep learning model and use it to compute features for downstream models. This is sometimes called transfer learning, since it transfers knowledge (i.e., the feature encoding) from the pre-trained model to a new model.
In this notebook, we define helpers that preprocess raw image bytes and featurize them with a pretrained Keras model, wrap the featurization in a Scalar Iterator pandas UDF so the model is loaded once per batch iterator, and apply the UDF to a Spark DataFrame of images, saving the resulting features to Parquet.
This notebook does not take the final step of using those features to train a new model. For examples of training a simple model such as logistic regression, refer to the "Machine Learning" examples in the Databricks documentation.
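As an illustration only (not part of this notebook), the saved features could feed a simple Spark ML model. Everything below is a hedged sketch: the label derivation assumes the class name is the parent directory component of path, and array_to_vector requires Spark 3.1 or later.

from pyspark.ml.classification import LogisticRegression
from pyspark.ml.feature import StringIndexer
from pyspark.ml.functions import array_to_vector
from pyspark.sql.functions import col, element_at, split

# Hypothetical labels: assume the class name is the parent directory in `path`.
df = (spark.read.parquet("dbfs:/ml/tmp/flower_photos_features")
      .withColumn("label_str", element_at(split(col("path"), "/"), -2))
      .withColumn("features_vec", array_to_vector(col("features"))))

train_df = StringIndexer(inputCol="label_str", outputCol="label").fit(df).transform(df)

lr = LogisticRegression(featuresCol="features_vec", labelCol="label", maxIter=20)
lr_model = lr.fit(train_df)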
Requirements: a cluster where TensorFlow/Keras, Pillow, pandas, NumPy, and PyArrow are available (for example, Databricks Runtime for Machine Learning), since the featurization relies on a Keras model and Scalar Iterator pandas UDFs.