This article provides an example of performing featurization for transfer learning using pandas UDFs.
Databricks supports featurization with deep learning models. Pre-trained deep learning models can be used to compute features for use in other downstream models. Databricks supports featurization at scale, distributing the computation across a cluster. You can perform featurization with deep learning libraries included in Databricks Runtime ML, including TensorFlow and PyTorch.
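As a minimal local sketch of featurization with a pre-trained model, the following uses MobileNetV2 as an example architecture (an assumption; any Keras application model works the same way). `weights=None` is used here only to avoid downloading pretrained weights in the sketch; in practice you would use `weights="imagenet"`:

```python
import numpy as np
import tensorflow as tf

# Build a feature-extraction model: include_top=False removes the
# classification head, pooling="avg" yields one vector per image.
# weights=None is for this sketch only; use weights="imagenet" in practice.
model = tf.keras.applications.MobileNetV2(
    include_top=False, weights=None, pooling="avg",
    input_shape=(224, 224, 3))

# A batch of two random arrays stands in for real image data.
images = np.random.rand(2, 224, 224, 3).astype("float32")
features = model.predict(images, verbose=0)
print(features.shape)  # one feature vector per image
```

Each row of `features` can then be fed to a downstream model in place of raw pixels.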
Databricks also supports transfer learning, a technique closely related to featurization. Transfer learning allows you to reuse knowledge from one problem domain in a related domain. Featurization is itself a simple and powerful method for transfer learning: computing features using a pre-trained deep learning model transfers knowledge about good features from the original domain.
This article demonstrates how to compute features for transfer learning using a pre-trained TensorFlow model, using the following workflow:
1. Start with a pre-trained deep learning model, in this case an image classification model from
2. Truncate the last layer(s) of the model. The modified model produces a tensor of features as output, rather than a prediction.
3. Apply that model to a new image dataset from a different problem domain, computing features for the images.
4. Use these features to train a new model. The following notebook omits this final step. For examples of training a simple model such as logistic regression, see Model training examples.
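The steps above can be sketched locally as follows, assuming TensorFlow is available (as in Databricks Runtime ML) and using ResNet50 as an example model. The `featurize` function below follows the Scalar Iterator pattern: on Databricks it would be decorated with `pandas_udf` so Spark can distribute it across a cluster, loading the model once per executor rather than once per batch. `weights=None` is used only to keep the sketch self-contained:

```python
from typing import Iterator

import numpy as np
import pandas as pd
import tensorflow as tf

def build_truncated_model() -> tf.keras.Model:
    # Steps 1-2: load a pre-trained image classifier and truncate it.
    # include_top=False drops the final classification layer(s);
    # pooling="avg" turns the last block into a 2048-dim feature vector.
    # weights=None avoids a download here; use weights="imagenet" in practice.
    return tf.keras.applications.ResNet50(
        include_top=False, weights=None, pooling="avg",
        input_shape=(224, 224, 3))

def featurize(batches: Iterator[pd.Series]) -> Iterator[pd.Series]:
    # Step 3: compute features for each batch of images.
    # Loading the model once and iterating over batches is the same
    # pattern a Scalar Iterator pandas UDF uses per executor process.
    model = build_truncated_model()
    for batch in batches:
        # Each element of `batch` is assumed to be a (224, 224, 3) array.
        x = np.stack(batch.to_list()).astype("float32")
        x = tf.keras.applications.resnet50.preprocess_input(x)
        yield pd.Series(list(model.predict(x, verbose=0)))

# Two random arrays stand in for a real image dataset.
fake_images = pd.Series([np.random.rand(224, 224, 3) for _ in range(2)])
features = next(featurize(iter([fake_images])))
print(features[0].shape)  # one 2048-dim feature vector per image
```

On Databricks, `featurize` would be registered as a pandas UDF returning `ArrayType(FloatType())` and applied with `withColumn` to a DataFrame of images; the resulting feature column is the input to the step-4 downstream model.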