Preprocess data

On large datasets, you can use Spark SQL and MLlib for feature engineering. Third-party libraries included in Databricks Runtime ML, such as scikit-learn, also provide useful helper methods. For examples, see the following machine learning notebooks for scikit-learn and MLlib:
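
Independently of those notebooks, the kind of scikit-learn helper methods mentioned above can be illustrated with a minimal sketch. It assumes a small in-memory dataset with two numeric columns and one categorical column. The data, the column positions, and the `build_preprocessor` helper are hypothetical and chosen only for illustration; they are not taken from the referenced notebooks.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler


def build_preprocessor(numeric_cols, categorical_cols):
    """Hypothetical helper: scale numeric columns, one-hot encode categorical ones."""
    return ColumnTransformer(
        transformers=[
            ("num", StandardScaler(), numeric_cols),
            ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
        ],
        sparse_threshold=0,  # always return a dense array, for easy inspection
    )


# Toy data (an assumption for this sketch): two numeric columns, one categorical.
X = np.array(
    [
        [1.0, 10.0, "red"],
        [2.0, 20.0, "blue"],
        [3.0, 30.0, "red"],
        [4.0, 40.0, "green"],
    ],
    dtype=object,
)

preprocessor = build_preprocessor(numeric_cols=[0, 1], categorical_cols=[2])
features = preprocessor.fit_transform(X)

# Two scaled numeric columns followed by one indicator column per category.
print(features.shape)
```

The fitted `preprocessor` can be reused to apply the same scaling and encoding to new rows with `preprocessor.transform(...)`. For data that does not fit on a single machine, MLlib offers analogous transformers that operate on Spark DataFrames.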

For more complex deep learning feature processing, this example notebook illustrates how to use transfer learning for featurization: