This article and its accompanying notebooks describe a reference solution for distributed image model inference, based on a setup common to many real-world image applications: a large collection of images stored in an object store. Suppose you have several trained deep learning (DL) models for image classification and object detection—for example, MobileNetV2 for detecting people in user-uploaded photos to help protect privacy—and you want to apply these models to the stored images.
You might also periodically retrain the models and refresh previously computed predictions. However, loading many images and applying DL models to them is both I/O-heavy and compute-heavy. Fortunately, the inference workload is embarrassingly parallel and, in theory, easy to distribute. This guide walks you through a practical solution with two major stages:
- ETL the images into a Delta table. A dedicated ETL job simplifies data management and decouples it from the inference task.
- Perform distributed inference with a pandas UDF.
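The first stage can be sketched as follows. In a real Spark pipeline you would typically read the files with the `binaryFile` data source and write the result out in Delta format; the pure-pandas version below is a minimal stand-in that shows the shape of the resulting table (path, raw bytes, size). All paths and column names here are illustrative assumptions, not part of the reference solution itself.

```python
# Sketch of the ETL stage: collect raw image files into a tabular format.
# In Spark this would be roughly:
#   spark.read.format("binaryFile").load(image_dir) \
#        .write.format("delta").save(delta_path)
# Here we simulate the same schema with pandas for illustration.
import os
import tempfile

import pandas as pd


def etl_images_to_table(image_dir: str) -> pd.DataFrame:
    """Build a (path, content, length) row for every file under image_dir."""
    rows = []
    for name in sorted(os.listdir(image_dir)):
        path = os.path.join(image_dir, name)
        with open(path, "rb") as f:
            content = f.read()
        rows.append({"path": path, "content": content, "length": len(content)})
    return pd.DataFrame(rows)


# Create two fake "images" (JPEG magic bytes plus a small payload) to demonstrate.
tmp = tempfile.mkdtemp()
for i in range(2):
    with open(os.path.join(tmp, f"img_{i}.jpg"), "wb") as f:
        f.write(b"\xff\xd8" + bytes([i]) * 10)

df = etl_images_to_table(tmp)
print(df[["path", "length"]])
```

Materializing the images once into a Delta table means every downstream inference job reads a columnar table instead of re-listing and re-opening many small object-store files.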
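For the second stage, the core of a pandas UDF is a function that maps one pandas Series of inputs to one Series of outputs; Spark calls it once per Arrow batch on each executor. The sketch below shows that batch-scoring function in isolation, with a hypothetical stand-in for the DL model (a real job would load MobileNetV2 or similar in `load_model`). In Spark you would wrap it with `pyspark.sql.functions.pandas_udf` and apply it to the `content` column of the Delta table.

```python
# Sketch of the inference stage. `predict_batch` is the body a pandas UDF
# would run; the "model" here is a hypothetical placeholder that classifies
# an image by its byte size, standing in for a real DL model.
import pandas as pd


def load_model():
    # Placeholder for loading a trained DL model (e.g., MobileNetV2).
    # Loading once per batch (or caching per worker) avoids paying the
    # model-initialization cost for every row.
    return lambda content: "large" if len(content) > 100 else "small"


def predict_batch(contents: pd.Series) -> pd.Series:
    """Score one Arrow batch of raw image bytes; returns one label per row."""
    model = load_model()
    return contents.map(model)


batch = pd.Series([b"\x00" * 50, b"\x00" * 200])
preds = predict_batch(batch)
print(preds.tolist())  # → ['small', 'large']
```

In Spark, the wiring would look roughly like `udf = pandas_udf(predict_batch, returnType=StringType())` followed by `images_df.withColumn("prediction", udf(col("content")))`, so each executor scores its partitions in parallel.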