Do distributed model inference with TensorFlow tf.keras and Delta
- Start from the Delta table `/databricks-datasets/flowers/`, which is a copy of the output table of the ETL image dataset in a Delta table notebook.
- Use a scalar iterator Pandas UDF to make predictions.
Define a Pandas UDF for the inference task
There are three types of UDFs in PySpark that provide 1:1 (record-to-record) mapping semantics:
- PySpark UDF: record -> record; incurs per-record serialization overhead, so it is not recommended.
- Scalar Pandas UDF: pandas Series/DataFrame -> pandas Series/DataFrame; no shared state across batches.
- Scalar iterator Pandas UDF: initialize state once (for example, load the model), then iterate over batches.
Databricks recommends the scalar iterator Pandas UDF for model inference; a minimal sketch follows.
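The sketch below illustrates the scalar iterator pattern: the model is loaded once per executor process and reused across batches. The stock ResNet50 model, the `preprocess` helper, the `content` column of raw JPEG bytes, and the `array<float>` return type are illustrative assumptions, not details from the original notebook.

```python
from typing import Iterator

import numpy as np
import pandas as pd
import tensorflow as tf
from pyspark.sql.functions import pandas_udf

def preprocess(raw_bytes):
    # Hypothetical helper: decode JPEG bytes into a 224x224x3 array
    # matching ResNet50's expected input.
    image = tf.image.decode_jpeg(raw_bytes, channels=3)
    image = tf.image.resize(image, [224, 224])
    return tf.keras.applications.resnet50.preprocess_input(image).numpy()

@pandas_udf("array<float>")
def predict_udf(batches: Iterator[pd.Series]) -> Iterator[pd.Series]:
    # State is initialized once, before the batch loop: the model is
    # loaded a single time and reused for every incoming batch.
    model = tf.keras.applications.ResNet50()  # assumption: any tf.keras model works here
    for content in batches:
        images = np.stack(content.map(preprocess))
        preds = model.predict(images)
        yield pd.Series(list(preds))
```

Compared with a plain scalar Pandas UDF, the iterator form avoids reloading the model for every batch, which dominates the cost when the model is large.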
Do distributed inference with the DataFrames API
- You declare a prediction column and how to compute it; see the batch sketch after this list.
- Let Spark optimize and parallelize the execution.
- To automatically apply inference to new data in the Delta table, use `spark.readStream` to load the Delta table as a stream source, and write the predictions to another Delta table; see the streaming sketch below.
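A sketch of the batch case, reusing `predict_udf` from above; the `content` column name and the output path are assumptions:

```python
# Runs in a Databricks notebook, where `spark` is predefined.
df = spark.read.format("delta").load("/databricks-datasets/flowers/")

# Declare the prediction column; nothing executes yet.
predictions = df.withColumn("prediction", predict_udf(df["content"]))

# Spark plans and parallelizes the inference when the result is written out.
predictions.write.format("delta").mode("overwrite").save("/tmp/flowers_predictions")
```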
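And the streaming variant, assuming hypothetical checkpoint and output locations:

```python
from pyspark.sql.functions import col

stream = (
    spark.readStream.format("delta")
    .load("/databricks-datasets/flowers/")
    .withColumn("prediction", predict_udf(col("content")))
)

# New rows appended to the source table are scored incrementally.
(stream.writeStream.format("delta")
    .option("checkpointLocation", "/tmp/flowers/_checkpoint")
    .start("/tmp/flowers_predictions_stream"))
```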
You can do more (optional)
Filter the images to run inference on, based on some metadata.
Do inference directly in SQL; a sketch of both follows.
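The sketch below assumes the table has `label`, `path`, and `content` columns as produced by the ETL notebook; the registered function name `predict` is arbitrary:

```python
from pyspark.sql.functions import col

# Option 1: filter on metadata before running inference.
daisies = (
    spark.read.format("delta").load("/databricks-datasets/flowers/")
    .filter(col("label") == "daisy")
    .withColumn("prediction", predict_udf(col("content")))
)

# Option 2: register the same Pandas UDF so it can be called from SQL.
spark.udf.register("predict", predict_udf)
result = spark.sql("""
  SELECT path, predict(content) AS prediction
  FROM delta.`/databricks-datasets/flowers/`
  WHERE label = 'daisy'
""")
```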