Model inference

This article covers some of the options for model inference on Databricks.

Use MLflow for inference

For machine learning model inference, Databricks recommends MLflow. You can use MLflow to deploy models for batch or streaming inference applications, or to set up a REST endpoint to serve the model.
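As a minimal sketch of the batch case, a model registered with MLflow can be wrapped as a Spark UDF and applied to a DataFrame. The model name, version, and table names below are hypothetical, and the scoring function assumes a Databricks or Spark runtime with MLflow installed.

```python
# Hedged sketch: batch inference with a registered MLflow model wrapped
# as a Spark UDF. Model and table names are hypothetical placeholders.

def make_model_uri(name: str, version: int) -> str:
    # MLflow Model Registry URIs take the form "models:/<name>/<version>".
    return f"models:/{name}/{version}"

def score_batch(spark, model_uri: str, input_table: str, output_table: str):
    # Deferred import: assumes an environment with MLflow and Spark available.
    import mlflow.pyfunc

    # spark_udf wraps the model so it can score DataFrame columns in parallel.
    predict = mlflow.pyfunc.spark_udf(spark, model_uri=model_uri)

    df = spark.read.table(input_table)
    scored = df.withColumn("prediction", predict(*df.columns))
    scored.write.mode("overwrite").saveAsTable(output_table)
```

The same wrapped UDF pattern also applies to streaming DataFrames, which is what makes the batch-to-streaming switch described below straightforward.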

Streaming inference

For streaming applications, use the Apache Spark Structured Streaming API. The Structured Streaming API is similar to the batch API. You can use an automatically generated batch inference notebook as a template and modify it to use streaming instead of batch. See the Apache Spark MLlib pipelines and Structured Streaming example.
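The change from batch to streaming is mostly mechanical: read with `readStream`, write with `writeStream`, and add a checkpoint location. The sketch below assumes Delta input and output paths and a registered MLflow model; all names are hypothetical.

```python
# Hedged sketch of streaming inference with an MLflow model. The Delta
# format, paths, and model URI are illustrative assumptions; this requires
# a Databricks/Spark runtime with MLflow installed.

def score_stream(spark, model_uri, input_path, checkpoint_path, output_path):
    import mlflow.pyfunc

    # The same spark_udf wrapper used for batch scoring works on streams.
    predict = mlflow.pyfunc.spark_udf(spark, model_uri=model_uri)

    stream = spark.readStream.format("delta").load(input_path)
    scored = stream.withColumn("prediction", predict(*stream.columns))

    # Structured Streaming mirrors the batch API: writeStream replaces
    # write, and the checkpoint tracks progress across restarts.
    return (scored.writeStream
            .format("delta")
            .option("checkpointLocation", checkpoint_path)
            .start(output_path))
```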

Non-MLflow options

For scalable model inference with MLlib and XGBoost4J models, use the native transform methods to perform inference directly on Spark DataFrames. The MLlib example notebooks include inference steps.
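To illustrate the native path, a fitted MLlib pipeline scores a DataFrame with `transform()`, which runs on the executors without a Python UDF round-trip. The feature and label column names below are hypothetical, and the sketch assumes a Spark runtime.

```python
# Hedged sketch of native MLlib inference. Column names ("f1", "f2",
# "label") are illustrative assumptions, not a prescribed schema.

def fit_and_score(train_df, score_df):
    from pyspark.ml import Pipeline
    from pyspark.ml.classification import LogisticRegression
    from pyspark.ml.feature import VectorAssembler

    # Assemble raw columns into the single vector column MLlib expects.
    assembler = VectorAssembler(inputCols=["f1", "f2"], outputCol="features")
    lr = LogisticRegression(featuresCol="features", labelCol="label")
    model = Pipeline(stages=[assembler, lr]).fit(train_df)

    # transform() appends prediction columns directly to the DataFrame.
    return model.transform(score_df)
```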

For other libraries and model types, use the native model inference routines provided by the library on smaller datasets. To scale out inference on large datasets, wrap the model in a Spark pandas UDF; the iterator variants let you load the model once per iterator of batches rather than once per batch.
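The iterator pattern can be sketched without a cluster: the function below receives an iterator of pandas Series batches, loads the model once, and yields a prediction Series per batch. `DoublingModel` is a hypothetical stand-in for an expensive model load; the `pandas_udf` registration shown in the trailing comment assumes Spark 3 type-hinted pandas UDFs.

```python
# Hedged sketch of the Iterator-of-Series pandas UDF pattern. The model
# here is a trivial placeholder; in practice you would load a real model
# (e.g. from MLflow) inside _load_model().
from typing import Iterator

import pandas as pd

def _load_model():
    # Hypothetical stand-in for an expensive model load. Loading once per
    # iterator amortizes the cost over many batches.
    class DoublingModel:
        def predict(self, batch: pd.Series) -> pd.Series:
            return batch * 2.0
    return DoublingModel()

def predict_batches(batches: Iterator[pd.Series]) -> Iterator[pd.Series]:
    model = _load_model()  # runs once, not once per batch
    for batch in batches:
        yield model.predict(batch)

# On Spark, the same function would be registered as a pandas UDF, e.g.:
#   from pyspark.sql.functions import pandas_udf
#   predict_udf = pandas_udf(predict_batches, returnType="double")
#   df.withColumn("prediction", predict_udf(df["feature"]))
```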

Inference with deep learning models

For information about and examples of deep learning model inference on Databricks, see the following articles: