Deploy models

This article covers how to deploy MLflow models for offline (batch) and online (real-time) serving. For general information about working with MLflow models, see Log, load, register, and deploy MLflow Models.

For models registered in Model Registry, you can automatically generate a notebook for batch inference or configure the model for online serving.

Offline (batch) predictions

For information about using Model Registry to build, manage, and deploy a model, see the MLflow Model Registry example. On that page, you can search for .predict to identify examples of offline (batch) predictions.
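
The sketch below illustrates what such a .predict call typically looks like when loading a model from Model Registry with the MLflow Python API. The model name, version, and input columns are assumptions for illustration; substitute your own registered model.

```python
# Minimal sketch of an offline (batch) prediction with .predict.
# "my-model", version "1", and the feature columns are placeholders.
import mlflow.pyfunc
import pandas as pd

# Load the registered model version from Model Registry
model = mlflow.pyfunc.load_model("models:/my-model/1")

# Score a small batch of records held in a pandas DataFrame
batch = pd.DataFrame({"feature_a": [1.0, 2.0], "feature_b": [0.5, 0.7]})
predictions = model.predict(batch)
print(predictions)
```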

To run batch or offline predictions, create a notebook or JAR that includes the code to perform the predictions. Then, execute the notebook or JAR as a Databricks job. Jobs can be run either immediately or on a schedule.
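
The following is a sketch of a notebook cell you might schedule as a Databricks job for batch scoring. It uses the MLflow Spark UDF API; the model name and stage, table names, and feature columns are assumptions for illustration, and `spark` refers to the SparkSession available in a Databricks notebook.

```python
# Sketch of a batch-scoring notebook cell intended to run as a Databricks job.
# Model URI, table names, and feature columns are placeholders.
import mlflow.pyfunc
from pyspark.sql import functions as F

model_uri = "models:/my-model/Production"            # assumed registered model and stage
predict_udf = mlflow.pyfunc.spark_udf(spark, model_uri)

input_df = spark.table("my_schema.features")         # assumed input table
feature_cols = ["feature_a", "feature_b"]             # assumed feature columns

# Apply the model as a UDF and persist the predictions
scored_df = input_df.withColumn(
    "prediction", predict_udf(*[F.col(c) for c in feature_cols])
)
scored_df.write.mode("overwrite").saveAsTable("my_schema.predictions")
```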

Online (real-time) model serving with MLflow

For Python MLflow models, Databricks provides MLflow Model Serving, which lets you host machine learning models from the Model Registry as REST endpoints that are updated automatically as model versions and their stages change.
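
As a sketch, a served model can be queried over HTTPS as shown below. The workspace URL, model name, token environment variable, and input schema are assumptions for illustration; the endpoint path follows the /model/<model-name>/<stage-or-version>/invocations pattern, and the request body shown uses the MLflow 2.x `dataframe_records` scoring format (older endpoints may expect a different JSON orientation).

```python
# Hedged sketch of querying an MLflow Model Serving REST endpoint.
# Workspace URL, model name, token, and input columns are placeholders.
import os
import requests

workspace_url = "https://<databricks-instance>"   # assumed workspace URL
model_name = "my-model"                            # assumed registered model name
token = os.environ["DATABRICKS_TOKEN"]             # assumed personal access token env var

response = requests.post(
    f"{workspace_url}/model/{model_name}/Production/invocations",
    headers={"Authorization": f"Bearer {token}"},
    json={"dataframe_records": [{"feature_a": 1.0, "feature_b": 0.5}]},
)
response.raise_for_status()
print(response.json())
```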

For information about other options for deploying models with MLflow and using them for online (real-time) model serving, see Deploy models for online serving.