Databricks recommends that you use MLflow to deploy machine learning models. You can use MLflow to deploy models for batch or streaming inference or to set up a REST endpoint to serve the model.
This article describes how to deploy MLflow models for offline (batch and streaming) inference and online (real-time) serving. For general information about working with MLflow models, see Log, load, register, and deploy MLflow Models.
You can simplify model deployment by registering models to the MLflow Model Registry. After you have registered your model, you can automatically generate a notebook for batch inference or configure the model for online serving.
This section includes instructions and examples for setting up batch predictions on Databricks.
MLflow helps you generate code for batch or streaming inference.
In the MLflow Model Registry, you can automatically generate a notebook for batch inference.
In the MLflow Run page for your model, you can copy the generated code snippet for inference on pandas or Apache Spark DataFrames.
You can also customize the code generated by either of the above options. See the following notebooks for examples:
The model inference example uses a model trained with scikit-learn and previously logged to MLflow to show how to load a model and use it to make predictions on data in different formats. The notebook illustrates how to apply the model as a scikit-learn model to a pandas DataFrame, and how to apply the model as a PySpark UDF to a Spark DataFrame.
The MLflow Model Registry example shows how to build, manage, and deploy a model with Model Registry. On that page, you can search for .predict to identify examples of offline (batch) predictions.
To run batch or streaming predictions as a job, create a notebook or JAR that includes the code used to perform the predictions. Then, execute the notebook or JAR as a Databricks job. Jobs can be run either immediately or on a schedule.
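As a sketch of scheduling the prediction notebook as a job, the functions below build a request body for the Databricks Jobs API 2.1 jobs/create endpoint and, optionally, send it. The workspace URL, token, notebook path, cluster ID, and the task_key value are placeholders you supply; verify the field names against the Jobs API reference for your workspace. A JAR would use a spark_jar_task instead of notebook_task (not shown).

```python
import json
import urllib.request
from typing import Optional


def build_job_spec(name: str, notebook_path: str, cluster_id: str,
                   cron: Optional[str] = None, timezone: str = "UTC") -> dict:
    """Build a Jobs API 2.1 create-job payload that runs a single notebook task."""
    spec = {
        "name": name,
        "tasks": [
            {
                "task_key": "batch_inference",  # hypothetical task name
                "notebook_task": {"notebook_path": notebook_path},
                "existing_cluster_id": cluster_id,
            }
        ],
    }
    if cron is not None:
        # Quartz cron syntax: "0 0 2 * * ?" means every day at 02:00.
        spec["schedule"] = {"quartz_cron_expression": cron, "timezone_id": timezone}
    return spec


def create_job(workspace_url: str, token: str, spec: dict) -> int:
    """POST the payload to /api/2.1/jobs/create and return the new job_id."""
    request = urllib.request.Request(
        f"{workspace_url.rstrip('/')}/api/2.1/jobs/create",
        data=json.dumps(spec).encode("utf-8"),
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["job_id"]
```

Leaving cron unset creates an unscheduled job that you can start immediately, for example through the jobs/run-now endpoint or the Jobs UI.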
For streaming applications, use the Apache Spark Structured Streaming API. The Structured Streaming API is similar to that for batch operations. You can use the automatically generated notebook mentioned in the previous section as a template and modify it to use streaming instead of batch. See the Apache Spark MLlib pipelines and Structured Streaming example.
For information about and examples of deep learning model inference on Databricks, see the following articles:
For scalable model inference with MLlib and XGBoost4J models, use the native transform methods to perform inference directly on Spark DataFrames. The MLlib example notebooks include inference steps.
When you use the MLflow APIs to run inference on Spark DataFrames, you can load the model as a Spark UDF and apply it at scale using distributed computing.
You can customize your model to add pre-processing or post-processing and to optimize computational performance for large models. A good option for customizing models is the MLflow pyfunc API, which allows you to wrap a model with custom logic.
For smaller datasets, you can also use the native model inference routines provided by the library.
For Python MLflow models, Databricks provides MLflow Model Serving, which allows you to host machine learning models from the Model Registry as REST endpoints that are updated automatically based on the availability of model versions and their stages.
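For illustration, a client might query such an endpoint as sketched below using only the Python standard library. The sketch assumes a personal access token for authentication and an endpoint that accepts the MLflow 2.x scoring format with a dataframe_records key; older MLflow versions expected a different JSON layout. The endpoint URL is a placeholder: copy the actual URL from the model's serving page in your workspace. The function names are hypothetical.

```python
import json
import urllib.request
from typing import Dict, List


def build_scoring_request(endpoint_url: str, token: str,
                          records: List[Dict[str, float]]) -> urllib.request.Request:
    """Build a POST request that sends one JSON object per input row (dataframe_records)."""
    body = json.dumps({"dataframe_records": records}).encode("utf-8")
    return urllib.request.Request(
        endpoint_url,
        data=body,
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
        method="POST",
    )


def score(endpoint_url: str, token: str, records: List[Dict[str, float]]):
    """Send the request and return the decoded JSON response."""
    with urllib.request.urlopen(build_scoring_request(endpoint_url, token, records)) as response:
        return json.load(response)
```

Separating request construction from sending keeps the payload easy to inspect and test without a live endpoint.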
To deploy a model to third-party serving frameworks, use mlflow.<deploy-type>.deploy(). For examples, see Deploy models for online serving.