Serve models with Databricks

In this section, you learn how to use Mosaic AI Model Serving to serve AI and ML models through REST endpoints, as well as how to use MLflow for batch and streaming inference.

Mosaic AI Model Serving

Mosaic AI Model Serving is Databricks’ unified interface to deploy, govern, and query AI and ML models. Each model you serve is available as a REST API that you can integrate into your web or client application.

Mosaic AI Model Serving supports serving the following:

  • Custom Python models packaged in the MLflow format. Examples include scikit-learn, XGBoost, PyTorch, and Hugging Face transformer models.

  • State-of-the-art open foundation models made available through Foundation Model APIs, such as DBRX, Meta Llama, and Mixtral.

  • External models hosted outside of Databricks, such as those from providers like OpenAI and Anthropic. Serving these through Databricks lets you govern them centrally and streamlines the use and management of various LLM providers within your organization.
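Because every served model is exposed as a REST API, querying one from client code amounts to an authenticated POST to the endpoint's `invocations` route. The sketch below builds such a request using only the Python standard library; the workspace URL, endpoint name, and feature column names are placeholder values for illustration, and the token is assumed to be in the `DATABRICKS_TOKEN` environment variable.

```python
import json
import os
import urllib.request

# Placeholder values -- replace with your workspace URL and endpoint name.
WORKSPACE_URL = "https://my-workspace.cloud.databricks.com"
ENDPOINT_NAME = "my-endpoint"

def build_invocation_request(records):
    """Build an HTTP request for a serving endpoint's /invocations route.

    `records` is a list of dicts, one per input row; serving endpoints
    accept them under the "dataframe_records" key.
    """
    payload = {"dataframe_records": records}
    return urllib.request.Request(
        url=f"{WORKSPACE_URL}/serving-endpoints/{ENDPOINT_NAME}/invocations",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            # Assumes a personal access token in DATABRICKS_TOKEN.
            "Authorization": f"Bearer {os.environ.get('DATABRICKS_TOKEN', '')}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

# "feature_a"/"feature_b" are hypothetical column names for this sketch.
req = build_invocation_request([{"feature_a": 1.2, "feature_b": 0.7}])
print(req.full_url)
# Sending the request with urllib.request.urlopen(req) returns the
# model's predictions as JSON.
```

The request is constructed but not sent here; in a real application you would send it with `urllib.request.urlopen` (or a client such as `requests`) and parse the JSON response.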

Batch inference

For batch and streaming inference, Databricks recommends deploying your models as MLflow models. For more information, see Deploy models for batch inference and prediction.