Migrate to Model Serving

This article demonstrates how to enable Model Serving in your workspace and switch your models to the Mosaic AI Model Serving experience built on serverless compute.

Important

Starting August 22, 2025, customers can no longer create new serving endpoints using the Legacy MLflow Model Serving experience. On September 15, 2025, the legacy experience reaches end of life, and existing endpoints that use it can no longer be used.

Requirements

A registered model in the MLflow Model Registry.
Permissions on the registered models as described in the access control guide.
Serverless compute enabled on your workspace.

Significant changes

In Model Serving, the request and response formats differ slightly from those of Legacy MLflow Model Serving. See Scoring a model endpoint for details on the new format.
In Model Serving, the endpoint URL includes serving-endpoints instead of model.
Model Serving includes full support for managing resources with API workflows.
Model Serving is production-ready and backed by the Databricks SLA.
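The URL and request-format changes above can be sketched in a few lines of Python. This is a minimal illustration, not the definitive client: the workspace host `example.cloud.databricks.com`, the helper names, and the feature column names are hypothetical, and the `dataframe_split` payload shape is an assumption to check against Scoring a model endpoint before relying on it.

```python
import json
from urllib.parse import urlsplit, urlunsplit


def to_serving_url(workspace_url: str, endpoint_name: str) -> str:
    """Build an invocation URL that uses the serving-endpoints path."""
    parts = urlsplit(workspace_url)
    path = f"/serving-endpoints/{endpoint_name}/invocations"
    return urlunsplit((parts.scheme, parts.netloc, path, "", ""))


def to_dataframe_split(columns, rows):
    """Wrap tabular input in a dataframe_split request body (assumed format)."""
    return {
        "dataframe_split": {
            "columns": list(columns),
            "data": [list(row) for row in rows],
        }
    }


# Hypothetical workspace host and feature names, for illustration only.
url = to_serving_url("https://example.cloud.databricks.com", "modelA-Production")
body = to_dataframe_split(["feature_a", "feature_b"], [[1.0, 2.0], [3.0, 4.0]])

print(url)
print(json.dumps(body))
```

Keeping URL construction and payload construction in small helpers like these makes it easier to switch an application from the legacy path to the new one in a single place.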

Identify serving endpoints that use Legacy MLflow Model Serving

To identify model serving endpoints that use Legacy MLflow Model Serving:

Navigate to the Models UI in your workspace.
Select the Workspace Model Registry filter.
Select the Legacy serving enabled only filter.

Migrate Legacy MLflow Model Serving served models to Model Serving

You can create a Model Serving endpoint and flexibly transition model serving workflows without disabling Legacy MLflow Model Serving.

The following steps show how to accomplish this with the UI. For each model on which you have Legacy MLflow Model Serving enabled:

Register your model to Unity Catalog.
Navigate to Serving endpoints on the sidebar of your machine learning workspace.
Follow the workflow described in Create custom model serving endpoints on how to create a serving endpoint with your model.
Transition your application to use the new URL provided by the serving endpoint to query the model, along with the new scoring format.
After your models are transitioned, navigate to Models on the sidebar of your machine learning workspace.
Select the model for which you want to disable Legacy MLflow Model Serving.
On the Serving tab, select Stop.
A message appears to confirm. Select Stop Serving.
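For the step that transitions your application to the new URL, the following sketch shows one way a client might build the scoring request with the Python standard library. It assumes bearer-token authentication and a `dataframe_records` payload. The endpoint name `my-endpoint`, the host, the token placeholder, and the feature names are all hypothetical, and the request is only constructed here, not sent.

```python
import json
import urllib.request


def build_scoring_request(invocation_url: str, token: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request for a serving endpoint."""
    return urllib.request.Request(
        invocation_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Hypothetical host, endpoint name, token placeholder, and features.
request = build_scoring_request(
    "https://example.cloud.databricks.com/serving-endpoints/my-endpoint/invocations",
    token="<your-access-token>",
    payload={"dataframe_records": [{"feature_a": 1.0, "feature_b": 2.0}]},
)

print(request.get_method(), request.full_url)

# To send the request from a real application:
# with urllib.request.urlopen(request) as response:
#     predictions = json.loads(response.read())
```

Building the request separately from sending it lets you run the legacy and new clients side by side during the transition and compare their outputs before you stop legacy serving.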

Migrate deployed model versions to Model Serving

In previous versions of the Model Serving functionality, the serving endpoint was created based on the stage of the registered model version: Staging or Production. To migrate your served models from that experience, you can replicate that behavior in the new Model Serving experience.

This section demonstrates how to create separate model serving endpoints for Staging model versions and Production model versions. The following steps show how to accomplish this with the serving endpoints API for each of your served models.

In the example, the registered model modelA has version 1 in the Production stage and version 2 in the Staging stage.

Create two endpoints for your registered model, one for Staging model versions and another for Production model versions.

For Staging model versions (version 2 in this example):

Bash
POST /api/2.0/serving-endpoints
{
  "name": "modelA-Staging",
  "config": {
    "served_entities": [
      {
        "entity_name": "modelA",
        "entity_version": "2",
        "workload_size": "Small",
        "scale_to_zero_enabled": true
      }
    ]
  }
}

For Production model versions (version 1 in this example):

Bash
POST /api/2.0/serving-endpoints
{
  "name": "modelA-Production",
  "config": {
    "served_entities": [
      {
        "entity_name": "modelA",
        "entity_version": "1",
        "workload_size": "Small",
        "scale_to_zero_enabled": true
      }
    ]
  }
}
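If you have many registered models, you may prefer to generate these request bodies rather than write them by hand. The sketch below builds one create-endpoint body per stage for the example model modelA. The helper name `build_create_request` and the `<model>-<stage>` naming convention are illustrative assumptions taken from the example above, and the code only prints the requests instead of sending them.

```python
import json


def build_create_request(registered_model, stage, version,
                         workload_size="Small", scale_to_zero=True):
    """Return the JSON body for POST /api/2.0/serving-endpoints."""
    return {
        "name": f"{registered_model}-{stage}",  # assumed naming convention
        "config": {
            "served_entities": [
                {
                    "entity_name": registered_model,
                    "entity_version": str(version),
                    "workload_size": workload_size,
                    "scale_to_zero_enabled": scale_to_zero,
                }
            ]
        },
    }


# Stage-to-version mapping from the example: version 2 in Staging, version 1 in Production.
stage_versions = {"Staging": 2, "Production": 1}
create_requests = [
    build_create_request("modelA", stage, version)
    for stage, version in stage_versions.items()
]

for body in create_requests:
    print("POST /api/2.0/serving-endpoints")
    print(json.dumps(body, indent=2))
```

Serializing with `json.dumps` also guards against the trailing commas and comments that are easy to introduce in hand-written JSON and that a strict JSON parser rejects.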

Verify the status of the endpoints.

For the Staging endpoint: GET /api/2.0/serving-endpoints/modelA-Staging

For the Production endpoint: GET /api/2.0/serving-endpoints/modelA-Production

Once the endpoints are ready, query them using:

For the Staging endpoint: POST /serving-endpoints/modelA-Staging/invocations

For the Production endpoint: POST /serving-endpoints/modelA-Production/invocations
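Waiting for an endpoint to become ready before querying it can be automated with a small polling loop. The sketch below assumes the status response contains a `state` object whose `ready` field equals `"READY"` once the endpoint can serve traffic. Treat that field layout as an assumption to confirm against the serving endpoints API reference. The `fetch_status` callable is a hypothetical stand-in for your own code that performs the GET request and returns the parsed JSON.

```python
import time


def is_endpoint_ready(status: dict) -> bool:
    """Return True if the (assumed) state.ready field reports READY."""
    return status.get("state", {}).get("ready") == "READY"


def wait_until_ready(fetch_status, timeout_s=1200.0, poll_interval_s=15.0,
                     sleep=time.sleep) -> bool:
    """Poll fetch_status() until the endpoint is ready or the timeout passes."""
    deadline = time.monotonic() + timeout_s
    while True:
        if is_endpoint_ready(fetch_status()):
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(poll_interval_s)


# Simulated status responses stand in for real GET calls in this demo.
simulated = iter([
    {"state": {"ready": "NOT_READY"}},
    {"state": {"ready": "READY"}},
])
ready = wait_until_ready(lambda: next(simulated), sleep=lambda seconds: None)
print("endpoint ready:", ready)
```

Passing the fetch and sleep functions as parameters keeps the loop independent of any particular HTTP client and makes it straightforward to test without network access.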

Update the endpoint based on model version transitions.

Suppose a new model version 3 is created. Version 3 transitions to Staging, version 2 transitions to Production, and version 1 is archived. You can reflect these changes in the separate model serving endpoints as follows:

For the Staging endpoint, update the endpoint to use the new model version in Staging (version 3 in this example).

Bash
PUT /api/2.0/serving-endpoints/modelA-Staging/config
{
  "served_entities": [
    {
      "entity_name": "modelA",
      "entity_version": "3",
      "workload_size": "Small",
      "scale_to_zero_enabled": true
    }
  ]
}

For the Production endpoint, update the endpoint to use the new model version in Production (version 2 in this example).

Bash
PUT /api/2.0/serving-endpoints/modelA-Production/config
{
  "served_entities": [
    {
      "entity_name": "modelA",
      "entity_version": "2",
      "workload_size": "Small",
      "scale_to_zero_enabled": true
    }
  ]
}
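The same stage transition can be expressed as data and turned into the two update requests. This sketch assumes the `<model>-<stage>` endpoint naming used in the example. The helper name `build_config_updates` is hypothetical, and the code prints the requests rather than sending them.

```python
import json


def build_config_updates(registered_model, stage_versions,
                         workload_size="Small", scale_to_zero=True):
    """Map each endpoint's config path to the body for its PUT request."""
    updates = {}
    for stage, version in stage_versions.items():
        path = f"/api/2.0/serving-endpoints/{registered_model}-{stage}/config"
        updates[path] = {
            "served_entities": [
                {
                    "entity_name": registered_model,
                    "entity_version": str(version),
                    "workload_size": workload_size,
                    "scale_to_zero_enabled": scale_to_zero,
                }
            ]
        }
    return updates


# After the transition: version 3 is in Staging and version 2 is in Production.
updates = build_config_updates("modelA", {"Staging": 3, "Production": 2})

for path, body in updates.items():
    print("PUT", path)
    print(json.dumps(body, indent=2))
```

Archived versions, such as version 1 here, do not appear in the mapping, so no endpoint continues to serve them after the update.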

Additional resources

Create Model Serving endpoints