Migrate to Model Serving
This article demonstrates how to enable Model Serving on your workspace and switch your models to the Mosaic AI Model Serving experience built on serverless compute.
Requirements
Registered model in the MLflow Model Registry.
Permissions on the registered models as described in the access control guide.
Significant changes
In Model Serving, the format of the request to the endpoint and of the response from the endpoint differs slightly from Legacy MLflow Model Serving. See Scoring a model endpoint for details on the new format.
In Model Serving, the endpoint URL includes serving-endpoints instead of model.
Model Serving includes full support for managing resources with API workflows.
Model Serving is production-ready and backed by the Databricks SLA.
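To make the URL change concrete, the following minimal sketch builds both styles of invocation URL. The serving-endpoints path follows the invocation routes used later in this article; the legacy path layout (`/model/<name>/<version or stage>/invocations`), the host name, and both helper names are assumptions made for illustration.

```python
def legacy_invocations_url(host: str, model_name: str, version_or_stage: str) -> str:
    """Legacy MLflow Model Serving URL (path layout is an assumption)."""
    return f"https://{host}/model/{model_name}/{version_or_stage}/invocations"


def serving_invocations_url(host: str, endpoint_name: str) -> str:
    """Model Serving URL: /serving-endpoints/<endpoint name>/invocations."""
    return f"https://{host}/serving-endpoints/{endpoint_name}/invocations"


if __name__ == "__main__":
    host = "example.cloud.databricks.com"  # hypothetical workspace host
    print(legacy_invocations_url(host, "modelA", "Production"))
    print(serving_invocations_url(host, "modelA-Production"))
```

Note that the new URL is keyed by endpoint name rather than by registered model name and stage, so client code needs the endpoint name you chose at creation time.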
Migrate Legacy MLflow Model Serving served models to Model Serving
You can create a Model Serving endpoint and flexibly transition model serving workflows without disabling Legacy MLflow Model Serving.
The following steps show how to accomplish this with the UI. For each model on which you have Legacy MLflow Model Serving enabled:
Register your model to Unity Catalog.
Navigate to Serving endpoints on the sidebar of your machine learning workspace.
Follow the workflow described in Create custom model serving endpoints on how to create a serving endpoint with your model.
Transition your application to use the new URL provided by the serving endpoint to query the model, along with the new scoring format.
When your models are transitioned over, you can navigate to Models on the sidebar of your machine learning workspace.
Select the model for which you want to disable Legacy MLflow Model Serving.
On the Serving tab, select Stop.
A message appears to confirm. Select Stop Serving.
Migrate deployed model versions to Model Serving
In previous versions of the Model Serving functionality, the serving endpoint was created based on the stage of the registered model version: Staging or Production. To migrate your served models from that experience, you can replicate that behavior in the new Model Serving experience.
This section demonstrates how to create separate model serving endpoints for Staging model versions and Production model versions. The following steps show how to accomplish this with the serving endpoints API for each of your served models.
In the example, the registered model named modelA has version 1 in the model stage Production and version 2 in the model stage Staging.
Create two endpoints for your registered model, one for Staging model versions and another for Production model versions.
For Staging model versions (entity_version 2 is the Staging model version):
POST /api/2.0/serving-endpoints
{
  "name": "modelA-Staging",
  "config": {
    "served_entities": [
      {
        "entity_name": "modelA",
        "entity_version": "2",
        "workload_size": "Small",
        "scale_to_zero_enabled": true
      }
    ]
  }
}
For Production model versions (entity_version 1 is the Production model version):
POST /api/2.0/serving-endpoints
{
  "name": "modelA-Production",
  "config": {
    "served_entities": [
      {
        "entity_name": "modelA",
        "entity_version": "1",
        "workload_size": "Small",
        "scale_to_zero_enabled": true
      }
    ]
  }
}
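The two create requests differ only in endpoint name and model version, so they can be generated from one helper. The sketch below uses only the Python standard library; the helper names, the bearer-token authentication header, and the host and token values are assumptions for illustration, not part of this article.

```python
import json
import urllib.request


def build_create_payload(endpoint_name, model_name, model_version,
                         workload_size="Small", scale_to_zero=True):
    """Build the request body for POST /api/2.0/serving-endpoints."""
    return {
        "name": endpoint_name,
        "config": {
            "served_entities": [
                {
                    "entity_name": model_name,
                    "entity_version": str(model_version),  # sent as a string
                    "workload_size": workload_size,
                    "scale_to_zero_enabled": scale_to_zero,
                }
            ]
        },
    }


def create_endpoint(host, token, payload):
    """Send the create request (assumes bearer-token authentication)."""
    request = urllib.request.Request(
        f"https://{host}/api/2.0/serving-endpoints",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


if __name__ == "__main__":
    # Print both request bodies without contacting a workspace.
    for stage, version in {"Staging": 2, "Production": 1}.items():
        print(json.dumps(build_create_payload(f"modelA-{stage}", "modelA", version), indent=2))
```

Building the payload as a Python dictionary and serializing it with `json.dumps` avoids hand-written JSON mistakes such as trailing commas or inline comments, which strict JSON parsers reject.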
Verify the status of the endpoints.
For Staging endpoint:
GET /api/2.0/serving-endpoints/modelA-Staging
For Production endpoint:
GET /api/2.0/serving-endpoints/modelA-Production
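The status check can be wrapped in a small polling loop. This is a sketch under stated assumptions: it assumes the GET response contains a `state` object with a `ready` field equal to `READY` once the endpoint can serve traffic and a `config_update` field equal to `NOT_UPDATING` when no update is pending. Confirm the field names against the serving endpoints API reference for your workspace. The helper names, host, token, and timing values are hypothetical.

```python
import json
import time
import urllib.request


def is_ready(status: dict) -> bool:
    """Interpret a status response (field names are assumptions)."""
    state = status.get("state", {})
    return (state.get("ready") == "READY"
            and state.get("config_update", "NOT_UPDATING") == "NOT_UPDATING")


def get_status(host, token, endpoint_name):
    """Call GET /api/2.0/serving-endpoints/<endpoint name>."""
    request = urllib.request.Request(
        f"https://{host}/api/2.0/serving-endpoints/{endpoint_name}",
        headers={"Authorization": f"Bearer {token}"},
        method="GET",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def wait_until_ready(host, token, endpoint_name, timeout_s=1200, poll_s=30):
    """Poll until the endpoint reports ready; return False on timeout."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if is_ready(get_status(host, token, endpoint_name)):
            return True
        time.sleep(poll_s)
    return False
```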
Once the endpoints are ready, query each endpoint using:
For Staging endpoint:
POST /serving-endpoints/modelA-Staging/invocations
For Production endpoint:
POST /serving-endpoints/modelA-Production/invocations
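A query against either endpoint could look like the following sketch. The request body uses a `dataframe_split` layout, which is an assumption here; see Scoring a model endpoint for the formats your model accepts. The helper names, column names, host, and token are hypothetical.

```python
import json
import urllib.request


def build_scoring_payload(columns, rows):
    """Build a request body in the assumed dataframe_split layout."""
    return {
        "dataframe_split": {
            "columns": list(columns),
            "data": [list(row) for row in rows],
        }
    }


def query_endpoint(host, token, endpoint_name, payload):
    """Call POST /serving-endpoints/<endpoint name>/invocations."""
    request = urllib.request.Request(
        f"https://{host}/serving-endpoints/{endpoint_name}/invocations",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


if __name__ == "__main__":
    # Hypothetical feature columns; replace with your model's input schema.
    print(json.dumps(build_scoring_payload(["feature_a", "feature_b"], [(1.0, 2.0)])))
```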
Update the endpoint based on model version transitions.
In the scenario where a new model version 3 is created, you can have model version 2 transition to Production, while model version 3 transitions to Staging and model version 1 is Archived. These changes can be reflected in separate model serving endpoints as follows:
For the Staging endpoint, update the endpoint to use the new model version in Staging (entity_version 3):
PUT /api/2.0/serving-endpoints/modelA-Staging/config
{
  "served_entities": [
    {
      "entity_name": "modelA",
      "entity_version": "3",
      "workload_size": "Small",
      "scale_to_zero_enabled": true
    }
  ]
}
For the Production endpoint, update the endpoint to use the new model version in Production (entity_version 2):
PUT /api/2.0/serving-endpoints/modelA-Production/config
{
  "served_entities": [
    {
      "entity_name": "modelA",
      "entity_version": "2",
      "workload_size": "Small",
      "scale_to_zero_enabled": true
    }
  ]
}
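When several stages change at once, it can help to derive the update requests from a single stage-to-version mapping. The sketch below only plans the requests and does not send them. The function name and the `<model name>-<stage>` endpoint naming convention are assumptions that mirror the example in this section.

```python
def plan_config_updates(model_name, stage_versions,
                        workload_size="Small", scale_to_zero=True):
    """Return one (method, path, body) tuple per stage endpoint."""
    updates = []
    for stage, version in stage_versions.items():
        path = f"/api/2.0/serving-endpoints/{model_name}-{stage}/config"
        body = {
            "served_entities": [
                {
                    "entity_name": model_name,
                    "entity_version": str(version),
                    "workload_size": workload_size,
                    "scale_to_zero_enabled": scale_to_zero,
                }
            ]
        }
        updates.append(("PUT", path, body))
    return updates


if __name__ == "__main__":
    # Version 3 moves to Staging and version 2 moves to Production.
    for method, path, body in plan_config_updates("modelA", {"Staging": 3, "Production": 2}):
        print(method, path, body["served_entities"][0]["entity_version"])
```

Archived versions need no entry in the mapping because no endpoint serves them after the update.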
Migrate MosaicML inference workflows to Model Serving
This section provides guidance on how to migrate your MosaicML inference deployments to Mosaic AI Model Serving and includes a notebook example.
The following table summarizes the parity between MosaicML inference and model serving on Databricks.
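MosaicML Inference | Mosaic AI Model Serving |
---|---|
create_inference_deployment | Create a model serving endpoint |
update_inference_deployment | Update a model serving endpoint |
delete_inference_deployment | Delete a model serving endpoint |
get_inference_deployment | Get status of a model serving endpoint |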
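As a rough aid when porting scripts, the parity above can be expressed as a lookup from each MosaicML call to a serving endpoints REST route. The create, update, and get routes match the requests shown earlier in this article; the DELETE route is not shown in this article and is an assumption based on the serving endpoints REST API, and the mapping and function names are hypothetical.

```python
# Maps each MosaicML inference call to an assumed (HTTP method, path template).
REST_EQUIVALENTS = {
    "create_inference_deployment": ("POST", "/api/2.0/serving-endpoints"),
    "update_inference_deployment": ("PUT", "/api/2.0/serving-endpoints/{name}/config"),
    "delete_inference_deployment": ("DELETE", "/api/2.0/serving-endpoints/{name}"),  # assumption
    "get_inference_deployment": ("GET", "/api/2.0/serving-endpoints/{name}"),
}


def rest_equivalent(mosaicml_call: str, endpoint_name: str):
    """Return the (method, path) pair for a MosaicML call and endpoint name."""
    method, template = REST_EQUIVALENTS[mosaicml_call]
    return method, template.format(name=endpoint_name)


if __name__ == "__main__":
    for call in REST_EQUIVALENTS:
        print(call, "->", *rest_equivalent(call, "modelA-Production"))
```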
The following notebook provides a guided example of migrating a llama-13b model from MosaicML to Mosaic AI Model Serving.