Manage model serving endpoints

This article describes how to manage model serving endpoints using the Serving UI and REST API. See Serving endpoints in the REST API reference.

To create model serving endpoints use one of the following:

Get the status of the model endpoint

In the Serving UI, you can check the status of an endpoint from the Serving endpoint state indicator at the top of your endpoint's details page.

Check the status and details of an endpoint programmatically using the REST API or the MLflow Deployments SDK:

REST API
MLflow Deployments SDK

Bash
GET /api/2.0/serving-endpoints/{name}

The following example response is for an endpoint that serves the first version of the my-ads-model model registered in the Unity Catalog model registry. The entity_name field holds the full model name, including the parent catalog and schema, such as catalog.schema.example-model.

In this response, the state.ready field is READY, which means the endpoint is ready to receive traffic. The state.update_state field is NOT_UPDATING, and pending_config is no longer returned because the update finished successfully.

JSON
{
  "name": "unity-model-endpoint",
  "creator": "customer@example.com",
  "creation_timestamp": 1666829055000,
  "last_updated_timestamp": 1666829055000,
  "state": {
    "ready": "READY",
    "update_state": "NOT_UPDATING"
  },
  "config": {
    "served_entities": [
      {
        "name": "my-ads-model",
        "entity_name": "myCatalog.mySchema.my-ads-model",
        "entity_version": "1",
        "workload_size": "Small",
        "scale_to_zero_enabled": false,
        "state": {
          "deployment": "DEPLOYMENT_READY",
          "deployment_state_message": ""
        },
        "creator": "customer@example.com",
        "creation_timestamp": 1666829055000
      }
    ],
    "traffic_config": {
      "routes": [
        {
          "served_model_name": "my-ads-model",
          "traffic_percentage": 100
        }
      ]
    },
    "config_version": 1
  },
  "id": "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx",
  "permission_level": "CAN_MANAGE"
}

Python
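from mlflow.deployments import get_deploy_client

# Connect to the Databricks deployment target.
client = get_deploy_client("databricks")

# Fetch the details of the endpoint named "chat".
endpoint = client.get_endpoint(endpoint="chat")
print(endpoint)

# The response is dict-like and has the following shape
# (nested values elided):
# {
#     "name": "chat",
#     "creator": "alice@company.com",
#     "creation_timestamp": 0,
#     "last_updated_timestamp": 0,
#     "state": {...},
#     "config": {...},
#     "tags": [...],
#     "id": "88fd3f75a0d24b0380ddc40484d7a31b",
# }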
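
As a minimal sketch of how the state fields might be interpreted in code, the helper below takes an endpoint description shaped like the example response above and reports whether the endpoint is ready with no update in progress. The function name is_endpoint_ready is hypothetical; only the state.ready and state.update_state fields and their values come from the response shown in this article.

```python
from typing import Any, Mapping


def is_endpoint_ready(endpoint: Mapping[str, Any]) -> bool:
    """Return True if the endpoint is READY and no update is in progress.

    `endpoint` is assumed to be a dict shaped like the GET response above.
    """
    state = endpoint.get("state", {})
    return (
        state.get("ready") == "READY"
        and state.get("update_state") == "NOT_UPDATING"
    )


# Illustrative input that mirrors the "state" object in the example response.
sample = {
    "name": "unity-model-endpoint",
    "state": {"ready": "READY", "update_state": "NOT_UPDATING"},
}
print(is_endpoint_ready(sample))
```

A check like this could be called in a polling loop after an update, although the polling interval and timeout are left to the caller.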

Stop a model serving endpoint

You can temporarily stop a model serving endpoint and start it later. When an endpoint is stopped, the resources provisioned for it are shut down, and the endpoint is not able to serve queries until it is started again. Only endpoints that serve custom models, are not route-optimized, and have no in-progress updates can be stopped. Stopped endpoints do not count against the resource quota. Queries sent to a stopped endpoint return a 400 error.

You can stop an endpoint from the endpoint's details page in the Serving UI.

Click the endpoint you want to stop.
Click Stop in the upper-right corner.

Alternatively, you can stop a serving endpoint programmatically using the REST API as follows:

Bash
POST /api/2.0/serving-endpoints/{name}/config:stop

When you are ready to start a stopped model serving endpoint, you can do so from the endpoint's details page in the Serving UI.

Click the endpoint you want to start.
Click Start in the upper-right corner.

Alternatively, you can start a stopped serving endpoint programmatically using the REST API as follows:

Bash
POST /api/2.0/serving-endpoints/{name}/config:start
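
The stop and start requests above differ only in the final path segment, so one helper can build both. The following is a sketch under assumptions: the workspace URL, token, and endpoint name are placeholders, the helper name build_lifecycle_request is hypothetical, and bearer-token authentication is assumed. The request is only constructed here, not sent.

```python
from urllib.parse import quote
from urllib.request import Request


def build_lifecycle_request(host: str, token: str, name: str, action: str) -> Request:
    """Build (but do not send) a POST request that stops or starts an endpoint."""
    if action not in ("stop", "start"):
        raise ValueError("action must be 'stop' or 'start'")
    # URL-encode the endpoint name so it is safe to place in the path.
    url = (
        f"{host.rstrip('/')}/api/2.0/serving-endpoints/"
        f"{quote(name, safe='')}/config:{action}"
    )
    return Request(url, method="POST", headers={"Authorization": f"Bearer {token}"})


# Hypothetical workspace URL, token, and endpoint name.
request = build_lifecycle_request(
    "https://example.databricks.com", "<token>", "my-endpoint", "stop"
)
print(request.get_method(), request.full_url)
```

To send the request, pass it to urllib.request.urlopen. Keeping construction separate from sending makes the URL logic easy to check without network access.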

Delete a model serving endpoint

To disable serving for a model, you can delete the endpoint it's served on.

You can delete an endpoint from the endpoint's details page in the Serving UI.

Click Serving on the sidebar.
Click the endpoint you want to delete.
Click the kebab menu at the top and select Delete.

Alternatively, you can delete a serving endpoint programmatically using the REST API or the MLflow Deployments SDK:

REST API
MLflow Deployments SDK

Bash
DELETE /api/2.0/serving-endpoints/{name}

Python
from mlflow.deployments import get_deploy_client

client = get_deploy_client("databricks")
client.delete_endpoint(endpoint="chat")

Debug your model serving endpoint

To debug any issues with the endpoint, you can fetch:

Model server container build logs
Model server logs

These logs are also accessible from the Endpoints UI in the Logs tab.

To get the build logs for a served model, use the following request. See Debugging guide for Model Serving for more information.

Bash

GET /api/2.0/serving-endpoints/{name}/served-models/{served-model-name}/build-logs
{
  "config_version": 1  // optional
}

To get the model server logs for a served model, use the following request:

Bash

GET /api/2.0/serving-endpoints/{name}/served-models/{served-model-name}/logs
{
  "config_version": 1  // optional
}
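
The two log requests share a common path prefix. As a small illustration, the hypothetical helper served_model_log_paths below builds both paths from an endpoint name and a served model name. The example names are taken from the sample response earlier in this article; how the optional config_version value is passed is not covered by this sketch.

```python
from typing import Dict
from urllib.parse import quote


def served_model_log_paths(endpoint_name: str, served_model_name: str) -> Dict[str, str]:
    """Return the REST API paths for build logs and model server logs."""
    # URL-encode both names so they are safe to place in the path.
    base = (
        f"/api/2.0/serving-endpoints/{quote(endpoint_name, safe='')}"
        f"/served-models/{quote(served_model_name, safe='')}"
    )
    return {"build_logs": f"{base}/build-logs", "server_logs": f"{base}/logs"}


paths = served_model_log_paths("unity-model-endpoint", "my-ads-model")
print(paths["build_logs"])
print(paths["server_logs"])
```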

Manage permissions on your model serving endpoint

You must have at least the CAN MANAGE permission on a serving endpoint to modify permissions. For more information on the permission levels, see Serving endpoint ACLs.

Get the list of permissions on the serving endpoint.

Bash
databricks permissions get servingendpoints <endpoint-id>

Grant user jsmith@example.com the CAN QUERY permission on the serving endpoint.

Bash
databricks permissions update servingendpoints <endpoint-id> --json '{
  "access_control_list": [
    {
      "user_name": "jsmith@example.com",
      "permission_level": "CAN_QUERY"
    }
  ]
}'
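
When several users need access, writing the JSON by hand becomes error-prone. The sketch below generates the same access_control_list structure shown above from a mapping of user names to permission levels. The helper name build_acl_payload is hypothetical, and the output is intended to be passed to the --json option of the CLI command above.

```python
import json
from typing import Dict, List


def build_acl_payload(grants: Dict[str, str]) -> str:
    """Serialize user -> permission-level grants as an access_control_list JSON string."""
    acl: List[Dict[str, str]] = [
        {"user_name": user, "permission_level": level}
        for user, level in grants.items()
    ]
    return json.dumps({"access_control_list": acl}, indent=2)


# Reproduces the payload from the CLI example above.
print(build_acl_payload({"jsmith@example.com": "CAN_QUERY"}))
```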

You can also modify serving endpoint permissions using the Permissions API.

Add a serverless budget policy for a model serving endpoint

Preview

This feature is in Public Preview and is not available for serving endpoints that serve External models.

Serverless budget policies allow your organization to apply custom tags on serverless usage for granular billing attribution. If your workspace uses serverless budget policies to attribute serverless usage, you can add a serverless budget policy to your model serving endpoints. See Attribute usage with serverless budget policies.

During model serving endpoint creation, you can select your endpoint's serverless budget policy from the Budget policy menu in the Serving UI. If you have a serverless budget policy assigned to you, all endpoints that you create are assigned that serverless budget policy, even if you do not select a policy from the Budget policy menu.

Add serverless budget policy during model serving endpoint creation using the Serving UI.

If you have the CAN MANAGE permission on an existing endpoint, you can add or edit a serverless budget policy for that endpoint from the Endpoint details page in the UI.

Edit serverless budget policy on an existing model serving endpoint using the Serving UI.

Note

If you've been assigned a serverless budget policy, your existing endpoints are not automatically tagged with your policy. You must manually update existing endpoints if you want to attach a serverless budget policy to them.

Get a model serving endpoint schema

Preview

Support for serving endpoint query schemas is in Public Preview. This functionality is available in Model Serving regions.

A serving endpoint query schema is a formal description of the serving endpoint using the standard OpenAPI specification in JSON format. It contains information about the endpoint including the endpoint path, details for querying the endpoint like the request and response body format, and data type for each field. This information can be helpful for reproducibility scenarios or when you need information about the endpoint, but you are not the original endpoint creator or owner.

To get the model serving endpoint schema, the served model must have a model signature logged and the endpoint must be in a READY state.

The following example demonstrates how to programmatically get the model serving endpoint schema using the REST API. For feature serving endpoint schemas, see What is Databricks Feature Serving?.

The schema returned by the API is in the format of a JSON object that follows the OpenAPI specification.

Bash

ACCESS_TOKEN="<endpoint-token>"
ENDPOINT_NAME="<endpoint name>"

curl "https://example.databricks.com/api/2.0/serving-endpoints/$ENDPOINT_NAME/openapi" -H "Authorization: Bearer $ACCESS_TOKEN" -H "Content-Type: application/json"
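
For readers working in Python, a rough equivalent of the curl command above can be built with the standard library. This is a sketch, not an official client: the helper name build_openapi_request is hypothetical, and the host, token, and endpoint name are placeholders, as in the curl example. The request is constructed but not sent.

```python
from urllib.parse import quote
from urllib.request import Request


def build_openapi_request(host: str, token: str, endpoint_name: str) -> Request:
    """Build (but do not send) the GET request for an endpoint's OpenAPI schema."""
    url = (
        f"{host.rstrip('/')}/api/2.0/serving-endpoints/"
        f"{quote(endpoint_name, safe='')}/openapi"
    )
    return Request(url, method="GET", headers={"Authorization": f"Bearer {token}"})


# Placeholder values that mirror the curl example.
request = build_openapi_request(
    "https://example.databricks.com", "<endpoint-token>", "example-endpoint"
)
print(request.full_url)
```

Passing the request to urllib.request.urlopen and decoding the body with json.load would yield the schema as a Python dictionary.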

Schema response details

The response is an OpenAPI specification in JSON format, typically including fields like openapi, info, servers and paths. Since the schema response is a JSON object, you can parse it using common programming languages, and generate client code from the specification using third-party tools. You can also visualize the OpenAPI specification using third-party tools like Swagger Editor.

The main fields of the response include:

The info.title field shows the name of the serving endpoint.
The servers field always contains one object, typically with a url field that holds the base URL of the endpoint.
The paths object in the response contains all supported paths for an endpoint. The keys in the object are the path URLs. Each path can support multiple input formats, which are listed in the oneOf field.

The following is an example endpoint schema response:

JSON
{
  "openapi": "3.1.0",
  "info": {
    "title": "example-endpoint",
    "version": "2"
  },
  "servers": [{ "url": "https://example.databricks.com/serving-endpoints/example-endpoint" }],
  "paths": {
    "/served-models/vanilla_simple_model-2/invocations": {
      "post": {
        "requestBody": {
          "content": {
            "application/json": {
              "schema": {
                "oneOf": [
                  {
                    "type": "object",
                    "properties": {
                      "dataframe_split": {
                        "type": "object",
                        "properties": {
                          "columns": {
                            "description": "required fields: int_col",
                            "type": "array",
                            "items": {
                              "type": "string",
                              "enum": ["int_col", "float_col", "string_col"]
                            }
                          },
                          "data": {
                            "type": "array",
                            "items": {
                              "type": "array",
                              "prefixItems": [
                                {
                                  "type": "integer",
                                  "format": "int64"
                                },
                                {
                                  "type": "number",
                                  "format": "double"
                                },
                                {
                                  "type": "string"
                                }
                              ]
                            }
                          }
                        }
                      },
                      "params": {
                        "type": "object",
                        "properties": {
                          "sentiment": {
                            "type": "number",
                            "format": "double",
                            "default": "0.5"
                          }
                        }
                      }
                    },
                    "examples": [
                      {
                        "columns": ["int_col", "float_col", "string_col"],
                        "data": [
                          [3, 10.4, "abc"],
                          [2, 20.4, "xyz"]
                        ]
                      }
                    ]
                  },
                  {
                    "type": "object",
                    "properties": {
                      "dataframe_records": {
                        "type": "array",
                        "items": {
                          "required": ["int_col", "float_col", "string_col"],
                          "type": "object",
                          "properties": {
                            "int_col": {
                              "type": "integer",
                              "format": "int64"
                            },
                            "float_col": {
                              "type": "number",
                              "format": "double"
                            },
                            "string_col": {
                              "type": "string"
                            },
                            "becx_col": {
                              "type": "object",
                              "format": "unknown"
                            }
                          }
                        }
                      },
                      "params": {
                        "type": "object",
                        "properties": {
                          "sentiment": {
                            "type": "number",
                            "format": "double",
                            "default": "0.5"
                          }
                        }
                      }
                    }
                  }
                ]
              }
            }
          }
        },
        "responses": {
          "200": {
            "description": "Successful operation",
            "content": {
              "application/json": {
                "schema": {
                  "type": "object",
                  "properties": {
                    "predictions": {
                      "type": "array",
                      "items": {
                        "type": "number",
                        "format": "double"
                      }
                    }
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}
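
Because the schema is plain JSON, the fields described in this section can be read with ordinary dictionary access. The following sketch, which assumes a response shaped like the example above, collects the endpoint title, the base URL, and the input formats listed under oneOf for each path. The function name summarize_schema and the trimmed sample specification are illustrative only.

```python
from typing import Any, Dict, List


def summarize_schema(spec: Dict[str, Any]) -> Dict[str, Any]:
    """Collect the endpoint title, base URL, and accepted input formats per path."""
    formats: Dict[str, List[str]] = {}
    for path, operations in spec.get("paths", {}).items():
        schema = (
            operations.get("post", {})
            .get("requestBody", {})
            .get("content", {})
            .get("application/json", {})
            .get("schema", {})
        )
        # Each oneOf alternative is one accepted input format; report its
        # top-level property names other than the shared "params" object.
        formats[path] = [
            key
            for option in schema.get("oneOf", [])
            for key in option.get("properties", {})
            if key != "params"
        ]
    servers = spec.get("servers", [])
    return {
        "title": spec.get("info", {}).get("title"),
        "base_url": servers[0].get("url") if servers else None,
        "input_formats": formats,
    }


# Trimmed-down stand-in for the example response above.
sample_spec = {
    "info": {"title": "example-endpoint"},
    "servers": [{"url": "https://example.databricks.com/serving-endpoints/example-endpoint"}],
    "paths": {
        "/served-models/vanilla_simple_model-2/invocations": {
            "post": {"requestBody": {"content": {"application/json": {"schema": {"oneOf": [
                {"properties": {"dataframe_split": {}, "params": {}}},
                {"properties": {"dataframe_records": {}, "params": {}}},
            ]}}}}}
        }
    },
}
print(summarize_schema(sample_spec))
```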
