Configure route optimization on serving endpoints

This article describes how to configure route optimization on your model serving or feature serving endpoints and how to query them. Route optimized serving endpoints dramatically lower overhead latency and support substantially higher throughput on your endpoint.

Route optimization is recommended for high throughput or latency sensitive workloads.

Requirements

  • For route optimization on a model serving endpoint, see Requirements.

  • For route optimization on a feature serving endpoint, see Requirements.

Enable route optimization on a model serving endpoint

Specify the route_optimized parameter during model serving endpoint creation to configure your endpoint for route optimization. You can only specify this parameter during endpoint creation; you cannot update an existing endpoint to be route optimized.

POST /api/2.0/serving-endpoints

{
  "name": "my-endpoint",
  "config": {
    "served_entities": [{
      "entity_name": "ads1",
      "entity_version": "1",
      "workload_type": "CPU",
      "workload_size": "Small",
      "scale_to_zero_enabled": true
    }]
  },
  "route_optimized": true
}
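The same creation request can be sent programmatically. The following sketch uses only the Python standard library; `build_model_payload` and `create_endpoint` are illustrative helper names, and the `DATABRICKS_HOST` and `DATABRICKS_TOKEN` environment variables are assumed to hold your workspace URL and API token:

```python
import json
import os
import urllib.request

def build_model_payload(name: str, entity_name: str, entity_version: str) -> dict:
    """Assemble the endpoint-creation body; route_optimized must be set at creation time."""
    return {
        "name": name,
        "config": {
            "served_entities": [{
                "entity_name": entity_name,
                "entity_version": entity_version,
                "workload_type": "CPU",
                "workload_size": "Small",
                "scale_to_zero_enabled": True,
            }]
        },
        "route_optimized": True,  # cannot be changed after the endpoint exists
    }

def create_endpoint(payload: dict) -> bytes:
    # DATABRICKS_HOST and DATABRICKS_TOKEN are assumed environment variables.
    host = os.environ["DATABRICKS_HOST"]
    token = os.environ["DATABRICKS_TOKEN"]
    req = urllib.request.Request(
        f"{host}/api/2.0/serving-endpoints",
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read()
```

Because route_optimized cannot be updated later, the helper sets it unconditionally when the payload is built.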

If you prefer to use Python, you can create a route optimized serving endpoint using the following notebook.

Create a route optimized serving endpoint using Python notebook

Open notebook in new tab

Enable route optimization on a feature serving endpoint

To use route optimization for Feature and Function Serving, specify the full name of the feature specification in the entity_name field of the serving endpoint creation request. The entity_version field is not needed for FeatureSpecs.

POST /api/2.0/serving-endpoints

{
  "name": "my-endpoint",
  "config": {
    "served_entities": [{
      "entity_name": "catalog_name.schema_name.feature_spec_name",
      "workload_type": "CPU",
      "workload_size": "Small",
      "scale_to_zero_enabled": true
    }]
  },
  "route_optimized": true
}
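A small Python sketch of the same request body; `build_feature_payload` is a hypothetical helper that checks for the full three-level Unity Catalog name and omits entity_version, which FeatureSpecs do not use:

```python
def build_feature_payload(name: str, feature_spec_name: str) -> dict:
    """Assemble a creation body for a FeatureSpec-backed, route optimized endpoint."""
    # FeatureSpecs are addressed by their full three-level Unity Catalog name.
    if feature_spec_name.count(".") != 2:
        raise ValueError("expected catalog_name.schema_name.feature_spec_name")
    return {
        "name": name,
        "config": {
            "served_entities": [{
                # No entity_version: it is not needed for FeatureSpecs.
                "entity_name": feature_spec_name,
                "workload_type": "CPU",
                "workload_size": "Small",
                "scale_to_zero_enabled": True,
            }]
        },
        "route_optimized": True,
    }
```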

Query route optimized model serving endpoints

The following steps show how to send a test query to a route optimized model serving endpoint.

For production use, like using your route optimized endpoint in an application, you must create an OAuth token. To fetch an OAuth token programmatically, you can follow the guidance in OAuth machine-to-machine (M2M) authentication.
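As a rough sketch of that M2M flow, assuming the workspace token endpoint is `<host>/oidc/v1/token` and that you have a service principal's client ID and secret (see the OAuth machine-to-machine guidance for the authoritative details); `build_token_request` and `fetch_oauth_token` are illustrative names:

```python
import base64
import json
import urllib.parse
import urllib.request

def build_token_request(host: str, client_id: str, client_secret: str):
    """Assemble the client-credentials exchange (assumed endpoint: <host>/oidc/v1/token)."""
    basic = base64.b64encode(f"{client_id}:{client_secret}".encode()).decode()
    body = urllib.parse.urlencode({
        "grant_type": "client_credentials",
        "scope": "all-apis",  # request a token usable against workspace APIs
    }).encode()
    headers = {
        "Authorization": f"Basic {basic}",
        "Content-Type": "application/x-www-form-urlencoded",
    }
    return f"{host}/oidc/v1/token", body, headers

def fetch_oauth_token(host: str, client_id: str, client_secret: str) -> str:
    """POST the exchange and return the short-lived access token."""
    url, body, headers = build_token_request(host, client_id, client_secret)
    req = urllib.request.Request(url, data=body, headers=headers, method="POST")
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["access_token"]
```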

  1. Fetch an OAuth token from the Serving UI of your workspace.

    1. Click Serving in the sidebar to display the Serving UI.

    2. On the Serving endpoints page, select your route optimized endpoint to see endpoint details.

    3. On the endpoint details page, click the Query endpoint button.

    4. Select the Fetch Token tab.

    5. Click the Fetch OAuth Token button. This token is valid for one hour; fetch a new token if your current token expires.

  2. Get your model serving endpoint URL from the endpoint details page in the Serving UI.

  3. Use the OAuth token from step 1 and the endpoint URL from step 2 to populate the following example code that queries the route optimized endpoint.

url="your-endpoint-url"
OAUTH_TOKEN="xxxxxxx"

curl -X POST -H 'Content-Type: application/json' -H "Authorization: Bearer $OAUTH_TOKEN" -d @data.json "$url"
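The same request can be made from Python with only the standard library; `build_query_request` and `query_endpoint` are illustrative helpers, and the payload argument stands in for the contents of `data.json`:

```python
import json
import urllib.request

def build_query_request(url: str, oauth_token: str, payload: dict) -> urllib.request.Request:
    """Mirror the curl call above: a JSON body plus a Bearer authorization header."""
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {oauth_token}",
        },
        method="POST",
    )

def query_endpoint(url: str, oauth_token: str, payload: dict) -> dict:
    """Send the query and decode the JSON response."""
    with urllib.request.urlopen(build_query_request(url, oauth_token, payload)) as resp:
        return json.loads(resp.read())
```

Remember that the OAuth token expires after one hour, so a long-running application should refresh it rather than reuse one token indefinitely.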

For a Python SDK to query a route optimized endpoint, reach out to your Databricks account team.

Limitations

  • OAuth tokens are the only supported authentication for route optimization. Personal access tokens are not supported.

  • Route optimization does not enforce any network restrictions you might have configured in your Databricks workspace, such as IP access control lists or PrivateLink. Do not enable route optimization if you require that model serving traffic be bound by those controls. If you have such network requirements and still want to try route optimized model serving, reach out to your Databricks account team.