Route optimization on serving endpoints

This article describes how to enable route optimization on your model serving or feature serving endpoints. Route-optimized serving endpoints reduce overhead latency and increase the throughput that your endpoint can support.

Route optimization is recommended for high-throughput or latency-sensitive workloads.

What is route optimization?

When you enable route optimization on an endpoint, Databricks Model Serving improves the network path for inference requests, resulting in faster, more direct communication between your client and the model. This optimized routing unlocks higher queries per second (QPS) compared to non-optimized endpoints and provides more stable and lower latencies for your applications.

To leverage the benefits of route-optimized endpoints, you must make the following changes to your client:

  • Use the route-optimized URL: Each route-optimized endpoint has a unique URL. You must send inference requests to this specific URL.
  • Authenticate using OAuth tokens: Route-optimized endpoints only support OAuth tokens for authentication. Other authentication mechanisms are not supported.
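The two client changes above can be sketched in Python as follows. This is a minimal illustration, not the official client: the URL, the token value, and the `dataframe_records` payload shape are placeholder assumptions, and `build_inference_request` is a hypothetical helper name. The sketch only builds the request and does not send it.

```python
import json
import urllib.request


def build_inference_request(route_optimized_url, oauth_token, payload):
    """Build (but do not send) a POST request for a route-optimized endpoint.

    route_optimized_url: the unique URL of the route-optimized endpoint.
    oauth_token: an OAuth access token; other credential types are not supported.
    payload: a JSON-serializable request body (its shape is an assumption here).
    """
    if not oauth_token:
        raise ValueError("An OAuth token is required for route-optimized endpoints.")
    return urllib.request.Request(
        url=route_optimized_url,
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            # The OAuth token is passed as a standard bearer token.
            "Authorization": f"Bearer {oauth_token}",
            "Content-Type": "application/json",
        },
    )


# Placeholder values for illustration only; substitute the URL and token
# issued for your own endpoint.
request = build_inference_request(
    "https://example.invalid/serving-endpoints/my-endpoint/invocations",
    "example-oauth-token",
    {"dataframe_records": [{"feature_a": 1.0}]},
)

# To actually send the request:
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```

Keeping the URL and token as parameters makes it harder to accidentally send requests to the standard endpoint URL or reuse a personal access token.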

Requirements

  • For route optimization on a model serving endpoint, see the requirements for model serving.
  • For route optimization on a feature serving endpoint, see the requirements for feature serving.

Enable route optimization on a model serving endpoint

You can enable route optimization when you create a model serving endpoint using the Serving UI. Route optimization can only be enabled during endpoint creation; you cannot update an existing endpoint to make it route-optimized.

  1. In the sidebar, click Serving to display the Serving UI.
  2. Click Create serving endpoint.
  3. In the Route optimization section, select Enable route optimization.
  4. After your endpoint is created, Databricks sends you a notification that describes what is needed to query a route-optimized endpoint.

Enable route optimization on a feature serving endpoint

To use route optimization for Feature and Function Serving, specify the full name of the feature specification in the entity_name field of the serving endpoint creation request, and set route_optimized to true. The entity_version field is not needed for FeatureSpecs.

POST /api/2.0/serving-endpoints

{
  "name": "my-endpoint",
  "config": {
    "served_entities": [
      {
        "entity_name": "catalog_name.schema_name.feature_spec_name",
        "workload_type": "CPU",
        "workload_size": "Small",
        "scale_to_zero_enabled": true
      }
    ]
  },
  "route_optimized": true
}
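As a sketch of how a client might assemble the request body above, the following Python builds the same JSON for a feature serving endpoint. The helper name `build_feature_serving_payload` is hypothetical, and the three-part name check is an assumption based on the catalog_name.schema_name.feature_spec_name form in the example. Sending the body to POST /api/2.0/serving-endpoints with your workspace credentials is left out.

```python
import json


def build_feature_serving_payload(endpoint_name, feature_spec_full_name,
                                  workload_size="Small", scale_to_zero=True):
    """Return the creation request body for a route-optimized feature serving endpoint."""
    # Assumption: the full name has the form catalog.schema.feature_spec,
    # as in the example request.
    parts = feature_spec_full_name.split(".")
    if len(parts) != 3 or not all(parts):
        raise ValueError("Expected a name like catalog_name.schema_name.feature_spec_name")

    served_entity = {
        "entity_name": feature_spec_full_name,
        # entity_version is intentionally omitted: it is not needed for FeatureSpecs.
        "workload_type": "CPU",
        "workload_size": workload_size,
        "scale_to_zero_enabled": scale_to_zero,
    }
    return {
        "name": endpoint_name,
        "config": {"served_entities": [served_entity]},
        # Route optimization is requested when the endpoint is created.
        "route_optimized": True,
    }


payload = build_feature_serving_payload(
    "my-endpoint", "catalog_name.schema_name.feature_spec_name"
)
print(json.dumps(payload, indent=2))
```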

Limitations

  • Route optimization is only available for custom model serving endpoints and feature serving endpoints. Foundation Model APIs and External Models are not supported.
  • Databricks in-house OAuth tokens are the only supported authentication for route optimization. Personal access tokens are not supported.
