
Route optimization on serving endpoints

This article describes how to enable route optimization on your model serving endpoints. Route-optimized serving endpoints dramatically lower overhead latency and substantially increase the throughput your endpoint can support.

Route-optimized endpoints are queried differently from non-route-optimized endpoints: they use a different URL and require authentication with OAuth tokens. See Query route-optimized serving endpoints for details.
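The following is a minimal sketch of what a scoring request to a route-optimized endpoint might look like, using only the Python standard library. It assumes you already have the endpoint's route-optimized query URL and an OAuth access token; the URL and token in the example are placeholders, and the `dataframe_records` input format is one common Model Serving input format rather than the only one. The function only builds the request; it does not send it.

```python
import json
import urllib.request


def build_query_request(
    query_url: str, oauth_token: str, records: list[dict]
) -> urllib.request.Request:
    """Build (but do not send) a POST request for a route-optimized endpoint.

    query_url: the endpoint's route-optimized query URL (placeholder here;
        it differs from the standard workspace URL).
    oauth_token: an OAuth access token; personal access tokens are rejected.
    records: input rows, sent in the `dataframe_records` format.
    """
    body = json.dumps({"dataframe_records": records}).encode("utf-8")
    return urllib.request.Request(
        query_url,
        data=body,
        headers={
            # Route-optimized endpoints authenticate with an OAuth bearer token.
            "Authorization": f"Bearer {oauth_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

To send the request, pass it to `urllib.request.urlopen` and decode the JSON response body.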

What is route optimization?

When you enable route optimization on an endpoint, Databricks Model Serving improves the network path for inference requests, resulting in faster, more direct communication between your client and the model. This optimized routing supports higher queries per second (QPS) than non-optimized endpoints and provides lower, more stable latency for your applications.

Requirements

Enable route optimization on a model serving endpoint

You can enable route optimization when you create a model serving endpoint using the Serving UI. Route optimization can only be enabled during endpoint creation; you cannot update an existing endpoint to use route optimization.

  1. In the sidebar, click Serving to display the Serving UI.
  2. Click Create serving endpoint.
  3. In the Route optimization section, select Enable route optimization.
  4. After your endpoint is created, Databricks sends you a notification describing what you need to query a route-optimized endpoint.
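The steps above use the Serving UI. As we understand it, the Serving Endpoints REST API (`POST /api/2.0/serving-endpoints`) exposes the same setting as a top-level `route_optimized` boolean; confirm this against the API reference for your workspace before relying on it. The sketch below only builds the creation request. The workspace URL, token, endpoint name, model name, and the `workload_size` and `scale_to_zero_enabled` values are illustrative assumptions.

```python
import json
import urllib.request


def build_create_endpoint_request(
    workspace_url: str, token: str, name: str, entity_name: str, entity_version: str
) -> urllib.request.Request:
    """Build (but do not send) a request that creates a route-optimized endpoint."""
    payload = {
        "name": name,
        "config": {
            "served_entities": [
                {
                    "entity_name": entity_name,
                    "entity_version": entity_version,
                    # Illustrative sizing values; choose ones that fit your model.
                    "workload_size": "Small",
                    "scale_to_zero_enabled": True,
                }
            ]
        },
        # Assumed field name; it must be set at creation time because
        # existing endpoints cannot be switched to route optimization.
        "route_optimized": True,
    }
    return urllib.request.Request(
        f"{workspace_url.rstrip('/')}/api/2.0/serving-endpoints",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Because route optimization cannot be added later, setting the flag in the same request that creates the endpoint is the only point where it takes effect.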

See Create a model serving endpoint.

Limitations

  • Route optimization is only available for custom model serving endpoints. Serving endpoints that use Foundation Model APIs or external models are not supported.
  • Databricks-issued OAuth tokens are the only supported authentication method for route-optimized endpoints. Personal access tokens are not supported.
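Because personal access tokens are not accepted, clients need an OAuth access token instead. The following is a minimal sketch of building a Databricks OAuth machine-to-machine token request (client credentials grant against the workspace `/oidc/v1/token` endpoint) with the standard library. The workspace URL, client ID, and client secret are placeholders for a service principal's credentials. The sketch requests the generic `all-apis` scope; check Query route-optimized serving endpoints for whether your endpoint requires a differently scoped token.

```python
import base64
import urllib.parse
import urllib.request


def build_oauth_token_request(
    workspace_url: str, client_id: str, client_secret: str
) -> urllib.request.Request:
    """Build (but do not send) an OAuth client-credentials token request."""
    # HTTP Basic authentication with the service principal's ID and secret.
    credentials = base64.b64encode(
        f"{client_id}:{client_secret}".encode("utf-8")
    ).decode("ascii")
    form = urllib.parse.urlencode(
        {"grant_type": "client_credentials", "scope": "all-apis"}
    ).encode("ascii")
    return urllib.request.Request(
        f"{workspace_url.rstrip('/')}/oidc/v1/token",
        data=form,
        headers={
            "Authorization": f"Basic {credentials}",
            "Content-Type": "application/x-www-form-urlencoded",
        },
        method="POST",
    )
```

Sending this request with `urllib.request.urlopen` should return a JSON body whose `access_token` field is the bearer token to use when querying the endpoint.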

Additional resources