Configure traffic splitting for Unity AI Gateway endpoints
This feature is in Beta. Account admins can control access to this feature from the account console Previews page. See Manage Databricks previews.
This page describes how to configure traffic splitting for Unity AI Gateway endpoints. Use traffic splitting to distribute requests across multiple model backends behind a single Unity AI Gateway endpoint, so you can gradually roll out new models, run A/B tests, and spread load across providers.
Requirements
- Unity AI Gateway preview enabled for your account. See Manage Databricks previews.
- A Databricks workspace in a Unity AI Gateway supported region.
Configure traffic splitting in the UI
- In your Databricks workspace, click AI Gateway in the sidebar and select the endpoint you want to edit.
- In the Destinations section, click Add another model to add a destination entry for each model backend you want to include in the split.
- For each destination, set Traffic percentage to the share of traffic you want that model to receive.
  - Percentages must sum to 100%.
  - The system saves changes automatically when all allocations sum to 100%.
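Unity AI Gateway routes each request to one of the configured destinations, chosen at random in proportion to the traffic percentages you specify. Individual requests are not assigned in a fixed pattern, so small samples can deviate from the configured split. Over time, the observed share of traffic for each destination converges to the configured percentages.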
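The convergence described above can be illustrated with a small local simulation. This is a minimal sketch of weighted random selection in general, not the gateway's actual implementation. The destination names, the `split` dictionary shape, and the helper functions `pick_destination` and `observed_shares` are hypothetical and exist only for this example.

```python
import random
from collections import Counter

def pick_destination(split, rng=random):
    """Pick one destination name at random, weighted by its traffic percentage."""
    names = list(split)
    weights = [split[name] for name in names]
    return rng.choices(names, weights=weights, k=1)[0]

def observed_shares(split, num_requests, seed=0):
    """Simulate num_requests routing decisions and return the observed percentage per destination."""
    rng = random.Random(seed)  # seeded so repeated runs give the same numbers
    counts = Counter(pick_destination(split, rng) for _ in range(num_requests))
    return {name: 100.0 * counts[name] / num_requests for name in split}

# Hypothetical 80/20 split between two model backends.
split = {"model-a": 80, "model-b": 20}
for n in (100, 10_000, 100_000):
    print(n, observed_shares(split, n))
```

With only 100 simulated requests the observed shares can be several points away from 80/20. With larger samples they settle close to the configured values, which is why short observation windows are a poor basis for judging whether a split is working.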
Interaction with fallbacks
You can use traffic splitting and fallbacks together, but they apply at different stages of request handling:
- Traffic splitting determines the initial (primary) destination for a request.
- Fallbacks define how the system retries the request if the primary attempt fails.
When you configure both traffic splitting and fallbacks:
- For each incoming request, traffic splitting selects one destination from the configured set, based on the configured traffic percentages. This selection becomes the primary destination for that request.
- The system sends the request to the primary destination.
- If the request fails (for example, due to a 429 or 5xx error), the system retries the request against the configured fallback destinations. It tries them in the exact order specified.
- The system attempts fallbacks sequentially until one succeeds or it exhausts all fallback options.
Fallbacks are independent of traffic splitting. After the system selects a primary destination, it does not re-apply traffic splitting during retries.
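The sequence above can be sketched as a short function: pick the primary once by weight, then walk the fallback list in order without re-applying the split. This is an illustrative model of the documented behavior, not gateway code. The function `route_request`, its parameters, and the `send` callable are assumptions made for this example, and the sketch treats any exception from `send` as a failed attempt.

```python
import random

def route_request(split, fallbacks, send, rng=random):
    """Return (destination, response) for the first attempt that succeeds.

    split: dict of destination name -> traffic percentage (hypothetical shape).
    fallbacks: ordered list of fallback destination names.
    send: callable that takes a destination name and returns a response,
          or raises an exception on failure (for example, a 429 or 5xx).
    """
    names = list(split)
    # Traffic splitting runs exactly once, to choose the primary destination.
    primary = rng.choices(names, weights=[split[n] for n in names], k=1)[0]
    last_error = None
    # Fallbacks are tried strictly in the configured order after the primary.
    for destination in [primary, *fallbacks]:
        try:
            return destination, send(destination)
        except Exception as error:
            last_error = error
    raise RuntimeError("primary and all fallbacks failed") from last_error

# Usage: both split destinations fail, so the first fallback answers.
def fake_send(destination):
    if destination in {"model-a", "model-b"}:
        raise RuntimeError("simulated 429")
    return f"ok from {destination}"

print(route_request({"model-a": 70, "model-b": 30}, ["fallback-1", "fallback-2"], fake_send))
```

Whichever primary is selected, the usage example ends at `fallback-1`, because the retry path depends only on the fallback order and not on the traffic percentages.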

Observability
Routing decisions for traffic splits and fallbacks are logged to the routing_information field in the system.ai_gateway.usage system table. Query this table to verify that requests are being routed according to your configured percentages and fallback order.
SELECT
  routing_information.primary_destination AS destination,
  COUNT(*) AS request_count,
  ROUND(COUNT(*) * 100.0 / SUM(COUNT(*)) OVER (), 1) AS actual_pct
FROM system.ai_gateway.usage
WHERE
  endpoint_name = 'my-gateway-endpoint'
  AND timestamp >= CURRENT_TIMESTAMP - INTERVAL 1 DAY
GROUP BY routing_information.primary_destination
ORDER BY actual_pct DESC;
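Once you have per-destination request counts from a query like the one above, you can compare them with the configured split. The following sketch computes the difference in percentage points. The function name `split_drift`, the dictionary shapes, and the counts are made up for illustration and are not part of any Databricks API.

```python
def split_drift(configured, request_counts):
    """Return observed-minus-configured percentage points per destination."""
    total = sum(request_counts.values())
    if total == 0:
        raise ValueError("no requests in the selected window")
    return {
        # Destinations with no logged requests count as 0% observed.
        name: round(100.0 * request_counts.get(name, 0) / total - pct, 1)
        for name, pct in configured.items()
    }

configured = {"model-a": 80, "model-b": 20}
request_counts = {"model-a": 8130, "model-b": 1870}  # made-up counts for illustration
print(split_drift(configured, request_counts))
```

A positive value means the destination received more primary traffic than configured, a negative value means less. Because routing is random, small differences are expected and shrink as the number of requests in the window grows.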
Limitations
- You can configure traffic splitting across a maximum of 5 destinations.
- You cannot configure traffic splitting on fallback destinations.
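If you prepare split configurations outside the UI, for example in a review document or a script, a small check can catch violations of the constraints on this page before you enter them. This is a sketch under stated assumptions: `validate_split` and the dictionary shape are hypothetical, and the checks for an empty split and for negative percentages are assumptions that the page does not state explicitly.

```python
MAX_DESTINATIONS = 5  # maximum number of split destinations stated on this page

def validate_split(split):
    """Raise ValueError if a {destination: percentage} split breaks a constraint."""
    if not split:
        # Assumption: a split needs at least one destination.
        raise ValueError("at least one destination is required")
    if len(split) > MAX_DESTINATIONS:
        raise ValueError(
            f"{len(split)} destinations configured; the maximum is {MAX_DESTINATIONS}"
        )
    if any(pct < 0 for pct in split.values()):
        # Assumption: negative percentages are never valid.
        raise ValueError("traffic percentages must not be negative")
    total = sum(split.values())
    if abs(total - 100) > 1e-9:
        raise ValueError(f"traffic percentages sum to {total}, expected 100")

validate_split({"model-a": 60, "model-b": 30, "model-c": 10})  # passes silently

try:
    validate_split({"model-a": 60, "model-b": 30})
except ValueError as error:
    print(error)
```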