Configure AI Gateway endpoints

Beta

This feature is in Beta. Account admins can control access to this feature from the account console Previews page. See Manage Databricks previews.

This page describes how to configure AI Gateway (Beta) endpoints.

Requirements

Create an AI Gateway endpoint

To create an AI Gateway endpoint:

  1. In the sidebar, click AI Gateway.
  2. Click Create AI Gateway Endpoint.
  3. Configure your endpoint name and primary model.
  4. Click Create.

Configure features on an endpoint

You can update AI Gateway endpoints to enable and disable features. Updates to AI Gateway configurations take up to 1 minute to take effect.

To update AI Gateway features on an existing endpoint:

  1. Click on your endpoint from the AI Gateway page.
  2. In the Gateway Endpoint Details sidebar, click the edit icon next to the feature you want to update.
  3. Make your changes and click Save.

AI Gateway UI

The following sections summarize the available AI Gateway features, how to configure each one, and its details:

Usage tracking

Enabled by default.

  • Logs usage data to the system.ai_gateway.usage system table.
  • Account admins must enable the ai_gateway system table schema before using the system tables. See Grant access to system tables.
  • Only account admins have permission to view or query the system.ai_gateway.usage table.
  • The input and output token counts are estimated as (text_length+1)/4 if the token count is not returned by the model.
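The estimation formula above can be sketched as a small helper (the function name is hypothetical, and integer division is an assumption about how the result is rounded):

```python
def estimate_tokens(text: str) -> int:
    """Estimate a token count as (text_length + 1) / 4, the fallback
    used when the model does not return token counts.
    Hypothetical helper; integer division is assumed."""
    return (len(text) + 1) // 4

# A 99-character prompt is estimated at (99 + 1) // 4 = 25 tokens.
print(estimate_tokens("a" * 99))
```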

Inference tables

Select Enable inference tables to log requests and responses.

  • Logs to Unity Catalog Delta tables.
  • You must have CREATE TABLE permission in the specified catalog schema.
  • Payloads larger than 10 MiB are not logged.
  • For streaming requests, the logged response payload aggregates all of the returned chunks into a single response.
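The chunk aggregation and payload-size behavior can be illustrated with a minimal sketch (the chunk field names and helper functions are assumptions for illustration, not the product's internal format):

```python
MAX_PAYLOAD_BYTES = 10 * 1024 * 1024  # payloads larger than 10 MiB are not logged


def aggregate_chunks(chunks: list[dict]) -> str:
    """Concatenate streamed response chunks into one payload, mirroring
    how the logged response aggregates all returned chunks.
    Illustrative only; the 'content' field name is assumed."""
    return "".join(chunk.get("content", "") for chunk in chunks)


def should_log(payload: str) -> bool:
    """Return True if the payload is small enough to be logged."""
    return len(payload.encode("utf-8")) <= MAX_PAYLOAD_BYTES
```

For example, `aggregate_chunks([{"content": "Hel"}, {"content": "lo"}])` yields the single payload `"Hello"`.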

Rate limits

Select Rate limits to configure queries per minute (QPM) or tokens per minute (TPM).

  • Configure limits at the endpoint, user, or group level.
    • Use the Endpoint field to set global limits. The endpoint rate limit is a global maximum. If exceeded, all requests are blocked.
    • Use the User (Default) field to set per-user limits.
      • Define custom rate limits for individual users, service principals, or groups.
  • A maximum of 20 rate limits and up to 5 group-specific rate limits can be specified.
  • If a user has both QPM and TPM limits, the more restrictive limit is enforced.
  • Rate limits only apply to users who have permission to query the endpoint.
  • By default, there are no rate limits configured for users or the endpoint.
  • Custom rate limits override the User (Default) rate limit.
    • If a user belongs to both a user-specific limit and a group-specific limit, the user-specific limit is enforced.
    • If a user belongs to multiple user groups with different rate limits, they are rate limited only if they exceed the QPM rate limits of all of their groups, or the TPM rate limits of all of their groups.

Fallbacks

Select Add fallback model to configure fallback models.

  • Requests fall back to other models when the primary model returns 429 or 5XX errors.
  • Each fallback model is tried once in sequential order until the request succeeds.
  • The first successful or last failed request attempt and response are logged in both usage tracking and inference tables.
  • All fallback attempts are recorded in the routing_information field of the usage tracking table.

The following diagram shows a fallbacks example where three models are registered as destinations of an AI Gateway endpoint:

  1. The request is originally routed to Model 1.
  2. If the request returns a 200 response, the request was successful on Model 1 and the request and its response are logged to the usage tracking and inference tables.
  3. If the request returns a 429 or 5XX error on Model 1, the request falls back to the next model on the endpoint, Model 2.
  4. If the request returns a 429 or 5XX error on Model 2, the request falls back to the next model on the endpoint, Model 3.
  5. If the request returns a 429 or 5XX error on Model 3, the request fails since all fallback models have been tried. The failed request and the response error are logged to the usage tracking and inference tables.
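The sequence in the diagram can be sketched as a simple routing loop (the model callables, status codes, and attempt records below are hypothetical stand-ins, loosely modeled on the `routing_information` field described above):

```python
def route_with_fallbacks(models, request):
    """Try each model once in order; fall back on 429 or 5XX errors.

    `models` is a list of (name, callable) pairs, where each callable
    returns (status_code, body). Returns (status, body, attempts),
    where `attempts` records every try, similar in spirit to the
    routing_information field. Hypothetical shapes for illustration.
    """
    attempts = []
    status, body = None, None
    for name, call in models:
        status, body = call(request)
        attempts.append({"model": name, "status": status})
        if not (status == 429 or 500 <= status <= 599):
            return status, body, attempts  # first success: stop here
    return status, body, attempts          # all models failed


# Example: Model 1 is rate limited, Model 2 errors, Model 3 succeeds.
models = [
    ("model-1", lambda req: (429, "rate limited")),
    ("model-2", lambda req: (503, "unavailable")),
    ("model-3", lambda req: (200, "ok")),
]
status, body, attempts = route_with_fallbacks(models, {"prompt": "hi"})
```

In this example the request succeeds on the third model, and all three attempts are recorded.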

Diagram: Fallbacks example

Next steps