Model usage for Unity AI Gateway services

Beta

This feature is in Beta. Account admins can control access to this feature from the account console Previews page. See Manage Databricks previews.

This page describes how to monitor usage for Unity AI Gateway services using the usage tracking system table.

The usage tracking table automatically captures request and response details for a model service, logging essential metrics like token usage and latency. You can use the data in this table to monitor users, track costs, and gain insights into model service performance and consumption.

Usage tracking also captures ai_query requests to Databricks-provided model services.

Requirements

The Unity AI Gateway account-level preview must be enabled for your account. An account admin must enable this preview on the account console Previews page before you can use usage tracking or the built-in usage dashboard. See Manage Databricks previews.
A Databricks workspace in a Unity AI Gateway supported region.
Unity Catalog enabled for your workspace. See Enable a workspace for Unity Catalog.

Query the usage table

Unity AI Gateway logs usage data to the system.ai_gateway.usage system table. You can view the table in the UI, or query the table from Databricks SQL or a notebook.

Note

Only account admins have permission to view or query the system.ai_gateway.usage table.

To view the table in the UI, click the usage tracking table link on the model service page to open the table in Catalog Explorer.

To query the table from Databricks SQL or a notebook:

SQL
SELECT * FROM system.ai_gateway.usage;

prompt

Genie Code (Agent mode) can do this for you. Try this example prompt:

Query the system.ai_gateway.usage table to analyze AI Gateway usage showing request count and total tokens, grouped by endpoint name for the last 7 days.
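
The same analysis can also be written by hand. The following query is a minimal sketch, not the query that Genie Code generates. It assumes the table stores one row per invocation, so it counts distinct request_id values as requests and rows as invocations. The column aliases are illustrative.

```sql
-- Request and token totals per model service for the last 7 days.
-- Assumption: one row per invocation, so several rows can share a request_id.
SELECT
  endpoint_name,
  COUNT(DISTINCT request_id) AS request_count,
  COUNT(*) AS invocation_count,
  SUM(total_tokens) AS total_tokens
FROM system.ai_gateway.usage
WHERE event_time >= current_timestamp() - INTERVAL 7 DAYS
GROUP BY endpoint_name
ORDER BY total_tokens DESC;
```

If request_count and invocation_count differ for a model service, some requests fanned out into more than one inference call.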

Built-in usage dashboard

Note

Some workspaces do not yet show the Govern dropdown. In those workspaces, use the standalone Create Dashboard, View Dashboard, and Update buttons on the Unity AI Gateway page instead.

Create built-in usage dashboard

Account admins can create a built-in Unity AI Gateway usage dashboard to monitor usage, track costs, and gain insights into model service performance and consumption. From the Unity AI Gateway page, click Govern in the top right, then click Create Usage Dashboard. The warehouse that runs the dashboard queries is selected automatically.

Note

Dashboard creation is restricted to account admins because it requires SELECT permissions on the system.ai_gateway.usage table. The dashboard's data is subject to the usage table's retention policies. See Which system tables are available?

When a newer version of the built-in usage dashboard is available, account admins can click Update on the dashboard version row in the Govern dropdown on the Unity AI Gateway page.

You can use the following dashboard configuration options to manage the dashboard:

Scope: Select whether to scope the dashboard to the account or workspace.
Permissions: Choose whether queries run using the dashboard owner’s permissions or each viewer’s permissions. See What are shared data permissions?
Automatic updates: When you enable this option, the dashboard updates automatically whenever a newer version becomes available and an account admin visits the Unity AI Gateway page.

When the dashboard is updated to version 0.3 or higher, a schedule is automatically created to refresh the dashboard every 6 hours. If needed, this schedule can be disabled in the Lakeview dashboard. See Create a schedule.

View usage dashboard

To view the dashboard, click Govern in the top right of the Unity AI Gateway page, then click Usage Dashboard. The dashboard opens in a new tab. The built-in dashboard provides comprehensive visibility into Unity AI Gateway model service usage, performance, and cost. It includes multiple pages tracking requests, token consumption, latency metrics, error rates, cost breakdowns, external MCP server traffic, and coding agent activity.

The dashboard provides cross-workspace analytics by default. All dashboard pages can be filtered by date range and workspace ID.

Overview tab: Shows high-level usage metrics including daily request volume, token usage trends over time, top users by token consumption, and total unique user counts. Use this tab to get a quick snapshot of overall Unity AI Gateway activity and identify the most active users and models.
Performance tab: Tracks key performance metrics including latency percentiles (P50, P90, P95, P99), time to first byte, error rates, and HTTP status code distributions. Use this tab to monitor model service health and identify performance bottlenecks or reliability issues.
Usage tab: Shows detailed consumption breakdowns by model service, workspace, and requester. This tab shows token usage patterns, request distributions, and cache hit ratios.
Cost Observability tab: Shows cost breakdowns by model service, target model, user, service tags, and request tags. This tab also includes estimated cost for external models. See Monitor Unity AI Gateway cost.
External MCP Server tab: Shows request volume, error rates, users and connections, and daily usage trends for external MCP server traffic.
Coding Agents tab: Tracks activity from integrated coding agents including Cursor, Claude Code, Gemini CLI, and Codex CLI. This tab shows metrics like active days, coding sessions, commits, and lines of code added or removed to monitor developer tool usage. See Coding agent dashboard for more details.
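
If you need figures similar to the Performance tab outside the dashboard, a query along these lines can approximate them. This is a sketch, not the dashboard's own query. Treating status codes of 400 and above as errors and the 7-day window are assumptions, and percentile_approx returns approximate rather than exact percentiles.

```sql
-- Approximate latency percentiles and error rate per model service.
-- Assumptions: errors are responses with status_code >= 400; window is 7 days.
SELECT
  endpoint_name,
  COUNT(*) AS invocation_count,
  percentile_approx(latency_ms, 0.5) AS p50_latency_ms,
  percentile_approx(latency_ms, 0.9) AS p90_latency_ms,
  percentile_approx(latency_ms, 0.99) AS p99_latency_ms,
  percentile_approx(time_to_first_byte_ms, 0.5) AS p50_ttfb_ms,
  ROUND(AVG(CASE WHEN status_code >= 400 THEN 1.0 ELSE 0.0 END), 4) AS error_rate
FROM system.ai_gateway.usage
WHERE event_time >= current_timestamp() - INTERVAL 7 DAYS
GROUP BY endpoint_name
ORDER BY p99_latency_ms DESC;
```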

Usage table schema

The system.ai_gateway.usage table has the following schema:

Column name	Type	Description	Example
`account_id`	STRING	The account ID.	`11d77e21-5e05-4196-af72-423257f74974`
`workspace_id`	STRING	The workspace ID.	`1653573648247579`
`request_id`	STRING	A unique identifier for the request.	`b4a47a30-0e18-4ae3-9a7f-29bcb07e0f00`
`invocation_id`	STRING	A unique identifier for each individual inference call. Multiple invocations can share the same `request_id`, such as guardrail checks or multi-turn agent calls. Use `invocation_id` to distinguish them.	`c0a8012e-9f3b-4d21-8a7e-1b2c3d4e5f60`
`schema_version`	INTEGER	The schema version of the usage record.	`1`
`endpoint_id`	STRING	The unique ID of the Unity AI Gateway model service.	`43addf89-d802-3ca2-bd54-fe4d2a60d58a`
`endpoint_name`	STRING	The name of the Unity AI Gateway model service.	`databricks-gpt-5-2`
`endpoint_tags`	MAP	Tags configured on the model service at creation or update time. They apply to all requests to the model service and are useful for categorizing services by team, cost center, or project.	`{"team": "engineering"}`
`endpoint_metadata`	STRUCT	Model service metadata including `creator`, `creation_time`, `last_updated_time`, `destinations`, `inference_table`, and `fallbacks`.	`{"creator": "user.name@email.com", "creation_time": "2026-01-06T12:00:00.000Z", ...}`
`event_time`	TIMESTAMP	The timestamp when the request was received.	`2026-01-20T19:48:08.000+00:00`
`latency_ms`	LONG	The total latency in milliseconds.	`300`
`time_to_first_byte_ms`	LONG	The time to first byte in milliseconds.	`300`
`destination_type`	STRING	The type of destination (for example, external model or foundation model).	`PAY_PER_TOKEN_FOUNDATION_MODEL`
`destination_name`	STRING	The name of the destination model or provider.	`databricks-gpt-5-2`
`destination_id`	STRING	The unique ID of the destination.	`507e7456151b3cc89e05ff48161efb87`
`destination_model`	STRING	The specific model used for the request.	`GPT-5.2`
`requester`	STRING	The ID of the user or service principal that made the request.	`user.name@email.com`
`requester_type`	STRING	The type of requester (user, service principal, or user group).	`USER`
`ip_address`	STRING	The IP address of the requester.	`1.2.3.4`
`url`	STRING	The URL of the request.	`https://<workspace-url>/ai-gateway/mlflow/v1/chat/completions`
`user_agent`	STRING	The user agent of the requester.	`OpenAI/Python 2.13.0`
`api_type`	STRING	The type of API call (for example, chat, completions, or embeddings).	`mlflow/v1/chat/completions`
`request_tags`	MAP	User-provided tags sent with individual requests using the `Databricks-Ai-Gateway-Request-Tags` HTTP header. Use request tags to attribute usage to specific projects, teams, environments, or end users. See Tag requests for usage tracking.	`{"project": "chatbot", "team": "ml-platform"}`
`invocation_metadata`	STRUCT	System-generated metadata about the inference call. Contains `source`, the service or path that initiated the call.	`{"source": "EXTERNAL_CLIENT"}`
`input_tokens`	LONG	The number of input tokens.	`100`
`output_tokens`	LONG	The number of output tokens.	`100`
`total_tokens`	LONG	The total number of tokens (input + output).	`200`
`token_details`	STRUCT	Detailed token breakdown including `cache_read_input_tokens`, `cache_creation_input_tokens`, and `output_reasoning_tokens`.	`{"cache_read_input_tokens": 100, ...}`
`response_content_type`	STRING	The content type of the response.	`application/json`
`status_code`	INT	The HTTP status code of the response.	`200`
`routing_information`	STRUCT	Routing details for fallback attempts. Contains an `attempts` array with `priority`, `action`, `destination`, `destination_id`, `status_code`, `error_code`, `latency_ms`, `start_time`, and `end_time` for each model tried during the request.	`{"attempts": [{"priority": "1", ...}]}`
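
The STRUCT and array columns can be queried with dot notation and explode. The sketch below lists each routing attempt for requests that tried more than one destination. It assumes routing_information.attempts is an array of structs with the fields named in the table above. Check the field types in Catalog Explorer before relying on it.

```sql
-- One output row per routing attempt, for requests that used a fallback.
-- Assumption: attempts is an array of structs with the fields listed in the schema.
SELECT
  request_id,
  endpoint_name,
  attempt.priority,
  attempt.destination,
  attempt.status_code,
  attempt.latency_ms
FROM system.ai_gateway.usage
LATERAL VIEW explode(routing_information.attempts) t AS attempt
WHERE size(routing_information.attempts) > 1
ORDER BY request_id, attempt.start_time;
```

The same dot notation applies to the other STRUCT columns, for example token_details.cache_read_input_tokens.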

Tag requests for usage tracking

Request tags are custom key-value pairs that the caller attaches to individual requests. Use request tags to attribute usage by project, team, environment, end user, or any other dimension relevant to your organization.

To tag individual requests, include the Databricks-Ai-Gateway-Request-Tags HTTP header with a JSON object mapping string keys to string values. Request tags are logged to the request_tags column of the system.ai_gateway.usage table and to inference tables, where you can use them to filter, aggregate, and analyze usage data.

For examples showing how to set request tags with REST API, OpenAI SDK, and Anthropic SDK, see Tag requests for usage tracking.

For example, you can aggregate usage by project using request tags:

SQL
SELECT
  request_tags['project'] AS project,
  COUNT(*) AS request_count,
  SUM(total_tokens) AS total_tokens
FROM system.ai_gateway.usage
WHERE request_tags['project'] IS NOT NULL
GROUP BY request_tags['project']
ORDER BY total_tokens DESC;
