Monitor Unity AI Gateway cost

Beta

This feature is in Beta.

Observe and analyze cost for all Unity AI Gateway traffic by endpoint, target model, requesting principal, and tags.

note

Cost observability is based on Databricks billing records. For request-level usage analytics such as token counts, latency, requester details, and request tags, see Monitor usage for Unity AI Gateway endpoints.

Requirements

Unity AI Gateway enabled for your account.
A Databricks workspace in a Unity AI Gateway supported region.
The billable usage system table enabled for your account. See Enable system tables.

Attribution

Unity AI Gateway provides cost attribution through the billable usage system table (system.billing.usage).

Unity AI Gateway enriches MODEL_SERVING billing records in system.billing.usage with endpoint-specific metadata so that Databricks cost can be attributed to the associated endpoints, target models, principals, and endpoint tags. For the complete schema and field definitions, see the Billing usage system table reference.

The billable usage system table includes cost attribution for Databricks-hosted models. For external model cost analysis in the dashboard, see External model cost.

For requests served through a Unity AI Gateway endpoint, Databricks populates the following fields on MODEL_SERVING records in system.billing.usage:

Field	Description
`usage_metadata.ai_gateway_endpoint_name`	The name of the Unity AI Gateway endpoint that received the request.
`usage_metadata.ai_gateway_endpoint_id`	The ID of the Unity AI Gateway endpoint.
`usage_metadata.ai_gateway_destination_model`	The destination model that handled the request, for example `GPT-5.2`.
`usage_metadata.ai_gateway_destination_id`	The ID of the target that handled the request.
`identity_metadata.run_by`	The user or Databricks service principal that issued the request.
`custom_tags`	Endpoint tags configured on the Unity AI Gateway endpoint, such as `team` or `cost_center`. See Configure Unity AI Gateway endpoints.

These fields are populated for both real-time and batch inference requests routed through Unity AI Gateway endpoints.
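To check which of these fields are populated in your own account, you can inspect recent records directly. The following query is a minimal sketch that selects only the columns listed above. The 7-day window and the 100-row limit are arbitrary choices for illustration.

```sql
-- Inspect recent MODEL_SERVING billing records enriched by Unity AI Gateway.
SELECT
  usage_date,
  usage_metadata.ai_gateway_endpoint_name AS endpoint_name,
  usage_metadata.ai_gateway_endpoint_id AS endpoint_id,
  usage_metadata.ai_gateway_destination_model AS destination_model,
  usage_metadata.ai_gateway_destination_id AS destination_id,
  identity_metadata.run_by AS run_by,
  custom_tags,
  usage_quantity,
  usage_unit
FROM system.billing.usage
WHERE billing_origin_product = 'MODEL_SERVING'
  -- Only records routed through a Unity AI Gateway endpoint carry the endpoint name.
  AND usage_metadata.ai_gateway_endpoint_name IS NOT NULL
  AND usage_date >= current_date() - INTERVAL 7 DAYS
ORDER BY usage_date DESC
LIMIT 100;
```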

Observability

The built-in usage dashboard includes a Cost Analysis page for monitoring cost and analyzing cost breakdowns over time. You can analyze cost across multiple dimensions, including:

Endpoint
Target model
Requesting user or service principal
Endpoint tags
Request tags

To open the dashboard, click View Dashboard from the AI Gateway page. For details on importing and updating the dashboard, see Built-in usage dashboard.
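If you want a similar over-time breakdown outside the dashboard, for example as input to your own reports or alerts, a daily aggregation over system.billing.usage is a reasonable starting point. This sketch reuses the filters from the queries under Analyzing cost. It is not necessarily the query that the dashboard itself runs.

```sql
-- Daily DBU consumption per Unity AI Gateway endpoint over the last 30 days.
SELECT
  usage_date,
  usage_metadata.ai_gateway_endpoint_name AS endpoint_name,
  SUM(usage_quantity) AS dbus
FROM system.billing.usage
WHERE billing_origin_product = 'MODEL_SERVING'
  AND usage_metadata.ai_gateway_endpoint_name IS NOT NULL
  AND usage_unit = 'DBU'
  AND usage_date >= current_date() - INTERVAL 30 DAYS
GROUP BY usage_date, endpoint_name
ORDER BY usage_date, dbus DESC;
```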

note

Cost observability is available in dashboard version 0.4 and above. Account admins must update the dashboard to receive the latest template changes. See Built-in usage dashboard.

Tag-based analysis

The Cost Analysis page includes tag-based views and filters so you can analyze cost using endpoint tags and request tags.

Endpoint tags are configured on the Unity AI Gateway endpoint and apply to all requests sent to that endpoint. Request tags are attached to individual requests and enable more granular attribution within the same endpoint, such as by project, feature, environment, or end user.

Tag filters accept a semicolon-separated list in the format <entry1>;<entry2>;<entry3>, where each entry is specified as either:

<key> to match all values for a tag key. For example, team matches all requests with the team tag.
<key>=<value> to match a specific tag key-value pair. For example, team=ml-platform;env=prod matches requests tagged with team=ml-platform and env=prod.
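For endpoint tags, which propagate to custom_tags on billing records, the two entry forms can be approximated in SQL. The sketch below mirrors a hypothetical filter team;env=prod and treats multiple entries as all-must-match, as in the example above. It covers only endpoint tags, not request tags, and it illustrates the filter semantics rather than the dashboard's implementation.

```sql
-- Approximates the tag filter "team;env=prod" against endpoint tags in custom_tags.
SELECT
  usage_metadata.ai_gateway_endpoint_name AS endpoint_name,
  SUM(usage_quantity) AS dbus
FROM system.billing.usage
WHERE billing_origin_product = 'MODEL_SERVING'
  AND usage_metadata.ai_gateway_endpoint_name IS NOT NULL
  AND usage_unit = 'DBU'
  AND usage_date >= current_date() - INTERVAL 30 DAYS
  -- "<key>" entry: the tag key is present, with any value.
  AND custom_tags['team'] IS NOT NULL
  -- "<key>=<value>" entry: the tag key has exactly this value.
  AND custom_tags['env'] = 'prod'
GROUP BY endpoint_name
ORDER BY dbus DESC;
```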

For information about configuring and querying request tags, see Tag requests and endpoints for usage tracking.

External model cost

The usage dashboard can be configured to include cost estimates for external models by specifying a model pricing table in the Pricing Table Override setting. The pricing table is user-managed and must be provided as input to the dashboard.

The pricing table must include the following fields:

Field	Type	Description
`model`	STRING	The model name used for cost attribution in the dashboard.
`input_token_price`	DOUBLE	The price for input tokens.
`output_token_price`	DOUBLE	The price for output tokens.
`cache_read_input_token_price`	DOUBLE	The price for cache-read input tokens, when supported.
`cache_write_input_token_price`	DOUBLE	The price for cache-write input tokens, when supported.
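One way to maintain the pricing table is as a Unity Catalog table whose columns match the fields above. The following DDL is a sketch that assumes the Pricing Table Override setting can read from such a table. The table name main.default.ai_gateway_model_pricing is hypothetical. This page does not specify the price unit (for example, per token or per million tokens), so confirm the unit the dashboard expects before you load prices.

```sql
-- Hypothetical location for a user-managed external model pricing table.
-- Replace the catalog, schema, and table name with your own.
CREATE TABLE IF NOT EXISTS main.default.ai_gateway_model_pricing (
  model STRING COMMENT 'Model name used for cost attribution in the dashboard',
  input_token_price DOUBLE COMMENT 'Price for input tokens',
  output_token_price DOUBLE COMMENT 'Price for output tokens',
  cache_read_input_token_price DOUBLE COMMENT 'Price for cache-read input tokens, when supported',
  cache_write_input_token_price DOUBLE COMMENT 'Price for cache-write input tokens, when supported'
);
```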

note

Cost estimates for external models are for informational purposes only. These figures are calculated based on list or override prices and might not reflect your final provider invoice. Databricks is not liable for discrepancies in third-party billing.
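To show how the pricing fields might combine into an estimate, the following sketch multiplies token counts by the matching prices. It assumes a hypothetical table main.default.external_token_usage with per-request token counts, a pricing table named main.default.ai_gateway_model_pricing with the fields listed above, per-token prices, and cached tokens counted separately from regular input tokens. This page does not confirm any of these assumptions, and the dashboard's own calculation might differ.

```sql
-- Sketch: estimate external model cost from token counts and a pricing table.
-- ASSUMPTIONS: external_token_usage and its token columns are hypothetical,
-- prices are per single token, and cached tokens are not included in input_tokens.
SELECT
  u.model,
  SUM(
    COALESCE(u.input_tokens, 0) * p.input_token_price
    + COALESCE(u.output_tokens, 0) * p.output_token_price
    -- Cache prices apply only when supported; treat missing prices as zero.
    + COALESCE(u.cache_read_input_tokens, 0) * COALESCE(p.cache_read_input_token_price, 0)
    + COALESCE(u.cache_write_input_tokens, 0) * COALESCE(p.cache_write_input_token_price, 0)
  ) AS estimated_cost
FROM main.default.external_token_usage AS u
JOIN main.default.ai_gateway_model_pricing AS p
  ON u.model = p.model
GROUP BY u.model
ORDER BY estimated_cost DESC;
```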

Analyzing cost

The following queries analyze cost for Databricks-hosted models in system.billing.usage, measured in DBUs. You can break down cost by endpoint, target model, principal, and endpoint tag.

By endpoint

```sql
SELECT
  usage_metadata.ai_gateway_endpoint_name AS endpoint_name,
  SUM(usage_quantity) AS dbus
FROM system.billing.usage
WHERE billing_origin_product = 'MODEL_SERVING'
  AND usage_metadata.ai_gateway_endpoint_name IS NOT NULL
  AND usage_unit = 'DBU'
  AND usage_date >= current_date() - INTERVAL 30 DAYS
GROUP BY endpoint_name
ORDER BY dbus DESC;
```
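The queries in this section report DBUs. To approximate a currency amount, you can join usage to the system.billing.list_prices table on SKU and on the time window in which each price was in effect. This is a sketch: pricing.default is a list price, so the result may not reflect discounts negotiated for your account.

```sql
-- Approximate list-price cost per endpoint over the last 30 days.
SELECT
  u.usage_metadata.ai_gateway_endpoint_name AS endpoint_name,
  lp.currency_code,
  SUM(u.usage_quantity * lp.pricing.default) AS list_cost
FROM system.billing.usage AS u
JOIN system.billing.list_prices AS lp
  ON u.sku_name = lp.sku_name
  AND u.usage_unit = lp.usage_unit
  -- Match each usage record to the price that was in effect at the time.
  AND u.usage_start_time >= lp.price_start_time
  AND (lp.price_end_time IS NULL OR u.usage_end_time <= lp.price_end_time)
WHERE u.billing_origin_product = 'MODEL_SERVING'
  AND u.usage_metadata.ai_gateway_endpoint_name IS NOT NULL
  AND u.usage_unit = 'DBU'
  AND u.usage_date >= current_date() - INTERVAL 30 DAYS
GROUP BY endpoint_name, lp.currency_code
ORDER BY list_cost DESC;
```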

By destination model

```sql
SELECT
  usage_metadata.ai_gateway_destination_model AS destination_model,
  SUM(usage_quantity) AS dbus
FROM system.billing.usage
WHERE billing_origin_product = 'MODEL_SERVING'
  AND usage_metadata.ai_gateway_endpoint_name IS NOT NULL
  AND usage_unit = 'DBU'
  AND usage_date >= current_date() - INTERVAL 30 DAYS
GROUP BY destination_model
ORDER BY dbus DESC;
```

By user or Databricks service principal

```sql
SELECT
  identity_metadata.run_by AS run_by,
  SUM(usage_quantity) AS dbus
FROM system.billing.usage
WHERE billing_origin_product = 'MODEL_SERVING'
  AND usage_metadata.ai_gateway_endpoint_name IS NOT NULL
  AND identity_metadata.run_by IS NOT NULL
  AND usage_unit = 'DBU'
  AND usage_date >= current_date() - INTERVAL 30 DAYS
GROUP BY run_by
ORDER BY dbus DESC;
```

By endpoint tag

Endpoint tags propagate to the billing records in custom_tags, which makes it possible to allocate cost by dimensions such as team, environment, project, or cost center.

```sql
SELECT
  custom_tags['team'] AS team,
  SUM(usage_quantity) AS dbus
FROM system.billing.usage
WHERE billing_origin_product = 'MODEL_SERVING'
  AND usage_metadata.ai_gateway_endpoint_name IS NOT NULL
  AND custom_tags['team'] IS NOT NULL
  AND usage_unit = 'DBU'
  AND usage_date >= current_date() - INTERVAL 30 DAYS
GROUP BY team
ORDER BY dbus DESC;
```
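To find spend that a tag does not yet cover, you can invert the tag filter. This sketch lists endpoints whose billing records lack a team tag, which can help identify endpoints that still need tagging. The tag key team is only an example; substitute the key your organization uses for allocation.

```sql
-- Endpoints with DBU usage that carries no 'team' endpoint tag.
SELECT
  usage_metadata.ai_gateway_endpoint_name AS endpoint_name,
  SUM(usage_quantity) AS untagged_dbus
FROM system.billing.usage
WHERE billing_origin_product = 'MODEL_SERVING'
  AND usage_metadata.ai_gateway_endpoint_name IS NOT NULL
  -- Also true when custom_tags itself is NULL.
  AND custom_tags['team'] IS NULL
  AND usage_unit = 'DBU'
  AND usage_date >= current_date() - INTERVAL 30 DAYS
GROUP BY endpoint_name
ORDER BY untagged_dbus DESC;
```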

To add tags such as team, project, or cost_center to a Unity AI Gateway endpoint, see Configure Unity AI Gateway endpoints.

Limitations

Cost attribution applies only to MODEL_SERVING records in system.billing.usage. Requests routed to external models that are billed directly by the external provider do not appear in system.billing.usage. To estimate their cost in the dashboard, see External model cost.
For Unity AI Gateway endpoints with multiple destinations, such as traffic splitting or fallbacks, ai_gateway_destination_model and ai_gateway_destination_id identify the destination that ultimately served the request.
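Because the destination fields identify the destination that ultimately served each request, grouping by both endpoint and destination shows how traffic splitting or fallbacks distribute cost within an endpoint. The following is a sketch that uses the same 30-day window as the queries under Analyzing cost.

```sql
-- Per-destination DBUs within each endpoint, plus each destination's share of the endpoint total.
WITH per_destination AS (
  SELECT
    usage_metadata.ai_gateway_endpoint_name AS endpoint_name,
    usage_metadata.ai_gateway_destination_model AS destination_model,
    SUM(usage_quantity) AS dbus
  FROM system.billing.usage
  WHERE billing_origin_product = 'MODEL_SERVING'
    AND usage_metadata.ai_gateway_endpoint_name IS NOT NULL
    AND usage_unit = 'DBU'
    AND usage_date >= current_date() - INTERVAL 30 DAYS
  GROUP BY
    usage_metadata.ai_gateway_endpoint_name,
    usage_metadata.ai_gateway_destination_model
)
SELECT
  endpoint_name,
  destination_model,
  dbus,
  -- NULLIF avoids a divide-by-zero error if an endpoint's total is zero.
  dbus / NULLIF(SUM(dbus) OVER (PARTITION BY endpoint_name), 0) AS share_of_endpoint
FROM per_destination
ORDER BY endpoint_name, dbus DESC;
```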
