
Monitor Unity AI Gateway cost


This feature is in Beta.

Observe and analyze cost for all Unity AI Gateway traffic by endpoint, target model, requesting principal, and tags.

note

Cost observability is based on Databricks billing records. For request-level usage analytics such as token counts, latency, requester details, and request tags, see Monitor usage for Unity AI Gateway endpoints.

Requirements

Attribution

Unity AI Gateway provides cost attribution through the billable usage system table (system.billing.usage).

Unity AI Gateway enriches MODEL_SERVING billing records in system.billing.usage with endpoint-specific metadata so that Databricks cost can be attributed to the associated endpoints, target models, principals, and endpoint tags. For the complete schema and field definitions, see the Billing usage system table reference.

The billable usage system table includes cost attribution for Databricks-hosted models. For external model cost analysis in the dashboard, see External model cost.

For requests served through a Unity AI Gateway endpoint, Databricks populates the following fields on MODEL_SERVING records in system.billing.usage:

  • usage_metadata.ai_gateway_endpoint_name: The name of the Unity AI Gateway endpoint that received the request.
  • usage_metadata.ai_gateway_endpoint_id: The ID of the Unity AI Gateway endpoint.
  • usage_metadata.ai_gateway_destination_model: The destination model that handled the request, for example GPT-5.2.
  • usage_metadata.ai_gateway_destination_id: The ID of the target that handled the request.
  • identity_metadata.run_by: The user or Databricks service principal that issued the request.
  • custom_tags: Endpoint tags configured on the Unity AI Gateway endpoint, such as team or cost_center. See Configure Unity AI Gateway endpoints.

These fields are populated for both real-time and batch inference requests routed through Unity AI Gateway endpoints.
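To check that these fields are populated for your endpoints, you can inspect a sample of recent records. The following query is a sketch; the seven-day window and the 20-row limit are arbitrary choices, not requirements.

```sql
-- Sample recent MODEL_SERVING records that carry Unity AI Gateway attribution fields.
SELECT
  usage_date,
  usage_metadata.ai_gateway_endpoint_name,
  usage_metadata.ai_gateway_endpoint_id,
  usage_metadata.ai_gateway_destination_model,
  usage_metadata.ai_gateway_destination_id,
  identity_metadata.run_by,
  custom_tags,
  usage_quantity,
  usage_unit
FROM system.billing.usage
WHERE billing_origin_product = 'MODEL_SERVING'
  -- Only records routed through a Unity AI Gateway endpoint have this field set.
  AND usage_metadata.ai_gateway_endpoint_name IS NOT NULL
  AND usage_date >= current_date() - INTERVAL 7 DAYS
ORDER BY usage_date DESC
LIMIT 20;
```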

Observability

The built-in usage dashboard includes a Cost Analysis page for monitoring cost and analyzing cost breakdowns over time. You can analyze cost across multiple dimensions, including:

  • Endpoint
  • Target model
  • Requesting user or service principal
  • Endpoint tags
  • Request tags

To open the dashboard, click View Dashboard from the AI Gateway page. For details on importing and updating the dashboard, see Built-in usage dashboard.

Figure: AI Gateway Cost Analysis dashboard page and cost drilldown view.

note

Cost observability is available in dashboard version 0.4 and above. Account admins must update the dashboard to receive the latest template changes. See Built-in usage dashboard.

Tag-based analysis

The Cost Analysis page includes tag-based views and filters so you can analyze cost using endpoint tags and request tags.

Endpoint tags are configured on the Unity AI Gateway endpoint and apply to all requests sent to that endpoint. Request tags are attached to individual requests and enable more granular attribution within the same endpoint, such as by project, feature, environment, or end user.

Tag filters accept a semicolon-separated list in the format <entry1>;<entry2>;<entry3>, where each entry is specified as either:

  • <key> to match all values for a tag key. For example, team matches all requests with the team tag.
  • <key>=<value> to match a specific tag key-value pair. For example, team=ml-platform;env=prod matches requests tagged with team=ml-platform and env=prod.

For information about configuring and querying request tags, see Tag requests and endpoints for usage tracking.
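The tag filter syntax applies to the dashboard. If you query system.billing.usage directly, a roughly equivalent filter for endpoint tags can be written against custom_tags. The following sketch assumes that filter entries combine with AND, as the team=ml-platform;env=prod example suggests, and it covers endpoint tags only. The team and env keys are illustrative.

```sql
-- Rough equivalent of the dashboard filter `team;env=prod` on endpoint tags:
--   entry `team`     -> the team key is present, with any value
--   entry `env=prod` -> the env key has the value 'prod'
-- Assumption: entries are combined with AND.
SELECT
  usage_metadata.ai_gateway_endpoint_name AS endpoint_name,
  SUM(usage_quantity) AS dbus
FROM system.billing.usage
WHERE billing_origin_product = 'MODEL_SERVING'
  AND usage_metadata.ai_gateway_endpoint_name IS NOT NULL
  AND usage_unit = 'DBU'
  AND usage_date >= current_date() - INTERVAL 30 DAYS
  AND custom_tags['team'] IS NOT NULL  -- entry: team
  AND custom_tags['env'] = 'prod'      -- entry: env=prod
GROUP BY endpoint_name
ORDER BY dbus DESC;
```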

External model cost

To include cost estimates for external models in the usage dashboard, specify a model pricing table in the Pricing Table Override setting. The pricing table is user-managed: you create and maintain it, and provide it as input to the dashboard.

Figure: Pricing Table Override setting for external model cost.

The pricing table must include the following fields:

  • model (STRING): The model name used for cost attribution in the dashboard.
  • input_token_price (DOUBLE): The price for input tokens.
  • output_token_price (DOUBLE): The price for output tokens.
  • cache_read_input_token_price (DOUBLE): The price for cache-read input tokens, when supported.
  • cache_write_input_token_price (DOUBLE): The price for cache-write input tokens, when supported.
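As a starting point, the following sketch creates a table with the required columns. The name main.ai_gateway.model_pricing is hypothetical; use a catalog and schema that the dashboard can read. The price unit (for example, per token or per million tokens) is not stated here, so confirm what the dashboard expects before you populate the table.

```sql
-- Hypothetical table name: replace main.ai_gateway.model_pricing with your own.
-- Price unit is an assumption to verify against the dashboard before loading data.
CREATE TABLE IF NOT EXISTS main.ai_gateway.model_pricing (
  model STRING COMMENT 'Model name used for cost attribution in the dashboard',
  input_token_price DOUBLE COMMENT 'Price for input tokens',
  output_token_price DOUBLE COMMENT 'Price for output tokens',
  -- Cache columns are left nullable for models without prompt caching.
  cache_read_input_token_price DOUBLE COMMENT 'Price for cache-read input tokens, when supported',
  cache_write_input_token_price DOUBLE COMMENT 'Price for cache-write input tokens, when supported'
);
```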

note

Cost estimates for external models are for informational purposes only. These figures are calculated based on list or override prices and might not reflect your final provider invoice. Databricks is not liable for discrepancies in third-party billing.

Analyzing cost

The following queries measure usage for Databricks-hosted models in system.billing.usage, reported in DBUs. You can break down DBU usage by endpoint, destination model, principal, and endpoint tag.

By endpoint

SQL
SELECT
  usage_metadata.ai_gateway_endpoint_name AS endpoint_name,
  SUM(usage_quantity) AS dbus
FROM system.billing.usage
WHERE billing_origin_product = 'MODEL_SERVING'
  AND usage_metadata.ai_gateway_endpoint_name IS NOT NULL
  AND usage_unit = 'DBU'
  AND usage_date >= current_date() - INTERVAL 30 DAYS
GROUP BY endpoint_name
ORDER BY dbus DESC;
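The queries in this section report DBUs rather than currency. To approximate cost at list price, you can join usage records with the list prices system table, system.billing.list_prices. This sketch assumes that table's documented columns (sku_name, cloud, usage_unit, currency_code, pricing.default, price_start_time, price_end_time). List prices do not reflect negotiated discounts, so the result is an estimate, not your invoiced amount.

```sql
-- Estimated list-price cost per endpoint over the last 30 days.
SELECT
  u.usage_metadata.ai_gateway_endpoint_name AS endpoint_name,
  lp.currency_code,
  SUM(u.usage_quantity * lp.pricing.default) AS list_cost
FROM system.billing.usage AS u
JOIN system.billing.list_prices AS lp
  ON u.sku_name = lp.sku_name
  AND u.cloud = lp.cloud
  AND u.usage_unit = lp.usage_unit
  -- Use the price that was in effect when the usage occurred.
  AND u.usage_end_time >= lp.price_start_time
  AND (lp.price_end_time IS NULL OR u.usage_end_time < lp.price_end_time)
WHERE u.billing_origin_product = 'MODEL_SERVING'
  AND u.usage_metadata.ai_gateway_endpoint_name IS NOT NULL
  AND u.usage_unit = 'DBU'
  AND u.usage_date >= current_date() - INTERVAL 30 DAYS
GROUP BY endpoint_name, lp.currency_code
ORDER BY list_cost DESC;
```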

By destination model

SQL
SELECT
  usage_metadata.ai_gateway_destination_model AS destination_model,
  SUM(usage_quantity) AS dbus
FROM system.billing.usage
WHERE billing_origin_product = 'MODEL_SERVING'
  AND usage_metadata.ai_gateway_endpoint_name IS NOT NULL
  AND usage_unit = 'DBU'
  AND usage_date >= current_date() - INTERVAL 30 DAYS
GROUP BY destination_model
ORDER BY dbus DESC;
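To see how usage changes over time rather than in aggregate, you can add usage_date to the grouping. This sketch reports daily DBUs per destination model for the same 30-day window.

```sql
-- Daily DBU usage per destination model over the last 30 days.
SELECT
  usage_date,
  usage_metadata.ai_gateway_destination_model AS destination_model,
  SUM(usage_quantity) AS dbus
FROM system.billing.usage
WHERE billing_origin_product = 'MODEL_SERVING'
  AND usage_metadata.ai_gateway_endpoint_name IS NOT NULL
  AND usage_unit = 'DBU'
  AND usage_date >= current_date() - INTERVAL 30 DAYS
GROUP BY usage_date, destination_model
ORDER BY usage_date, dbus DESC;
```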

By user or Databricks service principal

SQL
SELECT
  identity_metadata.run_by AS run_by,
  SUM(usage_quantity) AS dbus
FROM system.billing.usage
WHERE billing_origin_product = 'MODEL_SERVING'
  AND usage_metadata.ai_gateway_endpoint_name IS NOT NULL
  AND identity_metadata.run_by IS NOT NULL
  AND usage_unit = 'DBU'
  AND usage_date >= current_date() - INTERVAL 30 DAYS
GROUP BY run_by
ORDER BY dbus DESC;
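To find the heaviest users of each endpoint, you can rank principals within each endpoint. This sketch aggregates per endpoint and principal in a common table expression, then uses a window function with QUALIFY to keep the top three principals per endpoint. The cutoff of three is arbitrary.

```sql
-- Top three principals by DBU usage on each endpoint over the last 30 days.
WITH per_principal AS (
  SELECT
    usage_metadata.ai_gateway_endpoint_name AS endpoint_name,
    identity_metadata.run_by AS run_by,
    SUM(usage_quantity) AS dbus
  FROM system.billing.usage
  WHERE billing_origin_product = 'MODEL_SERVING'
    AND usage_metadata.ai_gateway_endpoint_name IS NOT NULL
    AND identity_metadata.run_by IS NOT NULL
    AND usage_unit = 'DBU'
    AND usage_date >= current_date() - INTERVAL 30 DAYS
  GROUP BY endpoint_name, run_by
)
SELECT endpoint_name, run_by, dbus
FROM per_principal
-- Rank principals within each endpoint and keep the top three.
QUALIFY ROW_NUMBER() OVER (PARTITION BY endpoint_name ORDER BY dbus DESC) <= 3
ORDER BY endpoint_name, dbus DESC;
```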

By endpoint tag

Endpoint tags propagate to the billing records in custom_tags, which makes it possible to allocate cost by dimensions such as team, environment, project, or cost center.

SQL
SELECT
  custom_tags['team'] AS team,
  SUM(usage_quantity) AS dbus
FROM system.billing.usage
WHERE billing_origin_product = 'MODEL_SERVING'
  AND usage_metadata.ai_gateway_endpoint_name IS NOT NULL
  AND custom_tags['team'] IS NOT NULL
  AND usage_unit = 'DBU'
  AND usage_date >= current_date() - INTERVAL 30 DAYS
GROUP BY team
ORDER BY dbus DESC;

To add tags such as team, project, or cost_center to a Unity AI Gateway endpoint, see Configure Unity AI Gateway endpoints.
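Filtering on custom_tags['team'] IS NOT NULL excludes usage from endpoints that have no team tag, which can hide unallocated spend. The following variant groups untagged usage into its own bucket and adds each team's percentage of the total. The '(untagged)' label is an arbitrary placeholder; choose one that cannot collide with a real tag value.

```sql
-- DBU usage per team, including endpoints without a team tag.
SELECT
  COALESCE(custom_tags['team'], '(untagged)') AS team,
  SUM(usage_quantity) AS dbus,
  -- Share of all Unity AI Gateway DBUs in the window, as a percentage.
  ROUND(100 * SUM(usage_quantity) / SUM(SUM(usage_quantity)) OVER (), 1) AS pct_of_total
FROM system.billing.usage
WHERE billing_origin_product = 'MODEL_SERVING'
  AND usage_metadata.ai_gateway_endpoint_name IS NOT NULL
  AND usage_unit = 'DBU'
  AND usage_date >= current_date() - INTERVAL 30 DAYS
GROUP BY team
ORDER BY dbus DESC;
```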

Limitations

  • Spend attribution applies to MODEL_SERVING records in system.billing.usage. Requests routed to external models that are billed directly by the external provider do not appear in system.billing.usage.
  • For Unity AI Gateway endpoints with multiple destinations, such as traffic splitting or fallbacks, ai_gateway_destination_model and ai_gateway_destination_id identify the destination that ultimately served the request.
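For endpoints with multiple destinations, you can check how usage is split across the destinations that ultimately served requests. This sketch aggregates per endpoint and destination, computes each destination's fraction of its endpoint's DBUs, and uses QUALIFY to keep only endpoints where more than one destination served traffic in the window.

```sql
-- DBU split across destinations for endpoints with more than one serving destination.
WITH per_destination AS (
  SELECT
    usage_metadata.ai_gateway_endpoint_name AS endpoint_name,
    usage_metadata.ai_gateway_destination_model AS destination_model,
    usage_metadata.ai_gateway_destination_id AS destination_id,
    SUM(usage_quantity) AS dbus
  FROM system.billing.usage
  WHERE billing_origin_product = 'MODEL_SERVING'
    AND usage_metadata.ai_gateway_endpoint_name IS NOT NULL
    AND usage_unit = 'DBU'
    AND usage_date >= current_date() - INTERVAL 30 DAYS
  GROUP BY endpoint_name, destination_model, destination_id
)
SELECT
  endpoint_name,
  destination_model,
  destination_id,
  dbus,
  -- Fraction of the endpoint's DBUs served by this destination.
  ROUND(dbus / SUM(dbus) OVER (PARTITION BY endpoint_name), 3) AS share_of_endpoint
FROM per_destination
-- Keep only endpoints with more than one destination in the window.
QUALIFY COUNT(*) OVER (PARTITION BY endpoint_name) > 1
ORDER BY endpoint_name, dbus DESC;
```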