Monitor models using inference tables
This feature is in Beta. Account admins can control access to this feature from the account console Previews page. See Manage Databricks previews.
This page describes how to use inference tables to monitor AI Gateway (Beta) endpoints.
What are AI Gateway inference tables?
AI Gateway inference tables log requests and responses from your AI Gateway endpoints to Unity Catalog Delta tables. You can use this data for monitoring, debugging, and optimizing your models.
Common use cases include:
- Debugging: Analyze request and response payloads to troubleshoot issues.
- Monitoring: Track model performance and identify anomalies.
- Optimization: Review interactions to improve model prompts and configurations.
- Compliance: Maintain audit logs of all model interactions.
Requirements
- AI Gateway (Beta) preview enabled for your account. See Manage Databricks previews.
- A Databricks workspace in an AI Gateway (Beta) supported region.
- Unity Catalog enabled for your workspace. See Enable a workspace for Unity Catalog.
- Both the creator of the endpoint and any user who modifies it must have Can Manage permission on the endpoint.
- `CREATE TABLE` permission in the specified Unity Catalog catalog and schema.
- `USE CATALOG` permission on the specified catalog.
- `USE SCHEMA` permission on the specified schema.
- The catalog cannot be a catalog shared with the current metastore through Delta Sharing.
- Databricks recommends enabling predictive optimization for improved performance.
Enable inference tables
Inference tables can only be configured after you create an AI Gateway endpoint.
To enable inference tables:
- In the sidebar, click AI Gateway.
- Click the endpoint name to open the endpoint page.
- Click Set Up next to Inference tables.
- Specify the catalog and schema where you want to store the inference table.
- Click Save.
The owner of the inference table is the user who created the endpoint. All ACLs follow standard Unity Catalog permissions and can be modified by the table owner.
Specifying an existing table is not supported. Databricks automatically creates a new inference table when you enable inference tables.
The inference table could stop logging data or become corrupted if you do any of the following:
- Change the table schema.
- Change the table name.
- Delete the table.
Disable inference tables
To disable inference tables:
- In the sidebar, click AI Gateway.
- Click the endpoint name to open the endpoint page.
- Click the edit icon next to Inference tables.
- Click Disable inference tables.
Query the inference table
You can view the table in the UI, or query the table from Databricks SQL or a notebook.
To view the table in the UI, click the inference table link on the endpoint page to open the table in Catalog Explorer.
To query the table from Databricks SQL or a notebook:
```sql
SELECT * FROM <catalog>.<schema>.<payload_table>
```

Replace `<catalog>`, `<schema>`, and `<payload_table>` with your table location.
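In a notebook, it can be convenient to build the fully qualified table name programmatically before running the query. The following is a minimal sketch; the catalog, schema, and table names are placeholders, not values from your workspace:

```python
def payload_table_query(catalog: str, schema: str, table: str) -> str:
    """Build a SELECT statement for an inference table.

    Backticks guard identifiers that contain special characters.
    """
    return f"SELECT * FROM `{catalog}`.`{schema}`.`{table}`"

# Placeholder names -- substitute your own catalog, schema, and table.
query = payload_table_query("main", "default", "my_endpoint_payload")
print(query)
```

In a Databricks notebook, you would pass `query` to `spark.sql(query)` to get the results back as a DataFrame.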
Inference table schema
AI Gateway inference tables have the following schema:
| Column name | Type | Description |
|---|---|---|
| | STRING | A unique identifier for the request. |
| | MAP | Tags associated with the request. |
| | TIMESTAMP | The timestamp when the request was received. |
| | INT | The HTTP status code of the response. |
| | DOUBLE | The sampling fraction if down-sampling was used. A value of 1 means no down-sampling. |
| | LONG | The total latency in milliseconds. |
| | LONG | The time to first byte in milliseconds. |
| | STRING | The raw JSON request payload. |
| | STRING | The raw JSON response payload. |
| | STRING | The ID of the destination model or provider. |
| `logging_error_codes` | ARRAY | Error codes if logging failed (for example, `MAX_REQUEST_SIZE_EXCEEDED`). |
| | STRING | The ID of the user or service principal that made the request. |
| | STRING | The schema version of the inference table record. |
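Because the request and response columns store raw JSON strings, you typically parse them before analysis. A minimal sketch using Python's standard `json` module (the payload shape shown is a hypothetical chat-style request, not a guaranteed format):

```python
import json

# Hypothetical raw payload as it might appear in the request column.
raw_request = (
    '{"messages": [{"role": "user", "content": "What is MLflow?"}], '
    '"max_tokens": 64}'
)

# Parse the JSON string into a Python dict for analysis.
payload = json.loads(raw_request)
first_message = payload["messages"][0]["content"]
print(first_message)  # -> What is MLflow?
```

The same idea applies in SQL or Spark with their respective JSON-extraction functions; the point is that these columns hold strings, not structured data.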
Limitations
- Best effort delivery: Logs are typically available within minutes of a request, but delivery isn't guaranteed.
- Maximum payload size: Requests and responses larger than 10 MiB aren't logged. The `logging_error_codes` column indicates when this occurs with `MAX_REQUEST_SIZE_EXCEEDED` or `MAX_RESPONSE_SIZE_EXCEEDED`.
- Error responses: Logs may not be populated for requests that return 401, 403, 429, or 500 errors.
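One practical consequence of down-sampling is that raw row counts understate true traffic. Assuming each logged row was sampled independently with the recorded sampling fraction, you can estimate total request volume by weighting each row by the inverse of its fraction. An illustrative sketch with made-up values:

```python
# Hypothetical sampling fractions from logged rows; 1.0 means no down-sampling.
sampling_fractions = [1.0, 1.0, 0.5, 0.5, 0.25]

# Each logged row "represents" 1/fraction original requests,
# so summing those weights estimates the pre-sampling request count.
estimated_total_requests = sum(1.0 / f for f in sampling_fractions)
print(estimated_total_requests)  # -> 10.0
```

The same weighting can be expressed as an aggregate in SQL when querying the table directly.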