Monitor models using inference tables

Beta

This feature is in Beta. Account admins can control access to this feature from the account console Previews page. See Manage Databricks previews.

This page describes how to use inference tables to monitor AI Gateway (Beta) endpoints.

What are AI Gateway inference tables?

AI Gateway inference tables log requests and responses from your AI Gateway endpoints to Unity Catalog Delta tables. You can use this data for monitoring, debugging, and optimizing your models.

Common use cases include:

  • Debugging: Analyze request and response payloads to troubleshoot issues.
  • Monitoring: Track model performance and identify anomalies.
  • Optimization: Review interactions to improve model prompts and configurations.
  • Compliance: Maintain audit logs of all model interactions.
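For example, once an inference table is populated, a monitoring query like the following can surface request-volume and error-rate trends. This is a sketch: `main.default.gateway_payload` is a hypothetical table name, and the column names match the schema described later on this page.

```sql
-- Hourly request counts and error rates over the last 7 days
-- (main.default.gateway_payload is a hypothetical inference table name)
SELECT
  date_trunc('HOUR', event_time) AS hour,
  count(*) AS requests,
  avg(CASE WHEN status_code >= 400 THEN 1.0 ELSE 0.0 END) AS error_rate,
  avg(latency_ms) AS avg_latency_ms
FROM main.default.gateway_payload
WHERE event_time >= current_timestamp() - INTERVAL 7 DAYS
GROUP BY 1
ORDER BY 1;
```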

Requirements

  • AI Gateway (Beta) preview enabled for your account. See Manage Databricks previews.

  • A Databricks workspace in an AI Gateway (Beta) supported region.

  • Unity Catalog enabled for your workspace. See Enable a workspace for Unity Catalog.

  • Both the creator of the endpoint and the user who modifies it must have Can Manage permission on the endpoint. In addition, the following Unity Catalog permissions are required:

    • CREATE TABLE permission in the specified Unity Catalog catalog and schema.
    • USE CATALOG permission on the specified catalog.
    • USE SCHEMA permission on the specified schema.
  • The catalog cannot be a catalog shared with the current metastore using Delta Sharing.

  • Databricks recommends enabling predictive optimization for improved performance.

Enable inference tables

Inference tables can only be configured after you create an AI Gateway endpoint.

To enable inference tables:

  1. In the sidebar, click AI Gateway.
  2. Click the endpoint name to open the endpoint page.
  3. Click Set Up next to Inference tables.
  4. Specify the catalog and schema where you want to store the inference table.
  5. Click Save.

The owner of the inference table is the user who created the endpoint. All ACLs follow standard Unity Catalog permissions and can be modified by the table owner.

note

Specifying an existing table is not supported. Databricks automatically creates a new inference table when you enable inference tables.

warning

The inference table could stop logging data or become corrupted if you do any of the following:

  • Change the table schema.
  • Change the table name.
  • Delete the table.

Disable inference tables

To disable inference tables:

  1. In the sidebar, click AI Gateway.
  2. Click the endpoint name to open the endpoint page.
  3. Click the edit icon next to Inference tables.
  4. Click Disable inference tables.

Query the inference table

You can view the table in the UI, or query the table from Databricks SQL or a notebook.

To view the table in the UI, click the inference table link on the endpoint page to open the table in Catalog Explorer.

To query the table from Databricks SQL or a notebook:

SQL
SELECT * FROM <catalog>.<schema>.<payload_table>

Replace <catalog>, <schema>, and <payload_table> with your table location.
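For example, to sample the most recent requests (a sketch; `main.default.gateway_payload` is a hypothetical table name):

```sql
-- Inspect the 100 most recent logged requests
SELECT request_id, event_time, status_code, latency_ms
FROM main.default.gateway_payload
ORDER BY event_time DESC
LIMIT 100;
```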

Inference table schema

AI Gateway inference tables have the following schema:

| Column name | Type | Description | Example |
| --- | --- | --- | --- |
| request_id | STRING | A unique identifier for the request. | 7a99b43cb46c432bb0a7814217701909 |
| request_tags | MAP | Tags associated with the request. | {"team": "engineering"} |
| event_time | TIMESTAMP | The timestamp when the request was received. | 2024-05-17T13:47:13.282-07:00 |
| status_code | INT | The HTTP status code of the response. | 200 |
| sampling_fraction | DOUBLE | The sampling fraction if down-sampling was used. A value of 1 means no down-sampling. | 1 |
| latency_ms | LONG | The total latency in milliseconds. | 300 |
| time_to_first_byte_ms | LONG | The time to first byte in milliseconds. | 200 |
| request | STRING | The raw JSON request payload. | {"messages": [...], ...} |
| response | STRING | The raw JSON response payload. | {"choices": [...], ...} |
| destination_id | STRING | The ID of the destination model or provider. | 7a99b43c-b46c-432b-b0a7-814217701909 |
| logging_error_codes | ARRAY | Error codes if logging failed (for example, MAX_REQUEST_SIZE_EXCEEDED). | ["MAX_RESPONSE_SIZE_EXCEEDED"] |
| requester | STRING | The ID of the user or service principal that made the request. | databricks.engineer@databricks.com |
| schema_version | STRING | The schema version of the inference table record. | 0 |
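Because request and response are stored as raw JSON strings, you can extract individual fields with standard Databricks SQL JSON functions such as get_json_object. A sketch: the table name is hypothetical, and the JSON paths assume chat-style payloads shaped like the examples in the schema above.

```sql
-- Pair each request's first message with the model's reply
-- (main.default.gateway_payload and the JSON paths are assumptions)
SELECT
  request_id,
  get_json_object(request, '$.messages[0].content') AS first_message,
  get_json_object(response, '$.choices[0].message.content') AS reply
FROM main.default.gateway_payload
WHERE status_code = 200
LIMIT 20;
```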

Limitations

  • Best effort delivery: Logs are typically available within minutes of a request, but delivery isn't guaranteed.
  • Maximum payload size: Requests and responses larger than 10 MiB aren't logged. The logging_error_codes column indicates when this occurs with MAX_REQUEST_SIZE_EXCEEDED or MAX_RESPONSE_SIZE_EXCEEDED.
  • Error responses: Logs may not be populated for requests that return 401, 403, 429, or 500 errors.
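To check how often payloads exceed the size limit, you can filter on the logging_error_codes column (a sketch; the table name is hypothetical):

```sql
-- Find requests whose payloads were too large to log
SELECT request_id, event_time, logging_error_codes
FROM main.default.gateway_payload
WHERE size(logging_error_codes) > 0
ORDER BY event_time DESC;
```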

Next steps