Monitor models using inference tables
This feature is in Beta. Account admins can control access to this feature from the account console Previews page. See Manage Databricks previews.
This page describes how to use inference tables to monitor AI Gateway (Beta) endpoints.
What are AI Gateway inference tables?
AI Gateway inference tables log requests and responses from your AI Gateway endpoints to Unity Catalog Delta tables. You can use this data for monitoring, debugging, and optimizing your models.
Common use cases include:
- Debugging: Analyze request and response payloads to troubleshoot issues.
- Monitoring: Track model performance and identify anomalies.
- Optimization: Review interactions to improve model prompts and configurations.
- Compliance: Maintain audit logs of all model interactions.
Requirements
- AI Gateway (Beta) preview enabled for your account. See Manage Databricks previews.
- A Databricks workspace in an AI Gateway (Beta) supported region.
- Unity Catalog enabled for your workspace. See Enable a workspace for Unity Catalog.
- Both the creator of the endpoint and any user who modifies it must have Can Manage permission on the endpoint.
- `CREATE TABLE` permission in the specified Unity Catalog catalog and schema.
- `USE CATALOG` permission on the specified catalog.
- `USE SCHEMA` permission on the specified schema.
- The catalog cannot be a catalog shared with the current metastore through Delta Sharing.
- Databricks recommends enabling predictive optimization for improved performance.
Enable inference tables
Inference tables can only be configured after you create an AI Gateway endpoint.
To enable inference tables:
- In the sidebar, click AI Gateway.
- Click the endpoint name to open the endpoint page.
- Click Set Up next to Inference tables.
- Specify the catalog and schema where you want to store the inference table.
- Click Save.
The owner of the inference table is the user who created the endpoint. All ACLs follow standard Unity Catalog permissions and can be modified by the table owner.
Specifying an existing table is not supported. Databricks automatically creates a new inference table when you enable inference tables.
The inference table could stop logging data or become corrupted if you do any of the following:
- Change the table schema.
- Change the table name.
- Delete the table.
Disable inference tables
To disable inference tables:
- In the sidebar, click AI Gateway.
- Click the endpoint name to open the endpoint page.
- Click the edit icon next to Inference tables.
- Click Disable inference tables.
Query the inference table
You can view the table in the UI, or query the table from Databricks SQL or a notebook.
To view the table in the UI, click the inference table link on the endpoint page to open the table in Catalog Explorer.
To query the table from Databricks SQL or a notebook:
```sql
SELECT * FROM <catalog>.<schema>.<payload_table>
```

Replace `<catalog>`, `<schema>`, and `<payload_table>` with your table location.
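In a notebook, it can be convenient to build the fully qualified table name programmatically before running the query. The following is a minimal sketch; the catalog, schema, and table names are placeholders, not values from your workspace:

```python
def payload_table_query(catalog: str, schema: str, table: str) -> str:
    """Build a SELECT statement for an inference table.

    Backticks guard identifiers that contain special characters.
    """
    return f"SELECT * FROM `{catalog}`.`{schema}`.`{table}`"

# Placeholder names -- substitute your own catalog, schema, and table.
query = payload_table_query("main", "default", "my_endpoint_payload")
print(query)
```

In a Databricks notebook, you would pass `query` to `spark.sql(query)` to get the results back as a DataFrame.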
Inference table schema
AI Gateway inference tables have the following schema:
| Column name | Type | Description |
|---|---|---|
| | STRING | A unique identifier for the request. |
| | MAP | Tags associated with the request. |
| | TIMESTAMP | The timestamp when the request was received. |
| | INT | The HTTP status code of the response. |
| | DOUBLE | The sampling fraction if down-sampling was used. A value of 1 means no down-sampling. |
| | LONG | The total latency in milliseconds. |
| | LONG | The time to first byte in milliseconds. |
| | STRING | The raw JSON request payload. |
| | STRING | The raw JSON response payload. |
| | STRING | The ID of the destination model or provider. |
| `logging_error_codes` | ARRAY | Error codes if logging failed (for example, `MAX_REQUEST_SIZE_EXCEEDED`). |
| | STRING | The ID of the user or service principal that made the request. |
| | STRING | The schema version of the inference table record. |
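Because the request and response columns store raw JSON strings, you typically parse them before analysis. A minimal sketch using Python's standard `json` module (the payload shape shown is a hypothetical chat-style request, not a guaranteed format):

```python
import json

# Hypothetical raw payload as it might appear in the request column.
raw_request = (
    '{"messages": [{"role": "user", "content": "What is MLflow?"}], '
    '"max_tokens": 64}'
)

# Parse the JSON string into a Python dict for analysis.
payload = json.loads(raw_request)
first_message = payload["messages"][0]["content"]
print(first_message)  # -> What is MLflow?
```

The same idea applies in SQL or Spark with their respective JSON-extraction functions; the point is that these columns hold strings, not structured data.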
Limitations
- Best effort delivery: Logs are typically available within minutes of a request, but delivery isn't guaranteed.
- Maximum payload size: Requests and responses larger than 10 MiB aren't logged. The `logging_error_codes` column indicates when this occurs with `MAX_REQUEST_SIZE_EXCEEDED` or `MAX_RESPONSE_SIZE_EXCEEDED`.
- Error responses: Logs may not be populated for requests that return 401, 403, 429, or 500 errors.
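One practical consequence of down-sampling is that raw row counts understate true traffic. Assuming each logged row was sampled independently with the recorded sampling fraction, you can estimate total request volume by weighting each row by the inverse of its fraction. An illustrative sketch with made-up values:

```python
# Hypothetical sampling fractions from logged rows; 1.0 means no down-sampling.
sampling_fractions = [1.0, 1.0, 0.5, 0.5, 0.25]

# Each logged row "represents" 1/fraction original requests,
# so summing those weights estimates the pre-sampling request count.
estimated_total_requests = sum(1.0 / f for f in sampling_fractions)
print(estimated_total_requests)  # -> 10.0
```

The same weighting can be expressed as an aggregate in SQL when querying the table directly.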