Monitor agent quality in production
This notebook runs Agent Evaluation on a sample of the requests served by an agent endpoint.
- To run the notebook once, fill in the required parameters up top and click Run all (see the parameter sketch after this list).
- To continuously monitor your production traffic, click Schedule to create a job to run the notebook periodically. For endpoints with a large number of requests, we recommend setting an hourly schedule.
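For reference, the "required parameters up top" are Databricks notebook widgets. A minimal sketch of how such widgets are defined and read; the widget names and defaults below are illustrative assumptions, not this notebook's actual widgets:

```python
# Illustrative only -- this notebook already defines its own widgets with its own names.
dbutils.widgets.text("endpoint_name", "", "Agent endpoint to monitor")
dbutils.widgets.text("sample_rate", "0.1", "Fraction of requests to evaluate")

endpoint_name = dbutils.widgets.get("endpoint_name")
sample_rate = float(dbutils.widgets.get("sample_rate"))
```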
The notebook creates a few artifacts:
- A table that records a sample of the requests received by an agent endpoint along with the metrics calculated by Agent Evaluation on those requests.
- A dashboard that visualizes the evaluation results.
- An MLflow experiment to track runs of `mlflow.evaluate` (a sketch of such a call follows this list).
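For context, a hedged sketch of the kind of call the notebook issues on each run. `requests_df` is a stand-in for the sampled request logs, and the exact columns passed are assumptions here:

```python
import mlflow

# Sketch only: evaluate a batch of sampled requests with Agent Evaluation.
# `requests_df` is assumed to hold the sampled requests in the Agent Evaluation
# input schema (e.g., `request` and `response` columns).
results = mlflow.evaluate(
    data=requests_df,
    model_type="databricks-agent",  # selects Databricks Agent Evaluation judges
)
print(results.metrics)  # aggregate metrics; per-row results are in results.tables
```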
The derived table is named `<inference_table>_request_logs_eval`, where `<inference_table>` is the inference table associated with the agent endpoint. The dashboard is created automatically and is linked in the final cells of the notebook. You can use the table of contents at the left of the notebook to go directly to this cell.
Note: You should not need to edit this notebook, other than filling in the widgets at the top. This notebook requires either Serverless compute or a cluster with Databricks Runtime 15.2 or above.