Mosaic AI Agent Evaluation quickstart notebook
The following notebook demonstrates how to evaluate a gen AI app using Agent Evaluation's proprietary LLM judges, custom metrics, and labels from domain experts. It covers the following (a minimal code sketch follows this list):
- How to load production logs (traces) into an evaluation dataset.
- How to run an evaluation and perform root-cause analysis.
- How to create custom metrics to automatically detect quality issues.
- How to send production logs to subject matter experts (SMEs) for labeling, and how to use those labels to evolve the evaluation dataset.
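For orientation, the sketch below shows roughly how the first three steps fit together using the MLflow and `databricks-agents` APIs. It assumes MLflow tracing is enabled for the production app and the `databricks-agents` package is installed; the experiment ID is a placeholder, and the custom metric `response_is_concise` is a hypothetical example, not one from the notebook.

```python
# A minimal sketch, not the notebook's exact code: load traces, define a custom
# metric, and run Agent Evaluation's judges over the resulting evaluation set.
import mlflow
from databricks.agents.evals import metric

# 1. Load production logs (traces) from an MLflow experiment into a DataFrame.
#    "<production-experiment-id>" is a placeholder for your experiment.
traces_df = mlflow.search_traces(experiment_ids=["<production-experiment-id>"])

# Assumption: the trace DataFrame exposes request and response columns that
# match Agent Evaluation's input schema.
eval_df = traces_df[["request", "response"]]

# 2. A custom metric that flags a quality issue (here, overly long answers).
@metric
def response_is_concise(request, response):
    # Assumption: response is a string or can be rendered as one.
    text = response if isinstance(response, str) else str(response)
    return len(text.split()) < 200

# 3. Run an evaluation with the built-in LLM judges plus the custom metric.
results = mlflow.evaluate(
    data=eval_df,
    model_type="databricks-agent",
    extra_metrics=[response_is_concise],
)
print(results.metrics)
```

The evaluation results appear in the MLflow UI alongside the run, which is where the root-cause analysis and SME labeling steps in the notebook pick up.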
To get your agent ready for pre-production, see the Mosaic AI agent demo notebook. For general information, see What is Mosaic AI Agent Evaluation?.