Custom judges demo notebook

This notebook illustrates the following techniques for working with custom judges in Mosaic AI Agent Evaluation.

  1. Run a subset of AI judges.
  2. Create AI judges from guidelines.
  3. Create AI judges from custom metrics and callable judges.

Run a subset of AI judges

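A minimal sketch of this pattern, assuming a small evaluation set with request and response columns and the built-in judge names groundedness, relevance_to_query, and safety; only the judges listed in evaluator_config run, and the exact judge names may differ by release:

```python
import mlflow
import pandas as pd

# Small evaluation set with the agent's responses already materialized.
eval_df = pd.DataFrame(
    [
        {
            "request": "What is MLflow?",
            "response": "MLflow is an open source platform for managing the ML lifecycle.",
        }
    ]
)

# Only the judges listed under `metrics` run; all other built-in judges are skipped.
results = mlflow.evaluate(
    data=eval_df,
    model_type="databricks-agent",
    evaluator_config={
        "databricks-agent": {
            "metrics": ["groundedness", "relevance_to_query", "safety"],
        }
    },
)

# Per-row judge outputs.
display(results.tables["eval_results"])
```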

Create AI judges from guidelines

For more information, see the documentation: (AWS | Azure).
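
A minimal sketch, assuming two hypothetical named guidelines (tone and scope) that the guideline adherence judge applies globally to every row of the evaluation set:

```python
import mlflow
import pandas as pd

eval_df = pd.DataFrame(
    [
        {
            "request": "How do I reset my password?",
            "response": "Go to Settings > Security and click Reset password.",
        }
    ]
)

# Named guidelines; each row of the evaluation set is checked against every guideline.
global_guidelines = {
    "tone": ["The response must be polite and professional."],
    "scope": ["The response must only discuss product features, not pricing."],
}

results = mlflow.evaluate(
    data=eval_df,
    model_type="databricks-agent",
    evaluator_config={
        "databricks-agent": {"global_guidelines": global_guidelines}
    },
)
```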


Convert make_genai_metric_from_prompt to a custom metric

For more information, see the documentation: (AWS | Azure).

For more control, you can use the code below to convert the metric created with make_genai_metric_from_prompt into a custom metric in Agent Evaluation. This lets you threshold or post-process the result.

In this example, we'll return both the numeric value and the boolean thresholded value.
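
A sketch of one way to perform this conversion, assuming a hypothetical prompt-based judge named formality and a pass threshold of 4; calling eval_fn directly with the prompt variable as a pandas Series is an assumption about how the wrapped metric is invoked per row:

```python
import pandas as pd
from databricks.agents.evals import metric
from mlflow.metrics.genai import make_genai_metric_from_prompt

# Hypothetical prompt-based judge; {response} must match a column in the eval set.
formality_metric = make_genai_metric_from_prompt(
    name="formality",
    judge_prompt=(
        "Rate the formality of the following response on a scale of 1 to 5, "
        "where 5 is the most formal.\n\nResponse: {response}"
    ),
    metric_metadata={"assessment_type": "ANSWER"},
)


@metric
def formality_score(request, response):
    # Assumption: eval_fn accepts the prompt variable as a pandas Series for one row.
    result = formality_metric.eval_fn(response=pd.Series([str(response)]))
    return result.scores[0]  # numeric 1-5 value


@metric
def formality_pass(request, response):
    # Post-process the same judge output into a thresholded pass/fail boolean.
    result = formality_metric.eval_fn(response=pd.Series([str(response)]))
    return result.scores[0] is not None and result.scores[0] >= 4
```

Both functions can then be passed to mlflow.evaluate through extra_metrics, alongside model_type="databricks-agent".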


Create AI judges from a prompt

For more information, see the documentation: (AWS | Azure).

This method is not recommended unless you need to create per-chunk assessments from a prompt. Instead, use custom metrics, callable judges, or custom Python code, which give you more control.
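
A minimal sketch of the prompt-based approach, assuming an evaluation set (eval_df) with request and retrieved_context columns; the assessment_type metadata value used here is an assumption about how the judge is applied per retrieved chunk rather than per answer:

```python
import mlflow
from mlflow.metrics.genai import make_genai_metric_from_prompt

# Prompt template variables must correspond to columns in the evaluation set.
chunk_usefulness = make_genai_metric_from_prompt(
    name="chunk_usefulness",
    judge_prompt=(
        "Rate from 1 to 5 how useful the retrieved context below is for "
        "answering the question.\n\nQuestion: {request}\n\nContext: {retrieved_context}"
    ),
    # "RETRIEVAL" requests per-chunk assessments; "ANSWER" assesses the response.
    metric_metadata={"assessment_type": "RETRIEVAL"},
)

results = mlflow.evaluate(
    data=eval_df,  # evaluation set defined earlier in the notebook
    model_type="databricks-agent",
    extra_metrics=[chunk_usefulness],
)
```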
