Custom judges demo notebook

This notebook illustrates the following techniques for working with custom judges in Mosaic AI Agent Evaluation:

Run a subset of AI judges.
Create AI judges from guidelines.
Create AI judges from custom metrics and callable judges.
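The first two techniques are usually driven by the evaluator configuration passed to mlflow.evaluate. The following is a minimal sketch, not the notebook's own code. It assumes the "metrics" and "global_guidelines" keys under "databricks-agent" and the judge names shown. The helper names build_evaluator_config and run_evaluation are hypothetical.

```python
def build_evaluator_config(judges=None, guidelines=None):
    """Build an evaluator_config dict for model_type="databricks-agent".

    Assumption: "metrics" selects which built-in judges run and
    "global_guidelines" maps a guideline name to a list of plain-language rules.
    """
    agent_config = {}
    if judges is not None:
        agent_config["metrics"] = list(judges)
    if guidelines is not None:
        agent_config["global_guidelines"] = {
            name: list(rules) for name, rules in guidelines.items()
        }
    return {"databricks-agent": agent_config}


# Run only a subset of the built-in judges (judge names are assumptions).
subset_config = build_evaluator_config(judges=["safety", "relevance_to_query"])

# Create a judge from guidelines written in plain language.
guideline_config = build_evaluator_config(
    guidelines={"english": ["The response must be in English."]}
)


def run_evaluation(eval_df, evaluator_config):
    """Hypothetical wrapper; needs mlflow and databricks-agents installed."""
    import mlflow  # imported lazily so the config helpers work without it

    return mlflow.evaluate(
        data=eval_df,
        model_type="databricks-agent",
        evaluator_config=evaluator_config,
    )


print(subset_config)
print(guideline_config)
```

The two configurations are kept separate here because the way a judge subset interacts with guideline judges is not shown in this notebook.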


/local_disk0/.ephemeral_nfs/envs/pythonEnv-e8889c42-d0c0-41a0-a2ca-6e03ddea1f6c/lib/python3.10/site-packages/mlflow/pyfunc/utils/data_validation.py:134: UserWarning: Add type hints to the `predict` method to enable data validation and automatic signature inference during model logging. Check https://mlflow.org/docs/latest/model/python_model.html#type-hint-usage-in-pythonmodel for more details.
  color_warning(


ERROR:databricks.rag_eval.evaluation.custom_metrics:Error when evaluating metric no_pii: name 'Assessment' is not defined.
Traceback (most recent call last):
  File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-e8889c42-d0c0-41a0-a2ca-6e03ddea1f6c/lib/python3.10/site-packages/databricks/rag_eval/evaluation/custom_metrics.py", line 189, in run
    metric_value = self.eval_fn(**kwargs)
  File "/home/spark-e8889c42-d0c0-41a0-a2ca-6e/.ipykernel/3816/command-8932311682584259-3274122340", line 39, in no_pii
    Assessment(
NameError: name 'Assessment' is not defined
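The NameError above indicates that the no_pii metric constructs an Assessment without importing the class first. The following is a minimal sketch of a likely fix, assuming the metric is meant to return MLflow's Assessment from mlflow.evaluation. The email regex is a hypothetical stand-in for whatever PII check the notebook performs, and the fallback dataclass exists only so the sketch runs without MLflow installed.

```python
import re

try:
    # The import the failing cell appears to be missing.
    from mlflow.evaluation import Assessment
except ImportError:
    # Hypothetical stand-in, used only when MLflow is not available.
    from dataclasses import dataclass

    @dataclass
    class Assessment:
        name: str
        value: str
        rationale: str = ""


# Assumption: a simple email pattern stands in for the notebook's PII check.
EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


# In the notebook this function would presumably also be registered as a
# custom metric (for example with the @metric decorator from
# databricks.agents.evals); that step is left out to keep the sketch standalone.
def no_pii(request, response):
    """Return "yes" when no email address is found in the response."""
    found = EMAIL_PATTERN.search(response)
    return Assessment(
        name="no_pii",
        value="no" if found else "yes",
        rationale=(
            "Response contains an email address."
            if found
            else "No email address found."
        ),
    )


clean = no_pii("What is MLflow?", "MLflow is an open source platform.")
leaky = no_pii("Who do I contact?", "Email jane.doe@example.com for help.")
print(clean.value, leaky.value)
```

With the import in place, the call on line 39 of the failing cell resolves the name instead of raising, so the metric can return a value for each evaluated row.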


🏃 View run colorful-colt-321 at: https://db-sme-demo-docs.cloud.databricks.com/ml/experiments/345767295064030/runs/5af4fc0fae7c47f9bf897f2ca1bcf3ef
🧪 View experiment at: https://db-sme-demo-docs.cloud.databricks.com/ml/experiments/345767295064030

custom-judges(Python)

Run a subset of AI judges

Create AI judges from guidelines

Convert `make_genai_metric_from_prompt` to a custom metric

Create AI judges from a prompt
