Scorers

Scorers evaluate GenAI app quality by analyzing outputs and producing structured feedback. The same scorer can be used for evaluation during development and reused for monitoring in production. Scorers include built-in scorers such as Safety and custom scorers you define in code.
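Conceptually, a scorer maps an app's output to a structured verdict. A minimal, framework-free sketch of that idea (the Feedback dataclass and the keyword_safety scorer here are illustrative only, not MLflow's actual API):

```python
from dataclasses import dataclass


@dataclass
class Feedback:
    # Illustrative container for a scorer's structured output
    value: bool      # pass/fail verdict
    rationale: str   # human-readable explanation


def keyword_safety(output: str) -> Feedback:
    """Toy code-based scorer: flag outputs containing blocked terms."""
    blocked = {"password", "ssn"}
    hits = [term for term in blocked if term in output.lower()]
    return Feedback(
        value=not hits,
        rationale="no blocked terms" if not hits else f"found: {hits}",
    )


print(keyword_safety("Here is the weather report.").value)  # True
```

The point is only that a scorer returns more than a bare number: a verdict plus a rationale that the UI can surface alongside each trace.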

The MLflow UI screenshot below illustrates outputs from the built-in Safety scorer and a custom exact_match scorer:

Example metrics from scorers

The code snippet below computes these metrics using mlflow.genai.evaluate() and then registers the same scorers for production monitoring:

Python
import mlflow
from mlflow.genai.scorers import Safety, ScorerSamplingConfig, scorer
from typing import Any

@scorer
def exact_match(outputs: str, expectations: dict[str, Any]) -> bool:
    # Example of a custom code-based scorer
    return outputs == expectations["expected_response"]

# Evaluation during development
eval_results = mlflow.genai.evaluate(
    data=eval_dataset,
    predict_fn=my_app,
    scorers=[Safety(), exact_match],
)

# Production monitoring - same scorers!
registered_scorers = [
    Safety().register(),
    exact_match.register(),
]
registered_scorers = [
    reg_scorer.start(
        sampling_config=ScorerSamplingConfig(sample_rate=0.1)
    )
    for reg_scorer in registered_scorers
]
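The sample_rate=0.1 above means each production trace is scored with roughly 10% probability, keeping monitoring cost bounded. A hedged sketch of that sampling logic, as per-trace Bernoulli sampling (MLflow's actual implementation may differ):

```python
import random


def should_score(sample_rate: float, rng: random.Random) -> bool:
    # Score each trace independently with probability sample_rate
    return rng.random() < sample_rate


rng = random.Random(0)  # fixed seed for a reproducible illustration
sampled = sum(should_score(0.1, rng) for _ in range(10_000))
print(sampled)  # close to 1,000 of 10,000 traces
```

With independent per-trace sampling, no coordination is needed across app replicas, and the observed metric distribution remains an unbiased estimate of the full-traffic distribution.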

Next steps