# Guardrails AI scorers
Guardrails AI is a framework for validating LLM outputs using a community-driven hub of validators for safety, PII detection, content quality, and more. MLflow integrates with Guardrails AI so that you can use Guardrails validators as scorers, offering rule-based evaluation without requiring LLM calls.
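To see why no LLM call is needed, consider what a rule-based check looks like in plain Python. The sketch below is a hypothetical, minimal PII detector built on regular expressions; it is *not* the actual Guardrails implementation (real validators are considerably more sophisticated, e.g. NER-based), only an illustration of the evaluation style:

```python
import re

# Hypothetical rule-based PII check, illustrating validation without LLM calls.
# These patterns are simplified stand-ins, not the Guardrails validator's logic.
PII_PATTERNS = {
    "EMAIL_ADDRESS": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "PHONE_NUMBER": re.compile(r"\b\d{3}-\d{4}\b"),
}


def contains_pii(text: str) -> bool:
    """Return True if any configured PII pattern matches the text."""
    return any(p.search(text) for p in PII_PATTERNS.values())


print(contains_pii("Reach us at support@example.com"))   # True
print(contains_pii("MLflow is an open-source platform."))  # False
```

Because checks like this are deterministic pattern matching, they run fast, cost nothing per call, and produce reproducible results.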
## Requirements
Install the `guardrails-ai` package:

```bash
%pip install guardrails-ai
```
## Quick start
To call a Guardrails AI scorer directly:
```python
from mlflow.genai.scorers.guardrails import ToxicLanguage

scorer = ToxicLanguage(threshold=0.7)

feedback = scorer(
    outputs="This is a professional and helpful response.",
)
print(feedback.value)  # "yes" or "no"
```
To run Guardrails AI scorers with `mlflow.genai.evaluate()`:
```python
import mlflow
from mlflow.genai.scorers.guardrails import ToxicLanguage, DetectPII

eval_dataset = [
    {
        "inputs": {"query": "What is MLflow?"},
        "outputs": "MLflow is an open-source AI engineering platform for agents and LLMs.",
    },
    {
        "inputs": {"query": "How do I contact support?"},
        "outputs": "You can reach us at support@example.com or call 555-0123.",
    },
]

results = mlflow.genai.evaluate(
    data=eval_dataset,
    scorers=[
        ToxicLanguage(threshold=0.7),
        DetectPII(),
    ],
)
```
## Available Guardrails AI scorers

### Safety and content quality
These scorers validate LLM outputs for safety, PII, and content quality concerns.
| Scorer | What does it evaluate? | Guardrails Hub |
|---|---|---|
| `ToxicLanguage` | Does the output contain toxic or offensive language? | `guardrails/toxic_language` |
| `NSFWText` | Does the output contain NSFW or explicit content? | `guardrails/nsfw_text` |
| `DetectJailbreak` | Does the input contain a jailbreak or prompt injection attempt? | `guardrails/detect_jailbreak` |
| `DetectPII` | Does the output contain personally identifiable information? | `guardrails/detect_pii` |
| `SecretsPresent` | Does the output contain API keys, tokens, or other secrets? | `guardrails/secrets_present` |
| `GibberishText` | Does the output contain nonsensical or incoherent text? | `guardrails/gibberish_text` |
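As a toy illustration of the secrets check described above (flagging API keys or tokens in output text), a handful of regexes over common key formats is enough to convey the idea. The patterns below are illustrative assumptions, not the hub validator's actual detection logic:

```python
import re

# Toy secret detector: flags strings that look like common API-key formats.
# Patterns are illustrative only; a real validator covers many more formats.
SECRET_PATTERNS = [
    re.compile(r"\bsk-[A-Za-z0-9]{20,}\b"),   # OpenAI-style secret key
    re.compile(r"\bAKIA[0-9A-Z]{16}\b"),      # AWS access key ID
    re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),   # GitHub personal access token
]


def secrets_present(text: str) -> bool:
    """Return True if the text appears to contain a credential."""
    return any(p.search(text) for p in SECRET_PATTERNS)


print(secrets_present("Use the key sk-" + "a" * 24 + " for access"))  # True
print(secrets_present("No credentials in this sentence."))            # False
```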
## Create a scorer by name
You can dynamically create a scorer with `get_scorer` by passing the validator name as a string:
```python
from mlflow.genai.scorers.guardrails import get_scorer

scorer = get_scorer(
    validator_name="ToxicLanguage",
    threshold=0.7,
)

feedback = scorer(
    outputs="This is a professional response.",
)
```
## Configuration
Guardrails AI scorers accept validator-specific parameters as keyword arguments to the constructor.
```python
from mlflow.genai.scorers.guardrails import DetectJailbreak, DetectPII, ToxicLanguage

# Toxicity detection with a custom threshold, validated sentence by sentence
scorer = ToxicLanguage(
    threshold=0.7,
    validation_method="sentence",
)

# PII detection restricted to specific entity types
pii_scorer = DetectPII(
    pii_entities=["CREDIT_CARD", "SSN", "EMAIL_ADDRESS"],
)

# Jailbreak detection with custom sensitivity
jailbreak_scorer = DetectJailbreak(
    threshold=0.9,
)
```
For validator-specific parameters and additional validators, see the Guardrails AI documentation and the Guardrails Hub.