
Guardrails AI scorers

Guardrails AI is a framework for validating LLM outputs using a community-driven hub of validators for safety, PII detection, content quality, and more. MLflow integrates with Guardrails AI so that you can use Guardrails validators as scorers, offering rule-based evaluation without requiring LLM calls.
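For intuition, a rule-based validator is just a deterministic check over text that yields a pass/fail verdict. The following is a toy sketch of the idea (a hypothetical email check, not how Guardrails implements its validators):

```python
import re

def contains_email(text: str) -> str:
    """Toy rule-based check: return "yes" if the text contains an
    email address, "no" otherwise. Illustrative only."""
    return "yes" if re.search(r"[\w.+-]+@[\w-]+\.[\w.]+", text) else "no"

print(contains_email("Reach us at support@example.com"))   # "yes"
print(contains_email("MLflow is an open-source platform."))  # "no"
```

Because checks like this are plain functions rather than LLM calls, they are fast, deterministic, and free to run at scale.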

Requirements

Install the guardrails-ai package:

Python
%pip install guardrails-ai

Individual validators are distributed through the Guardrails Hub and may need to be installed separately (for example, guardrails hub install hub://guardrails/toxic_language).

Quick start

To call a Guardrails AI scorer directly:

Python
from mlflow.genai.scorers.guardrails import ToxicLanguage

scorer = ToxicLanguage(threshold=0.7)
feedback = scorer(
    outputs="This is a professional and helpful response.",
)

print(feedback.value)  # "yes" or "no"

To call Guardrails AI scorers using mlflow.genai.evaluate():

Python
import mlflow
from mlflow.genai.scorers.guardrails import ToxicLanguage, DetectPII

eval_dataset = [
    {
        "inputs": {"query": "What is MLflow?"},
        "outputs": "MLflow is an open-source AI engineering platform for agents and LLMs.",
    },
    {
        "inputs": {"query": "How do I contact support?"},
        "outputs": "You can reach us at support@example.com or call 555-0123.",
    },
]

results = mlflow.genai.evaluate(
    data=eval_dataset,
    scorers=[
        ToxicLanguage(threshold=0.7),
        DetectPII(),
    ],
)

Available Guardrails AI scorers

Safety and content quality

These scorers validate LLM outputs for safety, PII, and content quality concerns. Each wraps a validator documented on the Guardrails Hub.

| Scorer | What does it evaluate? |
| --- | --- |
| ToxicLanguage | Does the output contain toxic or offensive language? |
| NSFWText | Does the output contain NSFW or explicit content? |
| DetectJailbreak | Does the input contain a jailbreak or prompt injection attempt? |
| DetectPII | Does the output contain personally identifiable information? |
| SecretsPresent | Does the output contain API keys, tokens, or other secrets? |
| GibberishText | Does the output contain nonsensical or incoherent text? |
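To make the pass/fail semantics above concrete, here is a library-free sketch of applying two such checks across a batch of outputs. These are toy heuristics invented for illustration; in practice mlflow.genai.evaluate runs the real Guardrails validators and records the results:

```python
import re

# Toy stand-ins for rule-based checks (illustrative heuristics only,
# not the actual Guardrails validators)
def toxic_language(text: str) -> str:
    banned = {"idiot", "moron"}
    return "yes" if any(word in text.lower() for word in banned) else "no"

def secrets_present(text: str) -> str:
    # Hypothetical pattern for API-key-like tokens
    return "yes" if re.search(r"\bsk-[A-Za-z0-9]{16,}\b", text) else "no"

outputs = [
    "MLflow is an open-source AI engineering platform.",
    "Use the key sk-abcdef1234567890abcd to authenticate.",
]

for output in outputs:
    verdicts = {
        "toxic": toxic_language(output),
        "secrets": secrets_present(output),
    }
    print(verdicts)
```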

Create a scorer by name

You can dynamically create a scorer using get_scorer by passing the validator name as a string:

Python
from mlflow.genai.scorers.guardrails import get_scorer

scorer = get_scorer(
    validator_name="ToxicLanguage",
    threshold=0.7,
)
feedback = scorer(
    outputs="This is a professional response.",
)

Configuration

Guardrails AI scorers accept validator-specific parameters as keyword arguments to the constructor.

Python
from mlflow.genai.scorers.guardrails import ToxicLanguage, DetectPII, DetectJailbreak

# Toxicity detection with custom threshold
scorer = ToxicLanguage(
    threshold=0.7,
    validation_method="sentence",
)

# PII detection with custom entity types
pii_scorer = DetectPII(
    pii_entities=["CREDIT_CARD", "SSN", "EMAIL_ADDRESS"],
)

# Jailbreak detection with custom sensitivity
jailbreak_scorer = DetectJailbreak(
    threshold=0.9,
)

For validator-specific parameters and additional validators, see the Guardrails AI documentation and the Guardrails Hub.