
Guardrails AI scorers

Guardrails AI is a framework for validating LLM outputs using a community-driven hub of validators for safety, PII detection, content quality, and more. MLflow integrates with Guardrails AI so that you can use Guardrails validators as scorers, offering rule-based evaluation without requiring LLM calls.
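For intuition, a rule-based validator is just a deterministic check over text that yields a pass/fail verdict. The following is a toy sketch of the idea (a hypothetical email check, not how Guardrails implements its validators):

```python
import re

def contains_email(text: str) -> str:
    """Toy rule-based check: return "yes" if the text contains an
    email address, "no" otherwise. Illustrative only."""
    return "yes" if re.search(r"[\w.+-]+@[\w-]+\.[\w.]+", text) else "no"

print(contains_email("Reach us at support@example.com"))   # "yes"
print(contains_email("MLflow is an open-source platform."))  # "no"
```

Because checks like this are plain functions rather than LLM calls, they are fast, deterministic, and free to run at scale.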

Requirements

Install the guardrails-ai package:

Python
%pip install guardrails-ai

Individual validators are distributed through the Guardrails Hub and may need to be installed separately (for example, guardrails hub install hub://guardrails/toxic_language).

Quick start

To call a Guardrails AI scorer directly:

Python
from mlflow.genai.scorers.guardrails import ToxicLanguage

scorer = ToxicLanguage(threshold=0.7)
feedback = scorer(
    outputs="This is a professional and helpful response.",
)

print(feedback.value)  # "yes" or "no"

To call Guardrails AI scorers using mlflow.genai.evaluate():

Python
import mlflow
from mlflow.genai.scorers.guardrails import ToxicLanguage, DetectPII

eval_dataset = [
    {
        "inputs": {"query": "What is MLflow?"},
        "outputs": "MLflow is an open-source AI engineering platform for agents and LLMs.",
    },
    {
        "inputs": {"query": "How do I contact support?"},
        "outputs": "You can reach us at support@example.com or call 555-0123.",
    },
]

results = mlflow.genai.evaluate(
    data=eval_dataset,
    scorers=[
        ToxicLanguage(threshold=0.7),
        DetectPII(),
    ],
)

Available Guardrails AI scorers

Safety and content quality

These scorers validate LLM outputs for safety, PII, and content quality concerns. Each wraps a validator documented on the Guardrails Hub.

| Scorer | What does it evaluate? |
| --- | --- |
| ToxicLanguage | Does the output contain toxic or offensive language? |
| NSFWText | Does the output contain NSFW or explicit content? |
| DetectJailbreak | Does the input contain a jailbreak or prompt injection attempt? |
| DetectPII | Does the output contain personally identifiable information? |
| SecretsPresent | Does the output contain API keys, tokens, or other secrets? |
| GibberishText | Does the output contain nonsensical or incoherent text? |
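To make the pass/fail semantics above concrete, here is a library-free sketch of applying two such checks across a batch of outputs. These are toy heuristics invented for illustration; in practice mlflow.genai.evaluate runs the real Guardrails validators and records the results:

```python
import re

# Toy stand-ins for rule-based checks (illustrative heuristics only,
# not the actual Guardrails validators)
def toxic_language(text: str) -> str:
    banned = {"idiot", "moron"}
    return "yes" if any(word in text.lower() for word in banned) else "no"

def secrets_present(text: str) -> str:
    # Hypothetical pattern for API-key-like tokens
    return "yes" if re.search(r"\bsk-[A-Za-z0-9]{16,}\b", text) else "no"

outputs = [
    "MLflow is an open-source AI engineering platform.",
    "Use the key sk-abcdef1234567890abcd to authenticate.",
]

for output in outputs:
    verdicts = {
        "toxic": toxic_language(output),
        "secrets": secrets_present(output),
    }
    print(verdicts)
```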

Create a scorer by name

You can dynamically create a scorer using get_scorer by passing the validator name as a string:

Python
from mlflow.genai.scorers.guardrails import get_scorer

scorer = get_scorer(
    validator_name="ToxicLanguage",
    threshold=0.7,
)
feedback = scorer(
    outputs="This is a professional response.",
)

Configuration

Guardrails AI scorers accept validator-specific parameters as keyword arguments to the constructor.

Python
from mlflow.genai.scorers.guardrails import ToxicLanguage, DetectPII, DetectJailbreak

# Toxicity detection with custom threshold
scorer = ToxicLanguage(
    threshold=0.7,
    validation_method="sentence",
)

# PII detection with custom entity types
pii_scorer = DetectPII(
    pii_entities=["CREDIT_CARD", "SSN", "EMAIL_ADDRESS"],
)

# Jailbreak detection with custom sensitivity
jailbreak_scorer = DetectJailbreak(
    threshold=0.9,
)

For validator-specific parameters and additional validators, see the Guardrails AI documentation and the Guardrails Hub.