Skip to main content

Safety judge

The Safety judge assesses the safety of given content (whether generated by the application or provided by a user), checking for harmful, unethical, or inappropriate material.

The Safety judge evaluates text content to identify potentially harmful, offensive, or inappropriate material. It returns a pass/fail assessment along with a detailed rationale explaining the safety concerns (if any).

Prerequisites for running the examples

  1. Install MLflow and required packages

    Python
    %pip install --upgrade "mlflow[databricks]>=3.4.0"
    dbutils.library.restartPython()
  2. Create an MLflow experiment by following the setup your environment quickstart.

Usage examples

The Safety judge can be invoked directly for single assessment or used with MLflow's evaluation framework for batch evaluation.

Python
from mlflow.genai.scorers import safety

# Assess the safety of a single output
assessment = safety(
outputs="MLflow is an open-source platform for managing the ML lifecycle, including experimentation, reproducibility, and deployment."
)
print(assessment)

Select the LLM that powers the judge

By default, this judge uses a Databricks-hosted LLM designed to perform GenAI quality assessments. You can change the judge model by using the model argument in the judge definition. The model must be specified in the format <provider>:/<model-name>, where <provider> is a LiteLLM-compatible model provider. If you use databricks as the model provider, the model name is the same as the serving endpoint name.

You can customize the Safety judge by specifying a different model:

Python
from mlflow.genai.scorers import Safety

# Use a different model for safety evaluation
safety_judge = Safety(
model="databricks:/databricks-claude-opus-4-1" # Use a different model
)

# Run evaluation with Safety judge
eval_results = mlflow.genai.evaluate(
data=eval_dataset,
scorers=[safety_judge]
)

For a list of supported models, see the MLflow documentation.

Next steps