# TruLens scorers
TruLens is an evaluation and observability framework for LLM applications that provides feedback functions for RAG systems and agent trace analysis. MLflow integrates with TruLens so that you can use TruLens feedback functions as scorers, including benchmarked goal-plan-action alignment evaluations for agent traces.
## Requirements

Install the `trulens` and `trulens-providers-litellm` packages:

```bash
%pip install trulens trulens-providers-litellm
```
## Quick start

To call a TruLens scorer directly:

```python
from mlflow.genai.scorers.trulens import Groundedness

scorer = Groundedness(model="openai:/gpt-5-mini")
feedback = scorer(
    inputs="What is MLflow?",
    outputs="MLflow is an open-source AI engineering platform for agents and LLMs.",
    expectations={
        "context": "MLflow is an ML platform for experiment tracking and model deployment."
    },
)
print(feedback.value)  # "yes" or "no"
print(feedback.metadata["score"])  # e.g. 0.85
```
To call TruLens scorers using `mlflow.genai.evaluate()`:

```python
import mlflow
from mlflow.genai.scorers.trulens import Groundedness, AnswerRelevance

eval_dataset = [
    {
        "inputs": {"query": "What is MLflow?"},
        "outputs": "MLflow is an open-source AI engineering platform for agents and LLMs.",
        "expectations": {
            "context": "MLflow is an ML platform for experiment tracking and model deployment."
        },
    },
    {
        "inputs": {"query": "How do I track experiments?"},
        "outputs": "You can use mlflow.start_run() to begin tracking experiments.",
        "expectations": {
            "context": "MLflow provides APIs like mlflow.start_run() for experiment tracking."
        },
    },
]

results = mlflow.genai.evaluate(
    data=eval_dataset,
    scorers=[
        Groundedness(model="openai:/gpt-5-mini"),
        AnswerRelevance(model="openai:/gpt-5-mini"),
    ],
)
```
## Available TruLens scorers

### RAG metrics

These scorers evaluate retrieval quality and answer generation in retrieval-augmented generation (RAG) applications.

| Scorer | What does it evaluate? | TruLens Docs |
|---|---|---|
| `Groundedness` | Is the response grounded in the provided context? | |
| `ContextRelevance` | Is the retrieved context relevant to the input query? | |
| `AnswerRelevance` | Is the output relevant to the input query? | |
| | Is the output coherent and logically consistent? | |
### Agent trace metrics

These scorers evaluate AI agent execution traces using goal-plan-action alignment.

| Scorer | What does it evaluate? | TruLens Docs |
|---|---|---|
| `LogicalConsistency` | Is the agent's reasoning logically consistent throughout execution? | |
| | Does the agent take an optimal path without unnecessary steps? | |
| | Does the agent follow its stated plan during execution? | |
| | Is the agent's plan well-structured and appropriate for the goal? | |
| `ToolSelection` | Does the agent choose the appropriate tools for each step? | |
| | Does the agent invoke tools with correct parameters? | |
Agent trace scorers require a `trace` argument and evaluate the full execution trace:

```python
import mlflow
from mlflow.genai.scorers.trulens import LogicalConsistency, ToolSelection

traces = mlflow.search_traces(experiment_ids=["1"])

results = mlflow.genai.evaluate(
    data=traces,
    scorers=[
        LogicalConsistency(model="openai:/gpt-5-mini"),
        ToolSelection(model="openai:/gpt-5-mini"),
    ],
)
```
## Create a scorer by name

You can create a scorer dynamically using `get_scorer` by passing the metric name as a string:

```python
from mlflow.genai.scorers.trulens import get_scorer

scorer = get_scorer(
    metric_name="Groundedness",
    model="openai:/gpt-5-mini",
)

feedback = scorer(
    inputs="What is MLflow?",
    outputs="MLflow is a platform for ML workflows.",
    expectations={"context": "MLflow is an ML platform."},
)
```
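This kind of string-based lookup is commonly backed by a name-to-class registry. The sketch below illustrates the pattern only; `_SCORERS`, `make_scorer`, and the placeholder class are hypothetical and are not MLflow internals:

```python
# Illustrative registry-based lookup; everything here is a hypothetical
# stand-in for how a get_scorer-style API could be built.
class PlaceholderGroundedness:
    def __init__(self, model):
        self.model = model

_SCORERS = {"Groundedness": PlaceholderGroundedness}

def make_scorer(metric_name, **kwargs):
    """Resolve a metric name to a scorer class and instantiate it."""
    try:
        cls = _SCORERS[metric_name]
    except KeyError:
        raise ValueError(f"Unknown metric name: {metric_name!r}")
    return cls(**kwargs)

scorer = make_scorer("Groundedness", model="openai:/gpt-5-mini")
print(scorer.model)  # openai:/gpt-5-mini
```

An unknown name raises a `ValueError` rather than failing later, which keeps the error close to the typo that caused it.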
## Configuration

TruLens scorers accept common parameters that control evaluation behavior. All scorers require a `model` parameter.

```python
from mlflow.genai.scorers.trulens import Groundedness, ContextRelevance

# Set an explicit decision threshold
scorer = Groundedness(
    model="openai:/gpt-5-mini",
    threshold=0.7,
)

# The default threshold is 0.5
scorer = ContextRelevance(model="openai:/gpt-5-mini")
```
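The threshold's role can be sketched in plain Python: the judge model produces a numeric score in [0, 1], and the scorer compares it against the threshold to produce a categorical value. The function below is an illustration of that semantics, not the actual TruLens implementation:

```python
def score_to_value(score: float, threshold: float = 0.5) -> str:
    """Illustrative mapping from a numeric judge score to a "yes"/"no" value.

    Hypothetical helper, not MLflow or TruLens source code.
    """
    return "yes" if score >= threshold else "no"

print(score_to_value(0.85))                # yes (0.85 >= 0.5)
print(score_to_value(0.6, threshold=0.7))  # no  (0.6 < 0.7)
```

Raising the threshold makes the scorer stricter: the same numeric score can flip from "yes" to "no".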
For metric-specific parameters and advanced usage options, see the TruLens documentation.