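TruLens scorers

TruLens is an evaluation and observability framework for LLM applications. It provides feedback functions for analyzing RAG systems and agent traces. MLflow integrates with TruLens, so you can use TruLens feedback functions as scorers. These include benchmarked goal-plan-action alignment evaluations for agent traces.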

Requirements

Install the trulens and trulens-providers-litellm packages.

Python
%pip install trulens trulens-providers-litellm
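
Before running an evaluation, it can be useful to confirm that both packages are present in the active environment. The following is a minimal sketch using only the Python standard library. The helper name `missing_packages` is hypothetical and is not part of MLflow or TruLens.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names taken from the install command above.
REQUIRED = ("trulens", "trulens-providers-litellm")


def missing_packages(names=REQUIRED):
    """Return the distribution names that are not installed (hypothetical helper)."""
    missing = []
    for name in names:
        try:
            version(name)  # raises PackageNotFoundError when the distribution is absent
        except PackageNotFoundError:
            missing.append(name)
    return missing


# An empty list means both packages were found.
print(missing_packages())
```

The check looks up installed distributions by their pip names, so it does not import TruLens itself.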

Quick start

To call a TruLens scorer directly:

Python
from mlflow.genai.scorers.trulens import Groundedness

scorer = Groundedness(model="openai:/gpt-5-mini")
feedback = scorer(
    inputs="What is MLflow?",
    outputs="MLflow is an open-source AI engineering platform for agents and LLMs.",
    expectations={
        "context": "MLflow is an ML platform for experiment tracking and model deployment."
    },
)

print(feedback.value)  # "yes" or "no"
print(feedback.metadata["score"])  # numeric score, e.g. 0.85

To invoke TruLens scorers with mlflow.genai.evaluate():

Python
import mlflow
from mlflow.genai.scorers.trulens import Groundedness, AnswerRelevance

eval_dataset = [
    {
        "inputs": {"query": "What is MLflow?"},
        "outputs": "MLflow is an open-source AI engineering platform for agents and LLMs.",
        "expectations": {
            "context": "MLflow is an ML platform for experiment tracking and model deployment."
        },
    },
    {
        "inputs": {"query": "How do I track experiments?"},
        "outputs": "You can use mlflow.start_run() to begin tracking experiments.",
        "expectations": {
            "context": "MLflow provides APIs like mlflow.start_run() for experiment tracking."
        },
    },
]

results = mlflow.genai.evaluate(
    data=eval_dataset,
    scorers=[
        Groundedness(model="openai:/gpt-5-mini"),
        AnswerRelevance(model="openai:/gpt-5-mini"),
    ],
)
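
When the evaluation data starts out as plain rows, a small helper can produce records in the layout used above. This is a sketch only. The function `build_eval_dataset` is a hypothetical convenience, not an MLflow API, and it assumes each row is a (query, answer, context) tuple.

```python
def build_eval_dataset(rows):
    """Convert (query, answer, context) tuples into evaluation records.

    The record layout mirrors the example above: "inputs", "outputs",
    and "expectations" with a "context" entry.
    """
    dataset = []
    for query, answer, context in rows:
        dataset.append(
            {
                "inputs": {"query": query},
                "outputs": answer,
                "expectations": {"context": context},
            }
        )
    return dataset


rows = [
    (
        "What is MLflow?",
        "MLflow is an open-source AI engineering platform for agents and LLMs.",
        "MLflow is an ML platform for experiment tracking and model deployment.",
    ),
]
eval_dataset = build_eval_dataset(rows)
print(eval_dataset[0]["inputs"])
```

The resulting list has the same shape as the hand-written `eval_dataset` passed as `data` in the example above.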

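Available TruLens scorers

RAG metrics

These scorers evaluate retrieval quality and answer generation in retrieval-augmented generation (RAG) applications.

Scorer	What does it evaluate?	TruLens docs
`Groundedness`	Is the response grounded in the provided context?	Link
`ContextRelevance`	Is the retrieved context relevant to the input query?	Link
`AnswerRelevance`	Is the output relevant to the input query?	Link
`Coherence`	Is the output coherent and free of logical contradictions?	Link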

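Agent trace metrics

These scorers evaluate AI agent execution traces using goal-plan-action alignment.

Scorer	What does it evaluate?	TruLens docs
`LogicalConsistency`	Is the agent's reasoning logically consistent throughout execution?	Link
`ExecutionEfficiency`	Does the agent take an optimal path without unnecessary steps?	Link
`PlanAdherence`	Does the agent follow its stated plan during execution?	Link
`PlanQuality`	Is the agent's plan well structured and appropriate for the goal?	Link
`ToolSelection`	Does the agent choose the appropriate tool for each step?	Link
`ToolCalling`	Does the agent call tools with the correct parameters?	Link

Agent trace scorers require the trace argument and evaluate the complete execution trace.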

Python
import mlflow
from mlflow.genai.scorers.trulens import LogicalConsistency, ToolSelection

traces = mlflow.search_traces(experiment_ids=["1"])
results = mlflow.genai.evaluate(
    data=traces,
    scorers=[
        LogicalConsistency(model="openai:/gpt-5-mini"),
        ToolSelection(model="openai:/gpt-5-mini"),
    ],
)
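
Because the agent trace scorers need a trace, it can help to check which group a scorer name belongs to before assembling the evaluation data. The lookup below is an illustrative sketch. The names are copied from the two tables above, while `requires_trace` is a hypothetical helper and not part of MLflow.

```python
# Scorer names copied from the RAG and agent trace tables above.
RAG_SCORERS = {"Groundedness", "ContextRelevance", "AnswerRelevance", "Coherence"}
AGENT_TRACE_SCORERS = {
    "LogicalConsistency",
    "ExecutionEfficiency",
    "PlanAdherence",
    "PlanQuality",
    "ToolSelection",
    "ToolCalling",
}


def requires_trace(scorer_name):
    """Return True for agent trace scorers, False for RAG scorers (hypothetical helper)."""
    if scorer_name in AGENT_TRACE_SCORERS:
        return True
    if scorer_name in RAG_SCORERS:
        return False
    raise ValueError(f"Unknown TruLens scorer name: {scorer_name!r}")


print(requires_trace("ToolSelection"))  # True
print(requires_trace("Groundedness"))   # False
```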

Create a scorer by name

You can create scorers dynamically with get_scorer by passing the metric name as a string.

Python
from mlflow.genai.scorers.trulens import get_scorer

scorer = get_scorer(
    metric_name="Groundedness",
    model="openai:/gpt-5-mini",
)
feedback = scorer(
    inputs="What is MLflow?",
    outputs="MLflow is a platform for ML workflows.",
    expectations={"context": "MLflow is an ML platform."},
)
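
Creating scorers by name is convenient when the list of metrics comes from configuration. The sketch below builds one scorer per name. `build_scorers` and `fake_factory` are hypothetical names introduced for illustration. The real `get_scorer` is imported lazily and called with the `metric_name` and `model` keywords shown above, and the stand-in factory lets the helper be tried without MLflow or an LLM provider.

```python
def build_scorers(metric_names, model, factory=None):
    """Create one scorer per metric name (hypothetical helper).

    `factory` defaults to MLflow's get_scorer; it is injectable so the
    helper can be exercised without MLflow installed.
    """
    if factory is None:
        # Imported lazily so that defining this helper does not require MLflow.
        from mlflow.genai.scorers.trulens import get_scorer as factory
    return [factory(metric_name=name, model=model) for name in metric_names]


# Dry run with a stand-in factory that only records its arguments.
def fake_factory(metric_name, model):
    return (metric_name, model)


scorers = build_scorers(
    ["Groundedness", "AnswerRelevance"],
    "openai:/gpt-5-mini",
    factory=fake_factory,
)
print(scorers)
# [('Groundedness', 'openai:/gpt-5-mini'), ('AnswerRelevance', 'openai:/gpt-5-mini')]
```

Omitting the `factory` argument makes the helper call the real `get_scorer`, and the returned list can then be passed as `scorers` to mlflow.genai.evaluate().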

Configuration

TruLens scorers accept common parameters that control evaluation behavior. All scorers require the model parameter.

Python
from mlflow.genai.scorers.trulens import Groundedness, ContextRelevance

# Common parameters
scorer = Groundedness(
    model="openai:/gpt-5-mini",
    threshold=0.7,
)

# Default threshold is 0.5
scorer = ContextRelevance(model="openai:/gpt-5-mini")
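
The threshold presumably governs how a numeric score becomes the "yes" or "no" value seen in the quick start. The sketch below illustrates that idea under two assumptions: scores lie between 0 and 1, and a score at or above the threshold passes. The function `to_verdict` is hypothetical, and the exact comparison MLflow applies is not stated here.

```python
def to_verdict(score, threshold=0.5):
    """Map a numeric score to "yes" or "no" (illustrative only).

    Assumption: scores are in [0, 1] and a score at or above the
    threshold counts as passing.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must be between 0 and 1, got {score}")
    return "yes" if score >= threshold else "no"


print(to_verdict(0.85))                # yes (default threshold of 0.5)
print(to_verdict(0.6, threshold=0.7))  # no
```

Under these assumptions, raising the threshold makes a scorer stricter without changing the underlying score.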

For metric-specific and advanced usage options, see the TruLens documentation.
