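Labeling schemas

Labeling schemas define the specific questions that domain experts answer when they label existing traces in the Review App. They structure the feedback collection process and ensure that you gather consistent, relevant information for evaluating your GenAI app.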

Note

Labeling schemas apply only when you use the Review App to label existing traces. They do not apply when you use the Review App to test new app versions in the chat UI.

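How labeling schemas work

When you create a labeling session, you associate it with one or more labeling schemas. Each schema represents either a Feedback or an Expectation assessment that is attached to an MLflow trace.

A schema controls:

- The question shown to reviewers
- The input method (dropdown, text box, and so on)
- Validation rules and constraints
- Optional instructions and comments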

Important

Labeling schema names must be unique within each MLflow experiment. You cannot have two schemas with the same name in the same experiment, but you can reuse a schema name across different experiments.
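
The uniqueness rule can be pictured as a lookup keyed by experiment and schema name. The following is a minimal sketch under that assumption; `SchemaRegistry` and its `create` method are hypothetical names used only for illustration, not part of MLflow, and the error MLflow raises on a duplicate name may differ.

```python
class SchemaRegistry:
    """Toy in-memory registry (hypothetical, not MLflow's implementation)."""

    def __init__(self):
        # Keys are (experiment_id, schema_name) pairs, so a name only has
        # to be unique inside one experiment.
        self._schemas = {}

    def create(self, experiment_id, name, title, overwrite=False):
        key = (experiment_id, name)
        if key in self._schemas and not overwrite:
            raise ValueError(
                f"Schema {name!r} already exists in experiment {experiment_id!r}"
            )
        self._schemas[key] = {"name": name, "title": title}
        return self._schemas[key]


registry = SchemaRegistry()
registry.create("exp-1", "response_quality", "Rate the response")

# The same name in a different experiment is allowed.
registry.create("exp-2", "response_quality", "Rate the response")

# The same name in the same experiment is only accepted with overwrite=True.
registry.create("exp-1", "response_quality", "Rate the response (v2)", overwrite=True)
```

In this sketch, calling `create` a second time for `exp-1` without `overwrite=True` raises `ValueError`.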

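Labeling schemas for common use cases

MLflow provides predefined schema names for the built-in scorers that use expectations. You can create custom schemas with these names to ensure compatibility with the built-in evaluation functionality.

Works with the Guidelines scorer
- GUIDELINES: Collects the ideal instructions that the GenAI app should follow for a request
Works with the Correctness scorer
- EXPECTED_FACTS: Collects factual statements that a response must include to be correct
- EXPECTED_RESPONSE: Collects the complete ground-truth response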
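
To see why the predefined names matter, consider how collected labels might be filtered into the expectations of an evaluation record. This is only a sketch: the string values of the three constants, the `to_expectations` helper, and the record layout are assumptions made for illustration. In real code, refer to `schemas.EXPECTED_FACTS` and the other constants rather than to string literals.

```python
# Assumed string values of the predefined schema names (an assumption for
# this sketch; use the constants from mlflow.genai.label_schemas in practice).
EXPECTED_FACTS = "expected_facts"
GUIDELINES = "guidelines"
EXPECTED_RESPONSE = "expected_response"

PREDEFINED_NAMES = {EXPECTED_FACTS, GUIDELINES, EXPECTED_RESPONSE}


def to_expectations(labels):
    """Keep only the labels stored under a predefined schema name."""
    return {name: value for name, value in labels.items() if name in PREDEFINED_NAMES}


# Labels collected for one trace, keyed by schema name.
labels = {
    EXPECTED_FACTS: ["Refunds take 5 business days"],
    "response_tone": "Just right",  # custom feedback schema, not an expectation
}

record = {
    "inputs": {"question": "How long do refunds take?"},
    "expectations": to_expectations(labels),
}
```

A schema saved under a different name, such as `required_facts`, would be dropped by this filter, which is the compatibility problem the predefined names avoid.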

Creating schemas for common use cases

Python
import mlflow.genai.label_schemas as schemas
from mlflow.genai.label_schemas import LabelSchemaType, InputTextList, InputText

# Schema for collecting expected facts
expected_facts_schema = schemas.create_label_schema(
    name=schemas.EXPECTED_FACTS,
    type=LabelSchemaType.EXPECTATION,
    title="Expected facts",
    input=InputTextList(max_length_each=1000),
    instruction="Please provide a list of facts that you expect to see in a correct response.",
    overwrite=True
)

# Schema for collecting guidelines
guidelines_schema = schemas.create_label_schema(
    name=schemas.GUIDELINES,
    type=LabelSchemaType.EXPECTATION,
    title="Guidelines",
    input=InputTextList(max_length_each=500),
    instruction="Please provide guidelines that the model's output is expected to adhere to.",
    overwrite=True
)

# Schema for collecting expected response
expected_response_schema = schemas.create_label_schema(
    name=schemas.EXPECTED_RESPONSE,
    type=LabelSchemaType.EXPECTATION,
    title="Expected response",
    input=InputText(),
    instruction="Please provide a correct agent response.",
    overwrite=True
)

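Creating custom labeling schemas

Create custom schemas to collect feedback specific to your domain. You can create schemas in the MLflow UI or programmatically with the SDK.

Note

Schema names must be unique within the current MLflow experiment. Choose descriptive names that clearly indicate each schema's purpose.

Creating schemas in the UI

Go to the Labeling tab in the MLflow UI to create schemas visually. The UI lets you define questions, input types, and validation rules without writing code.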

Creating schemas programmatically

Every schema requires a name, a type, a title, and an input specification.

Basic schema creation

Python
import mlflow.genai.label_schemas as schemas
from mlflow.genai.label_schemas import InputCategorical, InputText

# Create a feedback schema for rating response quality
quality_schema = schemas.create_label_schema(
    name="response_quality",
    type="feedback",
    title="How would you rate the overall quality of this response?",
    input=InputCategorical(options=["Poor", "Fair", "Good", "Excellent"]),
    instruction="Consider accuracy, relevance, and helpfulness when rating."
)

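Schema types

Choose between two schema types:

- feedback: Subjective assessments such as ratings, preferences, or opinions
- expectation: Objective ground truth such as correct answers or expected behavior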

Python
import mlflow.genai.label_schemas as schemas
from mlflow.genai.label_schemas import InputCategorical, InputTextList

# Feedback schema for subjective assessment
tone_schema = schemas.create_label_schema(
    name="response_tone",
    type="feedback",
    title="Is the response tone appropriate for the context?",
    input=InputCategorical(options=["Too formal", "Just right", "Too casual"]),
    enable_comment=True  # Allow additional comments
)

# Expectation schema for ground truth
facts_schema = schemas.create_label_schema(
    name="required_facts",
    type="expectation",
    title="What facts must be included in a correct response?",
    input=InputTextList(max_count=5, max_length_each=200),
    instruction="List key facts that any correct response must contain."
)

Managing labeling schemas

Use the SDK functions to manage schemas programmatically.

Retrieving a schema

Python
import mlflow.genai.label_schemas as schemas

# Get an existing schema
schema = schemas.get_label_schema("response_quality")
print(f"Schema: {schema.name}")
print(f"Type: {schema.type}")
print(f"Title: {schema.title}")

Updating a schema

Python
import mlflow.genai.label_schemas as schemas
from mlflow.genai.label_schemas import InputCategorical

# Update by recreating with overwrite=True
updated_schema = schemas.create_label_schema(
    name="response_quality",
    type="feedback",
    title="Rate the response quality (updated question)",
    input=InputCategorical(options=["Excellent", "Good", "Fair", "Poor", "Very Poor"]),
    instruction="Updated: Focus on factual accuracy above all else.",
    overwrite=True  # Replace existing schema
)

Deleting a schema

Python
import mlflow.genai.label_schemas as schemas

# Remove a schema that's no longer needed
schemas.delete_label_schema("old_schema_name")

Input types for custom schemas

MLflow supports the following five input types for collecting different kinds of feedback.

Single-select dropdown (`InputCategorical`)

Use for mutually exclusive options.

Python
from mlflow.genai.label_schemas import InputCategorical

# Rating scale
rating_input = InputCategorical(
    options=["1 - Poor", "2 - Below Average", "3 - Average", "4 - Good", "5 - Excellent"]
)

# Binary choice
safety_input = InputCategorical(options=["Safe", "Unsafe"])

# Multiple categories
error_type_input = InputCategorical(
    options=["Factual Error", "Logical Error", "Formatting Error", "No Error"]
)

Multi-select dropdown (`InputCategoricalList`)

Use when reviewers can select more than one option.

Python
from mlflow.genai.label_schemas import InputCategoricalList

# Multiple error types can be present
errors_input = InputCategoricalList(
    options=[
        "Factual inaccuracy",
        "Missing context",
        "Inappropriate tone",
        "Formatting issues",
        "Off-topic content"
    ]
)

# Multiple content types
content_input = InputCategoricalList(
    options=["Technical details", "Examples", "References", "Code samples"]
)
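
The difference between the two dropdown types comes down to what counts as a valid answer. The sketch below illustrates this with two hypothetical helpers, `is_valid_single` and `is_valid_multi`. They are not MLflow functions, and the Review App's own validation may behave differently.

```python
def is_valid_single(choice, options):
    """Single-select: exactly one value, and it must be a listed option."""
    return isinstance(choice, str) and choice in options


def is_valid_multi(choices, options):
    """Multi-select: any number of distinct values, all of them listed options."""
    return len(set(choices)) == len(choices) and set(choices) <= set(options)


error_options = ["Factual Error", "Logical Error", "Formatting Error", "No Error"]

single_ok = is_valid_single("Logical Error", error_options)
multi_ok = is_valid_multi(["Factual Error", "Formatting Error"], error_options)
multi_unknown = is_valid_multi(["Factual Error", "Typo"], error_options)
```

Here `single_ok` and `multi_ok` are `True`, while `multi_unknown` is `False` because "Typo" is not one of the options.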

Free-form text (`InputText`)

Use for open-ended responses.

Python
from mlflow.genai.label_schemas import InputText

# General feedback
feedback_input = InputText(max_length=500)

# Specific improvement suggestions
improvement_input = InputText(
    max_length=200  # Limit length for focused feedback
)

# Short answers
summary_input = InputText(max_length=100)

Multiple text entries (`InputTextList`)

Use to collect a list of text items.

Python
from mlflow.genai.label_schemas import InputTextList

# List of factual errors
errors_input = InputTextList(
    max_count=10,        # Maximum 10 errors
    max_length_each=150  # Each error description limited to 150 chars
)

# Missing information
missing_input = InputTextList(
    max_count=5,
    max_length_each=200
)

# Improvement suggestions
suggestions_input = InputTextList(max_count=3)  # No length limit per item
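
The effect of `max_count` and `max_length_each` on a reviewer's answer can be sketched as follows. `check_text_list` is a hypothetical helper written for this illustration, and treating a missing limit as "no limit" is an assumption that mirrors the comments in the example above.

```python
def check_text_list(items, max_count=None, max_length_each=None):
    """Return a list of constraint violations; an empty list means valid."""
    errors = []
    if max_count is not None and len(items) > max_count:
        errors.append(f"too many items: {len(items)} > {max_count}")
    if max_length_each is not None:
        for index, item in enumerate(items):
            if len(item) > max_length_each:
                errors.append(
                    f"item {index} is too long: {len(item)} > {max_length_each}"
                )
    return errors


# Two short entries satisfy max_count=10 and max_length_each=150.
valid = check_text_list(
    ["Wrong release date", "Wrong author name"],
    max_count=10,
    max_length_each=150,
)

# Four entries violate max_count=3; no per-item limit is applied.
too_many = check_text_list(["a", "b", "c", "d"], max_count=3)
```

In this sketch `valid` is an empty list and `too_many` contains one violation message.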

Numeric input (`InputNumeric`)

Use for numeric ratings or scores.

Python
from mlflow.genai.label_schemas import InputNumeric

# Confidence score
confidence_input = InputNumeric(
    min_value=0.0,
    max_value=1.0
)

# Rating scale
rating_input = InputNumeric(
    min_value=1,
    max_value=10
)

# Cost estimate
cost_input = InputNumeric(min_value=0)  # No maximum limit
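
A range check of this kind can be sketched as shown below. `in_range` is a hypothetical helper, and treating both bounds as inclusive is an assumption, suggested by instructions such as "Rate from 1 to 10" but not confirmed here.

```python
def in_range(value, min_value=None, max_value=None):
    """Check a number against optional bounds (assumed inclusive)."""
    if min_value is not None and value < min_value:
        return False
    if max_value is not None and value > max_value:
        return False
    return True


confidence_ok = in_range(0.85, min_value=0.0, max_value=1.0)
rating_too_high = in_range(11, min_value=1, max_value=10)
cost_ok = in_range(1_000_000, min_value=0)  # no upper bound
```

`confidence_ok` and `cost_ok` are `True`, and `rating_too_high` is `False`.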

Complete examples

Customer service evaluation

The following comprehensive example evaluates customer service responses.

Python
import mlflow.genai.label_schemas as schemas
from mlflow.genai.label_schemas import (
    InputCategorical,
    InputCategoricalList,
    InputText,
    InputTextList,
    InputNumeric
)

# Overall quality rating
quality_schema = schemas.create_label_schema(
    name="service_quality",
    type="feedback",
    title="Rate the overall quality of this customer service response",
    input=InputCategorical(options=["Excellent", "Good", "Average", "Poor", "Very Poor"]),
    instruction="Consider helpfulness, accuracy, and professionalism.",
    enable_comment=True
)

# Issues identification
issues_schema = schemas.create_label_schema(
    name="response_issues",
    type="feedback",
    title="What issues are present in this response? (Select all that apply)",
    input=InputCategoricalList(options=[
        "Factually incorrect information",
        "Unprofessional tone",
        "Doesn't address the question",
        "Too vague or generic",
        "Contains harmful content",
        "No issues identified"
    ]),
    instruction="Select all issues you identify. Choose 'No issues identified' if the response is problem-free."
)

# Expected resolution steps
resolution_schema = schemas.create_label_schema(
    name="expected_resolution",
    type="expectation",
    title="What steps should be included in the ideal resolution?",
    input=InputTextList(max_count=5, max_length_each=200),
    instruction="List the key steps a customer service rep should take to properly resolve this issue."
)

# Confidence in assessment
confidence_schema = schemas.create_label_schema(
    name="assessment_confidence",
    type="feedback",
    title="How confident are you in your assessment?",
    input=InputNumeric(min_value=1, max_value=10),
    instruction="Rate from 1 (not confident) to 10 (very confident)"
)

Medical information review

An example that evaluates medical information responses:

Python
import mlflow.genai.label_schemas as schemas
from mlflow.genai.label_schemas import InputCategorical, InputTextList, InputNumeric

# Safety assessment
safety_schema = schemas.create_label_schema(
    name="medical_safety",
    type="feedback",
    title="Is this medical information safe and appropriate?",
    input=InputCategorical(options=[
        "Safe - appropriate general information",
        "Concerning - may mislead patients",
        "Dangerous - could cause harm if followed"
    ]),
    instruction="Assess whether the information could be safely consumed by patients."
)

# Required disclaimers
disclaimers_schema = schemas.create_label_schema(
    name="required_disclaimers",
    type="expectation",
    title="What medical disclaimers should be included?",
    input=InputTextList(max_count=3, max_length_each=300),
    instruction="List disclaimers that should be present (e.g., 'consult your doctor', 'not professional medical advice')."
)

# Accuracy of medical facts
accuracy_schema = schemas.create_label_schema(
    name="medical_accuracy",
    type="feedback",
    title="Rate the factual accuracy of the medical information",
    input=InputNumeric(min_value=0, max_value=100),
    instruction="Score from 0 (completely inaccurate) to 100 (completely accurate)"
)

Integration with labeling sessions

After you create schemas, use them in labeling sessions.

Python
import mlflow.genai.label_schemas as schemas

# Schemas are automatically available when creating labeling sessions
# The Review App will present questions based on your schema definitions

# Example: Using schemas in a session (conceptual - actual session creation
# happens through the Review App UI or other APIs)
session_schemas = [
    "service_quality",      # Your custom schema
    "response_issues",      # Your custom schema
    schemas.EXPECTED_FACTS  # Built-in schema
]

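Best practices

Schema design

- Clear titles: Write each question as a clear, specific prompt
- Helpful instructions: Give reviewers context that guides their answers
- Appropriate constraints: Set reasonable limits on text length and list counts
- Logical options: For categorical inputs, make sure the options are mutually exclusive and exhaustive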
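
Some of these design points can be checked mechanically before you create a schema. The sketch below uses a hypothetical `lint_schema` helper, not an MLflow function, and the specific checks are only examples. Whether options are truly mutually exclusive and exhaustive still needs human judgment.

```python
def lint_schema(title, instruction=None, options=None):
    """Return a list of design warnings for a planned schema."""
    warnings = []
    if not title.strip():
        warnings.append("title is empty")
    if not instruction:
        warnings.append("no instruction: reviewers get no extra context")
    if options is not None:
        # Compare options case-insensitively and ignore surrounding spaces.
        normalized = [option.strip().lower() for option in options]
        if len(set(normalized)) != len(normalized):
            warnings.append("options contain duplicates")
        if len(options) < 2:
            warnings.append("fewer than two options")
    return warnings


clean = lint_schema(
    title="Is the response tone appropriate for the context?",
    instruction="Judge the tone against the customer's original message.",
    options=["Too formal", "Just right", "Too casual"],
)

flagged = lint_schema(title="Is it safe?", options=["Safe", "safe "])
```

In this sketch `clean` is an empty list, while `flagged` reports the missing instruction and the duplicated option.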

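Schema management

- Consistent naming: Use descriptive, consistent names across schemas
- Version control: When you update a schema, consider the impact on existing sessions
- Cleanup: Delete unused schemas to keep your workspace organized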
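
One way to judge the impact of an update is to compare the old and new option lists before you overwrite a schema. The sketch below uses a hypothetical `option_changes` helper and the option lists from the `response_quality` examples on this page. How existing labels behave when an option is removed is not described here, so treat a non-empty `removed` list as a prompt to investigate.

```python
def option_changes(old_options, new_options):
    """Report which categorical options an update would remove or add."""
    old, new = set(old_options), set(new_options)
    return {"removed": sorted(old - new), "added": sorted(new - old)}


changes = option_changes(
    old_options=["Poor", "Fair", "Good", "Excellent"],
    new_options=["Excellent", "Good", "Fair", "Poor", "Very Poor"],
)
```

Here `changes` is `{"removed": [], "added": ["Very Poor"]}`, so the update only adds a choice and every previously available answer is still a valid option.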

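Choosing an input type

- Use InputCategorical for standardized ratings or classifications
- Use InputCategoricalList when multiple issues or features can be present
- Use InputText for detailed explanations or custom feedback
- Use InputTextList for structured lists of items
- Use InputNumeric for precise scoring or confidence ratings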

Next steps

- Label existing traces: Apply your schemas to collect structured feedback
- Create labeling sessions: Organize review workflows using your schemas
- Build evaluation datasets: Convert labeled data into test datasets
