MLflow prompt optimization (Beta)

Note

Beta

This feature is currently in Beta.

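MLflow provides the mlflow.genai.optimize_prompts() API, which automatically improves prompts using evaluation metrics and training data. It applies a prompt optimization algorithm so that you can make prompts more effective in any agent framework, with less manual effort and more consistent quality.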

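MLflow supports the GEPA optimization algorithm through GepaPromptOptimizer, which was researched and validated by the Mosaic Research Team. GEPA iteratively refines prompts using LLM-driven reflection and automated feedback, which leads to systematic, data-driven improvements.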
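To make the reflect-and-refine idea concrete, here is a deliberately simplified accept-if-better loop. It is not the GEPA algorithm or MLflow's implementation: `toy_model`, `reflect`, and `score_prompt` are hypothetical stand-ins (a real optimizer would call an LLM for both prediction and reflection). The loop only illustrates scoring a candidate on training data, turning failures into feedback, and keeping a revision when the score improves.

```python
# Hypothetical training data: question -> expected answer.
TRAIN = [
    ("capital of Japan", "Tokyo"),
    ("capital of Italy", "Rome"),
    ("capital of France", "Paris"),
]
FACTS = dict(TRAIN)


def toy_model(prompt: str, question: str) -> str:
    """Stand-in for an LLM: answers tersely only if the prompt asks it to."""
    answer = FACTS[question]
    if "only the answer" in prompt:
        return answer
    return f"The answer is {answer}."


def score_prompt(prompt: str) -> float:
    """Fraction of training examples answered with an exact match."""
    hits = sum(toy_model(prompt, q) == expected for q, expected in TRAIN)
    return hits / len(TRAIN)


def reflect(prompt: str, failures: list) -> str:
    """Stand-in for LLM reflection: turn observed failures into a revision."""
    if any(len(got) > len(expected) for _, expected, got in failures):
        return prompt + " Reply with only the answer."
    return prompt


def optimize(prompt: str, rounds: int = 3) -> tuple:
    best, best_score = prompt, score_prompt(prompt)
    for _ in range(rounds):
        # Collect (question, expected, actual) for every wrong answer.
        failures = [
            (q, expected, toy_model(best, q))
            for q, expected in TRAIN
            if toy_model(best, q) != expected
        ]
        if not failures:
            break
        candidate = reflect(best, failures)
        candidate_score = score_prompt(candidate)
        if candidate_score > best_score:  # keep a revision only if it helps
            best, best_score = candidate, candidate_score
    return best, best_score


best_prompt, best_score = optimize("Answer this question: {{question}}")
```

In this toy setup the starting prompt scores 0 because the stand-in model wraps every answer in a sentence, and the first revision is accepted because it fixes that.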

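Key benefits

Automatic improvement: Optimizes prompts based on evaluation metrics, with no manual tuning.
Data-driven optimization: Uses training data and custom scorers to guide optimization.
Framework agnostic: Works with any agent framework.
Joint optimization: Refines multiple prompts at the same time to optimize overall performance.
Flexible evaluation: Supports custom scorers and aggregation functions.
Version control: Automatically registers optimized prompts in the MLflow Prompt Registry.
Extensible: Plug in custom optimization algorithms by extending the base class.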
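The "flexible evaluation" point can be illustrated with a small aggregation function. The sketch below assumes that an aggregation receives a mapping of scorer names to numeric scores and returns a single float. That shape, the scorer names, and the weights are illustrative assumptions, so check the optimize_prompts API reference for the exact contract.

```python
def weighted_score(scores: dict) -> float:
    """Combine per-scorer scores into one objective (hypothetical weights)."""
    weights = {"correctness": 0.75, "safety": 0.25}
    total = sum(weights.values())
    # Scorers missing from the mapping contribute 0 rather than raising.
    return sum(w * scores.get(name, 0.0) for name, w in weights.items()) / total


combined = weighted_score({"correctness": 1.0, "safety": 0.0})
```

Weighting correctness more heavily than safety is only one possible choice; the point is that a single number ends up driving the optimizer.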

Note

Version requirements

The optimize_prompts API requires MLflow >= 3.5.0.

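Prompt optimization example

For a simple example of prompt optimization, see the "Prompt optimization" tutorial.

The API produces an improved prompt that performs better against your evaluation criteria.

Example: Simple prompt → optimized prompt

Before optimization: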

Text
Answer this question: {{question}}

After optimization:

Text
Answer this question: {{question}}.
Focus on providing precise,
factual information without additional commentary or explanations.

1. **Identify the Subject**: Clearly determine the specific subject
of the question (e.g., geography, history)
and provide a concise answer.

2. **Clarity and Precision**: Your response should be a single,
clear statement that directly addresses the question.
Do not add extra details, context, or alternatives.

3. **Expected Format**: The expected output should be the exact answer
with minimal words where appropriate.
For instance, when asked about capitals, the answer should
simply state the name of the capital city,
e.g., "Tokyo" for Japan, "Rome" for Italy, and "Paris" for France.

4. **Handling Variations**: If the question contains multiple
parts or variations, focus on the primary query
 and answer it directly. Avoid over-complication.

5. **Niche Knowledge**: Ensure that the responses are based on
commonly accepted geographic and historical facts,
as this type of information is crucial for accuracy in your answers.

Adhere strictly to these guidelines to maintain consistency
and quality in your responses.
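The `{{question}}` placeholder in both versions is filled in at call time. As a rough illustration of double-brace substitution (a plain-Python stand-in, not the MLflow Prompt Registry's own rendering code), the hypothetical `render` helper below substitutes named variables and fails loudly when one is missing.

```python
import re

# Matches {{name}} with optional whitespace inside the braces.
_PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render(template: str, **variables: str) -> str:
    """Replace each {{name}} with variables[name]; raise if a name is missing."""

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in variables:
            raise KeyError(f"missing template variable: {name}")
        return variables[name]

    return _PLACEHOLDER.sub(substitute, template)


rendered = render(
    "Answer this question: {{question}}",
    question="What is the capital of Japan?",
)
```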

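For a detailed explanation, see the MLflow documentation.

Advanced usage

For advanced use cases, see the following guides.

Common use cases

The following sections provide code examples for common use cases.

Improve accuracy

Optimize prompts to produce more accurate outputs.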

Python
import mlflow
from mlflow.genai.optimize import GepaPromptOptimizer
from mlflow.genai.scorers import Correctness

# predict_fn, dataset, and prompt are assumed to be defined earlier.
result = mlflow.genai.optimize_prompts(
    predict_fn=predict_fn,
    train_data=dataset,
    prompt_uris=[prompt.uri],
    optimizer=GepaPromptOptimizer(reflection_model="databricks:/databricks-gpt-5"),
    scorers=[Correctness(model="databricks:/databricks-claude-sonnet-4-5")],
)
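The call above relies on `predict_fn` and `dataset` already existing. As a hedged sketch of what they might look like: the record layout below (an `inputs` dict whose keys match the arguments of `predict_fn`, plus an `expectations` dict) is an assumption, the `expected_response` key is illustrative, and `check_dataset` is a hypothetical helper that is not part of MLflow.

```python
import inspect

# Assumed record layout: inputs feed predict_fn, expectations feed the scorer.
dataset = [
    {
        "inputs": {"question": "What is the capital of Japan?"},
        "expectations": {"expected_response": "Tokyo"},
    },
    {
        "inputs": {"question": "What is the capital of Italy?"},
        "expectations": {"expected_response": "Rome"},
    },
]


def predict_fn(question: str) -> str:
    """Placeholder: a real predict_fn would load the prompt and call an LLM."""
    return f"(model output for: {question})"


def check_dataset(records: list, fn) -> list:
    """Return a list of problems; an empty list means the shapes line up."""
    params = set(inspect.signature(fn).parameters)
    problems = []
    for i, record in enumerate(records):
        keys = set(record.get("inputs", {}))
        if keys != params:
            problems.append(f"record {i}: inputs {sorted(keys)} != args {sorted(params)}")
        if not record.get("expectations"):
            problems.append(f"record {i}: missing expectations")
    return problems


problems = check_dataset(dataset, predict_fn)
```

Running a check like this before starting an optimization run is cheap and catches mismatched argument names early.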

Optimize for safety

Ensure that outputs are safe:

Python
import mlflow
from mlflow.genai.optimize import GepaPromptOptimizer
from mlflow.genai.scorers import Safety

# predict_fn, dataset, and prompt are assumed to be defined earlier.
result = mlflow.genai.optimize_prompts(
    predict_fn=predict_fn,
    train_data=dataset,
    prompt_uris=[prompt.uri],
    optimizer=GepaPromptOptimizer(reflection_model="databricks:/databricks-claude-sonnet-4-5"),
    scorers=[Safety(model="databricks:/databricks-claude-sonnet-4-5")],
)

Troubleshooting

The following sections provide troubleshooting guidance for common errors.

Problem: Optimization takes too long

Solution: Reduce the dataset size or lower the optimizer budget.

Python
from mlflow.genai.optimize import GepaPromptOptimizer

# Use fewer examples
small_dataset = dataset[:20]

# Use a faster reflection model and cap the number of metric calls
optimizer = GepaPromptOptimizer(
    reflection_model="databricks:/databricks-gpt-5-mini", max_metric_calls=100
)
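Slicing with `dataset[:20]` keeps whatever happens to come first, which can skew the sample if the data is ordered. A small alternative, assuming `dataset` is a plain list of records, is a seeded random sample. `subsample` is a hypothetical helper, and the records below are placeholders.

```python
import random


def subsample(records: list, k: int, seed: int = 0) -> list:
    """Return up to k records chosen at random, reproducibly for a given seed."""
    if k >= len(records):
        return list(records)
    return random.Random(seed).sample(records, k)


# Placeholder records standing in for real training data.
dataset = [{"inputs": {"question": f"q{i}"}} for i in range(100)]
small_dataset = subsample(dataset, 20)
```

Fixing the seed keeps runs comparable when you change only the optimizer settings.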

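Problem: No improvement observed

Solution: Review your evaluation metrics and increase the diversity of your dataset:

Make sure your scorers accurately measure what you care about.
Increase the size and diversity of the training data.
Try changing the optimizer settings.
Verify that the output format is what you expect.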
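One quick way to check the first point is to look at how the scorer's values are distributed on the baseline prompt: if every example gets the same score, the optimizer has no signal to follow. The sketch below works on plain numbers. How you collect per-example scores depends on your setup, and `score_summary` is a hypothetical helper.

```python
from statistics import mean, pstdev


def score_summary(scores: list) -> dict:
    """Summarize per-example scores and flag a scorer that never varies."""
    spread = pstdev(scores)
    return {
        "mean": mean(scores),
        "spread": spread,
        "no_signal": spread == 0.0,
    }


flat = score_summary([1.0, 1.0, 1.0, 1.0])   # already saturated
mixed = score_summary([1.0, 0.0, 1.0, 0.0])  # room to improve
```

A flat summary with a high mean suggests the task is too easy for the current data; a flat summary with a low mean suggests the scorer or the output format is the problem.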

Problem: The prompt is not being used

Solution: Make sure predict_fn calls mlflow.entities.model_registry.PromptVersion.format.

Python
import mlflow


# ✅ Correct - loads the prompt from the registry and formats it.
# prompt_location and llm_call are placeholders defined elsewhere.
def predict_fn(question: str):
    prompt = mlflow.genai.load_prompt(f"prompts:/{prompt_location}@latest")
    return llm_call(prompt.format(question=question))


# ❌ Incorrect - hardcoded prompt
def predict_fn(question: str):
    return llm_call(f"Answer: {question}")

Next steps

For API details, see "Prompt optimization (Beta)".

For more information about tracing and evaluating generative AI applications, see the following articles.
