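Tutorial: Create external model endpoints to query OpenAI models

This article provides step-by-step instructions for configuring and querying an external model endpoint that serves OpenAI models for completions, chat, and embeddings using the MLflow Deployments SDK. For more information about external models, see External models in Mosaic AI Model Serving.

If you prefer to use the Serving UI to accomplish this task, see Create an external model serving endpoint.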

Requirements

Databricks Runtime 13.0 ML or above.
MLflow 2.9 or above.
OpenAI API keys.
Databricks CLI version 0.205 or above, installed.

(Optional) Step 0: Store the OpenAI API key using the Databricks Secrets CLI

You can provide your API key either as a plaintext string in Step 2 or by using Databricks Secrets.

To store the OpenAI API key as a secret, you can use the Databricks Secrets CLI (version 0.205 and above). You can also use the REST API for secrets.

The following example creates a secret scope named my_openai_secret_scope, and then creates the secret openai_api_key in that scope.

databricks secrets create-scope my_openai_secret_scope
databricks secrets put-secret my_openai_secret_scope openai_api_key
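
The endpoint configuration in Step 2 refers to this secret with the `{{secrets/<scope>/<key>}}` syntax. As a minimal sketch, the helper below builds that reference string so that the scope and key names are defined in one place. The function name `secret_reference` is hypothetical and is not part of the Databricks CLI or the MLflow SDK.

```python
def secret_reference(scope: str, key: str) -> str:
    """Return a secret reference of the form {{secrets/<scope>/<key>}}.

    Hypothetical helper for illustration; it only formats a string and
    does not contact Databricks or verify that the secret exists.
    """
    if not scope or not key:
        raise ValueError("scope and key must be non-empty")
    return "{{secrets/" + scope + "/" + key + "}}"


# Uses the scope and key names created by the CLI commands above.
print(secret_reference("my_openai_secret_scope", "openai_api_key"))
# prints {{secrets/my_openai_secret_scope/openai_api_key}}
```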

Step 1: Install MLflow with external models support

Use the following to install an MLflow version with external models support:

sh
%pip install "mlflow[genai]>=2.9.0"

Step 2: Create and manage an external model endpoint

important

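The code examples in this section demonstrate use of the Public Preview MLflow Deployments CRUD SDK.

To create an external model endpoint for a large language model (LLM), use the create_endpoint() method from the MLflow Deployments SDK. You can also create external model endpoints in the Serving UI.

The following code snippet creates a completions endpoint for OpenAI gpt-3.5-turbo-instruct, as specified in the served_entities section of the configuration. For your endpoint, replace the values of the name and openai_api_key fields with your own.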

Python
import mlflow.deployments

client = mlflow.deployments.get_deploy_client("databricks")
client.create_endpoint(
    name="openai-completions-endpoint",
    config={
        "served_entities": [{
            "name": "openai-completions",
            "external_model": {
                "name": "gpt-3.5-turbo-instruct",
                "provider": "openai",
                "task": "llm/v1/completions",
                "openai_config": {
                    "openai_api_key": "{{secrets/my_openai_secret_scope/openai_api_key}}"
                }
            }
        }]
    }
)

The following code snippet shows an alternative way to create the same completions endpoint as above, providing the OpenAI API key as a plaintext string.

Python
import mlflow.deployments

client = mlflow.deployments.get_deploy_client("databricks")
client.create_endpoint(
    name="openai-completions-endpoint",
    config={
        "served_entities": [{
            "name": "openai-completions",
            "external_model": {
                "name": "gpt-3.5-turbo-instruct",
                "provider": "openai",
                "task": "llm/v1/completions",
                "openai_config": {
                    "openai_api_key_plaintext": "sk-yourApiKey"
                }
            }
        }]
    }
)
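
The two snippets above differ only in the openai_config field that carries the key. As a sketch of that choice, the helper below builds the same `config` dictionary from either a secret reference or a plaintext key. The function `openai_completions_config` and its parameter names are hypothetical, not part of the MLflow Deployments SDK; the dictionary keys are the ones used in the snippets above.

```python
from typing import Optional


def openai_completions_config(
    entity_name: str,
    model: str,
    api_key_secret: Optional[str] = None,
    api_key_plaintext: Optional[str] = None,
) -> dict:
    """Build a create_endpoint() config for an OpenAI completions model.

    Hypothetical helper: exactly one of api_key_secret (a
    {{secrets/...}} reference) or api_key_plaintext must be given.
    """
    if (api_key_secret is None) == (api_key_plaintext is None):
        raise ValueError(
            "Provide exactly one of api_key_secret or api_key_plaintext"
        )
    if api_key_secret is not None:
        openai_config = {"openai_api_key": api_key_secret}
    else:
        openai_config = {"openai_api_key_plaintext": api_key_plaintext}
    return {
        "served_entities": [{
            "name": entity_name,
            "external_model": {
                "name": model,
                "provider": "openai",
                "task": "llm/v1/completions",
                "openai_config": openai_config,
            },
        }]
    }


config = openai_completions_config(
    entity_name="openai-completions",
    model="gpt-3.5-turbo-instruct",
    api_key_secret="{{secrets/my_openai_secret_scope/openai_api_key}}",
)
```

The resulting dictionary can be passed as the config argument of client.create_endpoint(), in place of the inline dictionary used in the snippets above.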

If you are using Azure OpenAI, you can also specify the Azure OpenAI deployment name, endpoint URL, and API version in the openai_config section of the configuration.

Python
client.create_endpoint(
    name="openai-completions-endpoint",
    config={
        "served_entities": [
          {
            "name": "openai-completions",
            "external_model": {
                "name": "gpt-3.5-turbo-instruct",
                "provider": "openai",
                "task": "llm/v1/completions",
                "openai_config": {
                    "openai_api_type": "azure",
                    "openai_api_key": "{{secrets/my_openai_secret_scope/openai_api_key}}",
                    "openai_api_base": "https://my-azure-openai-endpoint.openai.azure.com",
                    "openai_deployment_name": "my-gpt-35-turbo-deployment",
                    "openai_api_version": "2023-05-15"
                },
            },
          }
        ],
    },
)
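
Because the Azure variant needs several extra fields, it can help to check the openai_config dictionary before calling create_endpoint(). The sketch below is a hypothetical local check (`missing_azure_fields` is not an MLflow or Databricks function) and assumes that the three Azure fields shown in the example above are the ones to verify; it does not validate their values.

```python
# Field names taken from the Azure OpenAI example above.
AZURE_FIELDS = (
    "openai_api_base",
    "openai_deployment_name",
    "openai_api_version",
)


def missing_azure_fields(openai_config: dict) -> list:
    """Return the Azure fields that are absent or empty.

    Returns an empty list when openai_api_type is not "azure",
    because the extra fields are then not expected.
    """
    if openai_config.get("openai_api_type") != "azure":
        return []
    return [name for name in AZURE_FIELDS if not openai_config.get(name)]


azure_config = {
    "openai_api_type": "azure",
    "openai_api_key": "{{secrets/my_openai_secret_scope/openai_api_key}}",
    "openai_api_base": "https://my-azure-openai-endpoint.openai.azure.com",
    "openai_deployment_name": "my-gpt-35-turbo-deployment",
    "openai_api_version": "2023-05-15",
}
print(missing_azure_fields(azure_config))  # prints []
```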

To update an endpoint, use update_endpoint(). The following code snippet demonstrates how to update an endpoint's rate limits to 20 calls per minute per user.

Python
client.update_endpoint(
    endpoint="openai-completions-endpoint",
    config={
        "rate_limits": [
            {
                "key": "user",
                "renewal_period": "minute",
                "calls": 20
            }
        ],
    },
)

Step 3: Send requests to an external model endpoint

important

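The code examples in this section demonstrate usage of the MLflow Deployments SDK's predict() method.

You can send chat, completions, and embeddings requests to an external model endpoint using the MLflow Deployments SDK's predict() method.

The following example sends a request to gpt-3.5-turbo-instruct hosted by OpenAI. The comparison that follows the call shows the shape of an example response.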

Python
completions_response = client.predict(
    endpoint="openai-completions-endpoint",
    inputs={
        "prompt": "What is the capital of France?",
        "temperature": 0.1,
        "max_tokens": 10,
        "n": 2
    }
)
completions_response == {
    "id": "cmpl-8QW0hdtUesKmhB3a1Vel6X25j2MDJ",
    "object": "text_completion",
    "created": 1701330267,
    "model": "gpt-3.5-turbo-instruct",
    "choices": [
        {
            "text": "The capital of France is Paris.",
            "index": 0,
            "finish_reason": "stop",
            "logprobs": None
        },
        {
            "text": "Paris is the capital of France",
            "index": 1,
            "finish_reason": "stop",
            "logprobs": None
        },
    ],
    "usage": {
        "prompt_tokens": 7,
        "completion_tokens": 16,
        "total_tokens": 23
    }
}
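
In most applications only the generated text is needed. As a minimal sketch, the hypothetical helper `completion_texts` below pulls the text of each choice out of a response shaped like the one above, ordered by index. It runs here on a trimmed copy of that example response rather than on a live endpoint.

```python
def completion_texts(response: dict) -> list:
    """Return the text of every choice, ordered by its index field."""
    choices = sorted(
        response.get("choices", []),
        key=lambda choice: choice.get("index", 0),
    )
    return [choice["text"].strip() for choice in choices]


# Trimmed copy of the example response shown above.
sample_response = {
    "object": "text_completion",
    "model": "gpt-3.5-turbo-instruct",
    "choices": [
        {"text": "The capital of France is Paris.", "index": 0,
         "finish_reason": "stop", "logprobs": None},
        {"text": "Paris is the capital of France", "index": 1,
         "finish_reason": "stop", "logprobs": None},
    ],
}

for text in completion_texts(sample_response):
    print(text)
```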

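Step 4: Compare models from a different provider

Model serving supports many external model providers, including OpenAI, Anthropic, Cohere, Amazon Bedrock, Google Cloud Vertex AI, and more. You can compare LLMs across providers, which helps you optimize the accuracy, speed, and cost of your applications using the AI Playground.

The following example creates an endpoint for Anthropic claude-2 and compares its response to a question with the response from OpenAI gpt-3.5-turbo-instruct. Both responses have the same standard format, which makes them easy to compare.

Create an endpoint for Anthropic claude-2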

Python
import mlflow.deployments

client = mlflow.deployments.get_deploy_client("databricks")

client.create_endpoint(
    name="anthropic-completions-endpoint",
    config={
        "served_entities": [
            {
                "name": "claude-completions",
                "external_model": {
                    "name": "claude-2",
                    "provider": "anthropic",
                    "task": "llm/v1/completions",
                    "anthropic_config": {
                        "anthropic_api_key": "{{secrets/my_anthropic_secret_scope/anthropic_api_key}}"
                    },
                },
            }
        ],
    },
)

Compare the responses from each endpoint

Python

openai_response = client.predict(
    endpoint="openai-completions-endpoint",
    inputs={
        "prompt": "How is Pi calculated? Be very concise."
    }
)
anthropic_response = client.predict(
    endpoint="anthropic-completions-endpoint",
    inputs={
        "prompt": "How is Pi calculated? Be very concise."
    }
)
openai_response["choices"] == [
    {
        "text": "Pi is calculated by dividing the circumference of a circle by its diameter."
                " This constant ratio of 3.14159... is then used to represent the relationship"
                " between a circle's circumference and its diameter, regardless of the size of the"
                " circle.",
        "index": 0,
        "finish_reason": "stop",
        "logprobs": None
    }
]
anthropic_response["choices"] == [
    {
        "text": "Pi is calculated by approximating the ratio of a circle's circumference to"
                " its diameter. Common approximation methods include infinite series, infinite"
                " products, and computing the perimeters of polygons with more and more sides"
                " inscribed in or around a circle.",
        "index": 0,
        "finish_reason": "stop",
        "logprobs": None
    }
]
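
Because both providers return the same response format, one function can summarize either response. The sketch below is one way to do that; `first_choice_text` and `compare_responses` are hypothetical helper names, and the two input dictionaries are short stand-ins rather than real endpoint output.

```python
def first_choice_text(response: dict) -> str:
    """Return the text of the first choice in a completions response."""
    return response["choices"][0]["text"].strip()


def compare_responses(responses: dict) -> dict:
    """Map each label to its first completion text and word count."""
    summary = {}
    for label, response in responses.items():
        text = first_choice_text(response)
        summary[label] = {"text": text, "words": len(text.split())}
    return summary


# Stand-in responses; in practice these would be openai_response and
# anthropic_response returned by client.predict() above.
stand_in_responses = {
    "openai": {"choices": [{"text": " answer from OpenAI ", "index": 0}]},
    "anthropic": {"choices": [{"text": "a longer answer from Anthropic", "index": 0}]},
}

for label, info in compare_responses(stand_in_responses).items():
    print(f"{label}: {info['words']} words: {info['text']}")
```

Word count is only a rough proxy for verbosity; accuracy, latency, and cost still need to be measured separately.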

Additional resources

External models in Mosaic AI Model Serving.
