Query a deployed Mosaic AI agent

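Learn how to send requests to agents deployed to a Model Serving endpoint. Databricks provides several query methods to fit different use cases and integration needs.

To learn how to deploy agents, see "Deploy an agent for generative AI applications".

Choose the query approach that best fits your use case.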

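Method	Key benefits
Databricks OpenAI client (recommended)	Native integration, full feature support, streaming capabilities
MLflow deployments client	Existing MLflow patterns, established ML pipelines
REST API	OpenAI-compatible, language-agnostic, works with existing tools
AI Functions: `ai_query`	OpenAI-compatible, works with existing tools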

Databricks recommends the Databricks OpenAI client for new applications. Choose the REST API when you integrate with platforms that expect OpenAI-compatible endpoints.

Databricks OpenAI client (recommended)

Databricks recommends using the Databricks OpenAI client to query a deployed agent. Depending on the API of your deployed agent, use either the responses client or the chat completions client.

ResponsesAgent endpoints
ChatAgent or ChatModel endpoints

Use the following example for agents created with the ResponsesAgent interface, which is the recommended approach for building agents.

Python
from databricks.sdk import WorkspaceClient

input_msgs = [{"role": "user", "content": "What does Databricks do?"}]
endpoint = "<agent-endpoint-name>" # TODO: update this with your endpoint name

w = WorkspaceClient()
client = w.serving_endpoints.get_open_ai_client()

# Non-streaming request. Invokes `predict`.
response = client.responses.create(model=endpoint, input=input_msgs)
print(response)

# Pass stream=True for a streaming response. Invokes `predict_stream`.
streaming_response = client.responses.create(model=endpoint, input=input_msgs, stream=True)
for chunk in streaming_response:
    print(chunk)
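
Printing every chunk is useful for inspection, but applications usually want only the generated text. The following is a minimal sketch of one way to collect it, assuming the streamed events follow the OpenAI Responses streaming shape, where text arrives in events whose type is response.output_text.delta. The helper name collect_output_text and the stand-in events are hypothetical; an agent may emit other event types, so print a few real chunks first to confirm the shape.

```python
from types import SimpleNamespace


def collect_output_text(events):
    """Concatenate text deltas from an iterable of Responses-style stream events."""
    parts = []
    for event in events:
        # Assumption: text deltas are typed "response.output_text.delta"
        # and carry the new text in a `delta` attribute.
        if getattr(event, "type", None) == "response.output_text.delta":
            parts.append(event.delta)
    return "".join(parts)


# Stand-in events so the sketch runs without a serving endpoint.
fake_events = [
    SimpleNamespace(type="response.output_text.delta", delta="Data"),
    SimpleNamespace(type="response.output_text.delta", delta="bricks"),
    SimpleNamespace(type="response.output_item.done", item=None),
]
print(collect_output_text(fake_events))  # prints "Databricks"
```

With a real endpoint, the streaming_response object from the example above would be passed in place of fake_events.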

To pass custom_inputs or databricks_options, add them with the extra_body parameter.

Python
streaming_response = client.responses.create(
    model=endpoint,
    input=input_msgs,
    stream=True,
    extra_body={
        "custom_inputs": {"id": 5},
        "databricks_options": {"return_trace": True},
    },
)
for chunk in streaming_response:
    print(chunk)
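
When these options are set conditionally, it can help to build the extra_body dictionary in one place. This is a small sketch under that assumption; build_extra_body is a hypothetical helper, not part of any Databricks or OpenAI library, and it only covers the two keys shown above.

```python
def build_extra_body(custom_inputs=None, return_trace=None):
    """Return an extra_body dict that contains only the options that were set."""
    extra_body = {}
    if custom_inputs is not None:
        extra_body["custom_inputs"] = custom_inputs
    if return_trace is not None:
        extra_body["databricks_options"] = {"return_trace": return_trace}
    return extra_body


print(build_extra_body(custom_inputs={"id": 5}, return_trace=True))
# {'custom_inputs': {'id': 5}, 'databricks_options': {'return_trace': True}}
```

The result can be passed as extra_body=build_extra_body(...) in either client call.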

Use the following example for agents created with the legacy ChatAgent or ChatModel interfaces. These interfaces are still supported but are not recommended for new agents.

Python
from databricks.sdk import WorkspaceClient

messages = [{"role": "user", "content": "What does Databricks do?"}]
endpoint = "<agent-endpoint-name>" # TODO: update this with your endpoint name

w = WorkspaceClient()
client = w.serving_endpoints.get_open_ai_client()

# Non-streaming request. Invokes `predict`.
response = client.chat.completions.create(model=endpoint, messages=messages)
print(response)

# Pass stream=True for a streaming response. Invokes `predict_stream`.
streaming_response = client.chat.completions.create(model=endpoint, messages=messages, stream=True)
for chunk in streaming_response:
    print(chunk)
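
To turn the streamed chunks into a single string, the text pieces can be accumulated as they arrive. The sketch below assumes that chunks follow the OpenAI chat completion chunk shape, with text in choices[0].delta.content. That is an assumption: an agent built with ChatAgent may place content elsewhere, so inspect a printed chunk before relying on it. The helper collect_chat_text and the stand-in chunks are hypothetical.

```python
from types import SimpleNamespace


def collect_chat_text(chunks):
    """Join the text deltas of chat-completion-style stream chunks."""
    parts = []
    for chunk in chunks:
        choices = getattr(chunk, "choices", None) or []
        if not choices:
            continue  # some chunks carry no choices (for example, metadata only)
        content = choices[0].delta.content
        if content:  # skip None or empty deltas
            parts.append(content)
    return "".join(parts)


def _fake_chunk(text):
    """Build a stand-in chunk so the sketch runs without an endpoint."""
    return SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content=text))])


fake_chunks = [
    _fake_chunk("Hello"),
    _fake_chunk(None),
    _fake_chunk(", world"),
    SimpleNamespace(choices=[]),
]
print(collect_chat_text(fake_chunks))  # prints "Hello, world"
```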

To pass custom_inputs or databricks_options, add them with the extra_body parameter.

Python
streaming_response = client.chat.completions.create(
    model=endpoint,
    messages=messages,
    stream=True,
    extra_body={
        "custom_inputs": {"id": 5},
        "databricks_options": {"return_trace": True},
    },
)
for chunk in streaming_response:
    print(chunk)

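MLflow deployments client

Use the MLflow deployments client when you work within existing MLflow workflows and pipelines. This approach integrates naturally with MLflow tracking and experiment management.

The following examples show how to query an agent with the MLflow deployments client. For new applications, Databricks recommends the Databricks OpenAI client for its richer feature set and native integration.

Depending on the API of your deployed agent, use either the ResponsesAgent or the ChatAgent format.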

ResponsesAgent endpoints
ChatAgent or ChatModel endpoints

Python
from mlflow.deployments import get_deploy_client

client = get_deploy_client()
input_example = {
    "input": [{"role": "user", "content": "What does Databricks do?"}],
    ## Optional: Include any custom inputs
    ## "custom_inputs": {"id": 5},
    "databricks_options": {"return_trace": True},
}
endpoint = "<agent-endpoint-name>" # TODO: update this with your endpoint name

## Call predict for non-streaming responses
response = client.predict(endpoint=endpoint, inputs=input_example)

## Call predict_stream for streaming responses
streaming_response = client.predict_stream(endpoint=endpoint, inputs=input_example)

Use this for agents created with the legacy ChatAgent or ChatModel interfaces. These interfaces are still supported but are not recommended for new agents.

Python
from mlflow.deployments import get_deploy_client

client = get_deploy_client()
input_example = {
    "messages": [{"role": "user", "content": "What does Databricks do?"}],
    ## Optional: Include any custom inputs
    ## "custom_inputs": {"id": 5},
    "databricks_options": {"return_trace": True},
}
endpoint = "<agent-endpoint-name>" # TODO: update this with your endpoint name

## Call predict for non-streaming responses
response = client.predict(endpoint=endpoint, inputs=input_example)

## Call predict_stream for streaming responses
streaming_response = client.predict_stream(endpoint=endpoint, inputs=input_example)

client.predict() and client.predict_stream() call the agent functions that you defined when you authored the agent. See Streaming responses.
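
The examples above assign the result of predict_stream without reading it. As a rough sketch, the stream can be consumed by iterating over it, and the request key can be chosen from the agent's API: "input" for ResponsesAgent and "messages" for ChatAgent or ChatModel, as in the two payloads above. The helpers build_payload and collect_stream and the StubClient class are hypothetical; the stub only mimics the predict_stream(endpoint=..., inputs=...) call so that the sketch runs without an endpoint, and the chunk contents are placeholders.

```python
def build_payload(messages, api="responses"):
    """Wrap messages under the key expected by the agent's API."""
    keys = {"responses": "input", "chat": "messages"}
    if api not in keys:
        raise ValueError(f"unsupported api: {api!r}")
    return {keys[api]: messages}


def collect_stream(client, endpoint, payload):
    """Read every chunk from a streaming prediction into a list."""
    return list(client.predict_stream(endpoint=endpoint, inputs=payload))


class StubClient:
    """Hypothetical stand-in for the MLflow deployments client."""

    def __init__(self):
        self.calls = []

    def predict_stream(self, endpoint, inputs):
        self.calls.append((endpoint, inputs))
        yield {"chunk": 1}  # placeholder chunk, not a real response shape
        yield {"chunk": 2}


stub = StubClient()
payload = build_payload([{"role": "user", "content": "What does Databricks do?"}])
chunks = collect_stream(stub, "my-agent", payload)
print(len(chunks), sorted(payload))
```

With a real client, the object returned by get_deploy_client() would be passed in place of StubClient().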

REST API

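The Databricks REST API provides OpenAI-compatible endpoints for models. This lets you use Databricks agents to serve applications that require an OpenAI interface.

This approach is ideal for:

Language-agnostic applications that use HTTP requests
Integrations with third-party platforms that expect OpenAI-compatible APIs
Migrations from OpenAI to Databricks with minimal code changes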

Authenticate to the REST API with Databricks OAuth or a personal access token (PAT). The examples below use a Databricks OAuth token. For more options and information, see the Databricks authentication documentation.
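
In scripts, the token is typically read from the environment rather than written into the source. The following sketch assumes the token is stored in an environment variable named DATABRICKS_TOKEN; the variable name is a choice made for this example, and auth_headers is a hypothetical helper. It builds the same two headers that the curl examples in this section send.

```python
import os


def auth_headers(env=os.environ):
    """Build request headers from a token stored in the environment."""
    token = env.get("DATABRICKS_TOKEN")  # assumed variable name
    if not token:
        raise RuntimeError("Set DATABRICKS_TOKEN to an OAuth token or PAT")
    return {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    }


# A plain dict stands in for os.environ so the sketch runs anywhere.
print(auth_headers({"DATABRICKS_TOKEN": "example-token"}))
```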

ResponsesAgent endpoints
ChatAgent or ChatModel endpoints

Use the following example for agents created with the ResponsesAgent interface, which is the recommended approach for building agents. The REST API call is equivalent to:

Using the Databricks OpenAI client with responses.create.
Sending a POST request to the URL of the specific endpoint (for example, https://<host.databricks.com>/serving-endpoints/<model-name>/invocations). For details, see your endpoint's Model Serving page and the Model Serving documentation.

Bash
curl --request POST \
  --url https://<host.databricks.com>/serving-endpoints/responses \
  --header 'Authorization: Bearer <OAuth token>' \
  --header 'content-type: application/json' \
  --data '{
    "model": "<model-name>",
    "input": [{ "role": "user", "content": "hi" }],
    "stream": true
  }'

To pass custom_inputs or databricks_options, add them with the extra_body parameter.

Bash
curl --request POST \
  --url https://<host.databricks.com>/serving-endpoints/responses \
  --header 'Authorization: Bearer <OAuth token>' \
  --header 'content-type: application/json' \
  --data '{
    "model": "<model-name>",
    "input": [{ "role": "user", "content": "hi" }],
    "stream": true,
    "extra_body": {
      "custom_inputs": { "id": 5 },
      "databricks_options": { "return_trace": true }
    }
  }'
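
The same request can be assembled from Python with only the standard library. This is a minimal sketch that mirrors the curl calls in this section: it builds the request object for either the responses route or the chat/completions route, and does not send it. The function build_agent_request, the host example.cloud.databricks.com, the token, and the endpoint name my-agent are all placeholders invented for illustration.

```python
import json
import urllib.request


def build_agent_request(host, token, endpoint, messages, route="responses", stream=False):
    """Build (but do not send) a POST request to an agent serving endpoint."""
    if route == "responses":
        body = {"model": endpoint, "input": messages}
    elif route == "chat/completions":
        body = {"model": endpoint, "messages": messages}
    else:
        raise ValueError(f"unsupported route: {route!r}")
    body["stream"] = stream
    return urllib.request.Request(
        url=f"https://{host}/serving-endpoints/{route}",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_agent_request(
    "example.cloud.databricks.com",  # placeholder host
    "example-token",                 # placeholder token
    "my-agent",                      # placeholder endpoint name
    [{"role": "user", "content": "hi"}],
)
print(req.get_method(), req.full_url)
```

Passing the request to urllib.request.urlopen(req) would send it; that step is left out so the sketch runs without network access.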

Use this for agents created with the legacy ChatAgent or ChatModel interfaces. These interfaces are still supported but are not recommended for new agents. This is equivalent to:

Using the Databricks OpenAI client with chat.completions.create.
Sending a POST request to the URL of the specific endpoint (for example, https://<host.databricks.com>/serving-endpoints/<model-name>/invocations). For details, see your endpoint's Model Serving page and the Model Serving documentation.

Bash
curl --request POST \
  --url https://<host.databricks.com>/serving-endpoints/chat/completions \
  --header 'Authorization: Bearer <OAuth token>' \
  --header 'content-type: application/json' \
  --data '{
    "model": "<model-name>",
    "messages": [{ "role": "user", "content": "hi" }],
    "stream": true
  }'

To pass custom_inputs or databricks_options, add them with the extra_body parameter.

Bash
curl --request POST \
  --url https://<host.databricks.com>/serving-endpoints/chat/completions \
  --header 'Authorization: Bearer <OAuth token>' \
  --header 'content-type: application/json' \
  --data '{
    "model": "<model-name>",
    "messages": [{ "role": "user", "content": "hi" }],
    "stream": true,
    "extra_body": {
      "custom_inputs": { "id": 5 },
      "databricks_options": { "return_trace": true }
    }
  }'
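
With "stream": true, the response body arrives incrementally instead of as one JSON document. The sketch below shows one way a client might decode it, assuming the endpoint uses server-sent events in the style of the OpenAI API: each payload on a line that starts with data:, and a final data: [DONE] sentinel. Both details are assumptions to verify against the actual output of your endpoint. The helper parse_sse_lines and the sample lines are hypothetical.

```python
import json


def parse_sse_lines(lines):
    """Yield decoded JSON payloads from server-sent-event "data:" lines."""
    for raw in lines:
        line = raw.decode("utf-8") if isinstance(raw, bytes) else raw
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and other SSE fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":  # assumed end-of-stream sentinel
            break
        yield json.loads(data)


# Sample lines standing in for an HTTP response body.
sample = [
    b'data: {"choices": [{"delta": {"content": "hi"}}]}\n',
    b"\n",
    b"data: [DONE]\n",
    b'data: {"ignored": true}\n',
]
events = list(parse_sse_lines(sample))
print(events)
```

An HTTP response object that iterates over lines could be passed directly in place of sample.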

AI Functions: `ai_query`

Use ai_query to query a deployed AI agent with SQL. For the SQL syntax and parameter definitions, see the ai_query function.

SQL
SELECT ai_query(
  "<model name>", question
) FROM (VALUES ('what is MLflow?'), ('how does MLflow work?')) AS t(question);

Next steps

Monitor GenAI in production
