Query a deployed Mosaic AI agent
Learn how to use the MLflow deployments client or the Databricks OpenAI Client to send requests to deployed agents.
To learn how to deploy agents, see Deploy an agent for generative AI applications.
MLflow deployments client (recommended)
Databricks recommends that you use the MLflow deployments client to query your endpoint. The MLflow deployments client provides the following benefits:
- Allows you to provide optional custom inputs.
- Allows you to request MLflow traces.
- The predict and predict_stream methods of the deployments client match the behavior of the authored agent.
The following example shows how to query an agent using the MLflow deployments client. Replace the content of messages with a query specific to your agent, and replace <agent-endpoint-name> with your endpoint name. If your agent accepts custom inputs, include them in the input Python dictionary. See Custom inputs and outputs.
from mlflow.deployments import get_deploy_client
client = get_deploy_client()
input_example = {
    "messages": [{"role": "user", "content": "What does Databricks do?"}],
    # Optional: include any custom inputs
    # "custom_inputs": {"id": 5},
    "databricks_options": {"return_trace": True},
}
endpoint = "<agent-endpoint-name>"
After formatting the request, run client.predict() for non-streaming responses or client.predict_stream() for streaming responses. predict() and predict_stream() call the agent functions you defined when authoring the agent. See Author streaming output agents.
# Call predict for non-streaming responses
response = client.predict(endpoint=endpoint, inputs=input_example)

# Call predict_stream for streaming responses
streaming_response = client.predict_stream(endpoint=endpoint, inputs=input_example)
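Here is a minimal sketch of consuming both response types. Iterating over the generator returned by predict_stream() is standard MLflow deployments client behavior; reading the trace from a databricks_output field is an assumption based on the "return_trace" option in the request above, so adjust the field access to your endpoint's actual response shape.

# Non-streaming: the response is a dictionary; the requested trace is
# assumed to arrive under the databricks_output key
trace = response.get("databricks_output", {}).get("trace")

# Streaming: predict_stream() returns a generator of chunk dictionaries
for chunk in streaming_response:
    print(chunk)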
Databricks OpenAI Client
Alternatively, you can use the Databricks OpenAI Client to query a deployed agent. The Databricks OpenAI Client supports only conversational chat use cases: you can send and receive messages, but you cannot include custom inputs or request traces from the endpoint.
The following example shows how to submit a query using the Databricks OpenAI Client. Replace the content of messages with a query specific to your agent, and replace <agent-endpoint-name> with your endpoint name.
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()
client = w.serving_endpoints.get_open_ai_client()

messages = [{"role": "user", "content": "What does Databricks do?"}]
endpoint = "<agent-endpoint-name>"
After formatting the request, run chat.completions.create(). Include the parameter stream=True for streaming responses. chat.completions.create() invokes the predict() or predict_stream() functions you defined when authoring the agent. See Author streaming output agents.
# Run for non-streaming responses
response = client.chat.completions.create(model=endpoint, messages=messages)

# Include stream=True for streaming responses
streaming_response = client.chat.completions.create(
    model=endpoint, messages=messages, stream=True
)
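The following sketch shows one way to read both response types. The field access (choices, message, delta) follows the standard OpenAI Python client response shape and is an assumption here, not behavior specific to Databricks agents.

# Non-streaming: the full reply is in the first choice's message
print(response.choices[0].message.content)

# Streaming: iterate over chunks and print incremental content deltas
for chunk in streaming_response:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")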