Query a deployed Mosaic AI agent
Learn how to use the MLflow deployments client or the Databricks OpenAI Client to send requests to deployed agents.
To learn how to deploy agents, see Deploy an agent for generative AI applications.
MLflow deployments client (recommended)
Databricks recommends that you use the MLflow deployments client to query your endpoint. The MLflow deployments client provides the following benefits:
- Allows you to provide optional custom inputs.
- Allows you to request MLflow traces.
- The predict and predict_stream methods of the deployments client match the behavior of the authored agent.
The following example shows how to query an agent using the MLflow deployments client. Replace the content of messages with a query specific to your agent, and replace <agent-endpoint-name> with your endpoint name. If your agent accepts custom inputs, include them in the input Python dictionary. See Custom inputs and outputs.
from mlflow.deployments import get_deploy_client
client = get_deploy_client()
input_example = {
    "messages": [{"role": "user", "content": "What does Databricks do?"}],
    # Optional: include any custom inputs
    # "custom_inputs": {"id": 5},
    "databricks_options": {"return_trace": True},
}
endpoint = "<agent-endpoint-name>"
After formatting the request, run client.predict() for non-streaming responses or client.predict_stream() for streaming responses. predict() and predict_stream() call the agent functions you defined when authoring the agent. See Author streaming output agents.
# Call predict for non-streaming responses
response = client.predict(endpoint=endpoint, inputs=input_example)

# Call predict_stream for streaming responses
streaming_response = client.predict_stream(endpoint=endpoint, inputs=input_example)
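Here is a minimal sketch of consuming both response types. Iterating over the generator returned by predict_stream() is standard MLflow deployments client behavior; reading the trace from a databricks_output field is an assumption based on the "return_trace" option in the request above, so adjust the field access to your endpoint's actual response shape.

# Non-streaming: the response is a dictionary; the requested trace is
# assumed to arrive under the databricks_output key
trace = response.get("databricks_output", {}).get("trace")

# Streaming: predict_stream() returns a generator of chunk dictionaries
for chunk in streaming_response:
    print(chunk)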
Databricks OpenAI Client
Alternatively, you can use the Databricks OpenAI Client to query a deployed agent. The Databricks OpenAI Client supports only conversational chat use cases: you can send and receive messages, but you cannot include custom inputs or request traces from the endpoint.
The following example shows how to submit a query using the Databricks OpenAI Client. Replace the content of messages with a query specific to your agent, and replace <agent-endpoint-name> with your endpoint name.
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()
client = w.serving_endpoints.get_open_ai_client()

messages = [{"role": "user", "content": "What does Databricks do?"}]
endpoint = "<agent-endpoint-name>"
After formatting the request, run chat.completions.create(). Include the parameter stream=True for streaming responses. chat.completions.create() invokes the predict() or predict_stream() functions you defined when authoring the agent. See Author streaming output agents.
# Run for non-streaming responses
response = client.chat.completions.create(model=endpoint, messages=messages)

# Include stream=True for streaming responses
streaming_response = client.chat.completions.create(
    model=endpoint, messages=messages, stream=True
)
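The following sketch shows one way to read both response types. The field access (choices, message, delta) follows the standard OpenAI Python client response shape and is an assumption here, not behavior specific to Databricks agents.

# Non-streaming: the full reply is in the first choice's message
print(response.choices[0].message.content)

# Streaming: iterate over chunks and print incremental content deltas
for chunk in streaming_response:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")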