Query a deployed Mosaic AI agent
Learn how to send requests to agents deployed to a Model Serving endpoint. Databricks provides multiple query methods to fit different use cases and integration needs.
To learn how to deploy agents, see Deploy an agent for generative AI applications.
Select the query approach that best fits your use case:
| Method | Key benefits |
| --- | --- |
| Databricks OpenAI Client (Recommended) | Native integration, full feature support, streaming capabilities |
| MLflow deployments client | Existing MLflow patterns, established ML pipelines |
| REST API | OpenAI-compatible, language-agnostic, works with existing tools |
Databricks recommends the Databricks OpenAI Client for new applications. Choose the REST API when integrating with platforms that expect OpenAI-compatible endpoints.
Databricks OpenAI Client (Recommended)
Databricks recommends that you use the Databricks OpenAI Client to query a deployed agent. Depending on the API of your deployed agent, use either the responses client or the chat completions client:
- ResponsesAgent endpoints
- ChatAgent or ChatModel endpoints
Use the following example for agents created with the ResponsesAgent interface, which is the recommended approach for building agents.
from databricks.sdk import WorkspaceClient

input_msgs = [{"role": "user", "content": "What does Databricks do?"}]
endpoint = "<agent-endpoint-name>"  # TODO: update this with your endpoint name

w = WorkspaceClient()
client = w.serving_endpoints.get_open_ai_client()

# Run for non-streaming responses. Invokes `predict`
response = client.responses.create(model=endpoint, input=input_msgs)
print(response)

# Include stream=True for streaming responses. Invokes `predict_stream`
streaming_response = client.responses.create(model=endpoint, input=input_msgs, stream=True)
for chunk in streaming_response:
    print(chunk)
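To work with the generated text rather than the raw response objects, here is a minimal sketch, assuming the endpoint follows the standard OpenAI Responses API shapes (an `output_text` property on the full response, and `response.output_text.delta` events while streaming):

# Non-streaming: the client aggregates all output text into `output_text`
print(response.output_text)

# Streaming: collect only the text delta events
text = ""
for chunk in client.responses.create(model=endpoint, input=input_msgs, stream=True):
    if getattr(chunk, "type", None) == "response.output_text.delta":
        text += chunk.delta
print(text)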
If you want to pass in `custom_inputs` or `databricks_options`, you can add them with the `extra_body` parameter:
streaming_response = client.responses.create(
    model=endpoint,
    input=input_msgs,
    stream=True,
    extra_body={
        "custom_inputs": {"id": 5},
        "databricks_options": {"return_trace": True},
    },
)
for chunk in streaming_response:
    print(chunk)
Use the following example for agents created with legacy ChatAgent or ChatModel interfaces, which are still supported but not recommended for new agents.
from databricks.sdk import WorkspaceClient

messages = [{"role": "user", "content": "What does Databricks do?"}]
endpoint = "<agent-endpoint-name>"  # TODO: update this with your endpoint name

w = WorkspaceClient()
client = w.serving_endpoints.get_open_ai_client()

# Run for non-streaming responses. Invokes `predict`
response = client.chat.completions.create(model=endpoint, messages=messages)
print(response)

# Include stream=True for streaming responses. Invokes `predict_stream`
streaming_response = client.chat.completions.create(model=endpoint, messages=messages, stream=True)
for chunk in streaming_response:
    print(chunk)
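To extract the generated text, a minimal sketch using the standard Chat Completions response shapes (`choices[0].message.content` on a full response, `choices[0].delta.content` on streamed chunks):

# Non-streaming: read the assistant message from the first choice
print(response.choices[0].message.content)

# Streaming: accumulate content deltas as they arrive
text = ""
for chunk in client.chat.completions.create(model=endpoint, messages=messages, stream=True):
    if chunk.choices and chunk.choices[0].delta.content:
        text += chunk.choices[0].delta.content
print(text)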
If you want to pass in `custom_inputs` or `databricks_options`, you can add them with the `extra_body` parameter:
streaming_response = client.chat.completions.create(
    model=endpoint,
    messages=messages,
    stream=True,
    extra_body={
        "custom_inputs": {"id": 5},
        "databricks_options": {"return_trace": True},
    },
)
for chunk in streaming_response:
    print(chunk)
MLflow deployments client
Use the MLflow deployments client when working within existing MLflow workflows and pipelines. This approach integrates naturally with MLflow tracking and experiment management.
The following examples show how to query an agent using the MLflow deployments client. For new applications, Databricks recommends the Databricks OpenAI Client for its enhanced features and native integration.
Depending on the API of your deployed agent, use either the ResponsesAgent or ChatAgent request format:
- ResponsesAgent endpoints
- ChatAgent or ChatModel endpoints
Use the following example for agents created with the ResponsesAgent interface, which is the recommended approach for building agents.
from mlflow.deployments import get_deploy_client

client = get_deploy_client()
input_example = {
    "input": [{"role": "user", "content": "What does Databricks do?"}],
    # Optional: Include any custom inputs
    # "custom_inputs": {"id": 5},
    "databricks_options": {"return_trace": True},
}
endpoint = "<agent-endpoint-name>"  # TODO: update this with your endpoint name

# Call predict for non-streaming responses
response = client.predict(endpoint=endpoint, inputs=input_example)

# Call predict_stream for streaming responses
streaming_response = client.predict_stream(endpoint=endpoint, inputs=input_example)
Use this for agents created with legacy ChatAgent or ChatModel interfaces, which are still supported but not recommended for new agents.
from mlflow.deployments import get_deploy_client

client = get_deploy_client()
input_example = {
    "messages": [{"role": "user", "content": "What does Databricks do?"}],
    # Optional: Include any custom inputs
    # "custom_inputs": {"id": 5},
    "databricks_options": {"return_trace": True},
}
endpoint = "<agent-endpoint-name>"  # TODO: update this with your endpoint name

# Call predict for non-streaming responses
response = client.predict(endpoint=endpoint, inputs=input_example)

# Call predict_stream for streaming responses
streaming_response = client.predict_stream(endpoint=endpoint, inputs=input_example)
`client.predict()` and `client.predict_stream()` call the agent functions you defined when authoring the agent. See Streaming responses.
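For example, a minimal sketch of consuming the two responses, assuming `predict` returns a response dictionary and `predict_stream` yields chunk dictionaries:

# predict returns the full response as a dictionary
print(response)

# predict_stream returns an iterator of chunk dictionaries
for chunk in streaming_response:
    print(chunk)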
REST API
The Databricks REST API provides OpenAI-compatible model serving endpoints. This allows you to use Databricks agents to serve applications that require OpenAI interfaces.
This approach is ideal for:
- Language-agnostic applications that use HTTP requests
- Integrating with third-party platforms that expect OpenAI-compatible APIs
- Migrating from OpenAI to Databricks with minimal code changes
Authenticate with the REST API using a Databricks OAuth token or Personal Access Token (PAT). The examples below use a Databricks OAuth token; refer to the Databricks authentication documentation for more options and information.
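For illustration, here is a minimal sketch of an authenticated request from Python, assuming the token is stored in a `DATABRICKS_TOKEN` environment variable; the host and model placeholders mirror the ResponsesAgent curl example below:

import os
import requests

token = os.environ["DATABRICKS_TOKEN"]  # OAuth token or PAT
url = "https://<host.databricks.com>/serving-endpoints/responses"  # update with your workspace host

response = requests.post(
    url,
    headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
    json={
        "model": "<model-name>",  # update with your agent endpoint name
        "input": [{"role": "user", "content": "What does Databricks do?"}],
    },
)
print(response.json())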
- ResponsesAgent endpoints
- ChatAgent or ChatModel endpoints
Use the following example for agents created with the ResponsesAgent interface, which is the recommended approach for building agents. This REST API call is equivalent to:
- Using the Databricks OpenAI Client with `responses.create`.
- Sending a POST request to the specific endpoint's URL (for example, https://<host.databricks.com>/serving-endpoints/<model-name>/invocations). Find more details on your endpoint's model serving page and in the Model Serving documentation.
curl --request POST \
  --url https://<host.databricks.com>/serving-endpoints/responses \
  --header 'Authorization: Bearer <OAuth token>' \
  --header 'content-type: application/json' \
  --data '{
    "model": "<model-name>",
    "input": [{ "role": "user", "content": "hi" }],
    "stream": true
  }'
If you want to pass in `custom_inputs` or `databricks_options`, include them in the `extra_body` field of the request body:
curl --request POST \
  --url https://<host.databricks.com>/serving-endpoints/responses \
  --header 'Authorization: Bearer <OAuth token>' \
  --header 'content-type: application/json' \
  --data '{
    "model": "<model-name>",
    "input": [{ "role": "user", "content": "hi" }],
    "stream": true,
    "extra_body": {
      "custom_inputs": { "id": 5 },
      "databricks_options": { "return_trace": true }
    }
  }'
Use this for agents created with legacy ChatAgent or ChatModel interfaces, which are still supported but not recommended for new agents. This is equivalent to:
- Using the Databricks OpenAI Client with `chat.completions.create`.
- Sending a POST request to the specific endpoint's URL (for example, https://<host.databricks.com>/serving-endpoints/<model-name>/invocations). Find more details on your endpoint's model serving page and in the Model Serving documentation.
curl --request POST \
  --url https://<host.databricks.com>/serving-endpoints/chat/completions \
  --header 'Authorization: Bearer <OAuth token>' \
  --header 'content-type: application/json' \
  --data '{
    "model": "<model-name>",
    "messages": [{ "role": "user", "content": "hi" }],
    "stream": true
  }'
If you want to pass in `custom_inputs` or `databricks_options`, include them in the `extra_body` field of the request body:
curl --request POST \
  --url https://<host.databricks.com>/serving-endpoints/chat/completions \
  --header 'Authorization: Bearer <OAuth token>' \
  --header 'content-type: application/json' \
  --data '{
    "model": "<model-name>",
    "messages": [{ "role": "user", "content": "hi" }],
    "stream": true,
    "extra_body": {
      "custom_inputs": { "id": 5 },
      "databricks_options": { "return_trace": true }
    }
  }'
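Because these examples set `"stream": true`, the endpoint replies with a stream of server-sent events. Here is a minimal sketch of consuming that stream from Python, assuming each event arrives as a `data: <json>` line and the stream ends with `data: [DONE]` (the usual OpenAI-compatible streaming convention); the token and placeholders are the same as in the authentication sketch above:

import json
import os
import requests

token = os.environ["DATABRICKS_TOKEN"]  # OAuth token or PAT
url = "https://<host.databricks.com>/serving-endpoints/chat/completions"

with requests.post(
    url,
    headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
    json={
        "model": "<model-name>",
        "messages": [{"role": "user", "content": "hi"}],
        "stream": True,
    },
    stream=True,
) as resp:
    for raw in resp.iter_lines():
        # Skip keep-alive blanks and anything that is not an SSE data line
        if not raw or not raw.startswith(b"data: "):
            continue
        payload = raw[len(b"data: "):].decode("utf-8")
        if payload.strip() == "[DONE]":
            break
        print(json.loads(payload))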