Query an agent deployed on Databricks

Learn how to send requests to agents deployed to Databricks Apps or Model Serving endpoints. Databricks provides multiple query methods to fit different use cases and integration needs.

Select the query approach that best fits your use case:

Method | Key benefits
Databricks OpenAI Client (Recommended) | Native integration, full feature support, streaming capabilities
REST API | OpenAI-compatible, language-agnostic, works with existing tools
AI Functions: ai_query | OpenAI-compatible, query legacy agents hosted on Model Serving endpoints only

Databricks recommends the Databricks OpenAI Client for new applications. Choose the REST API when integrating with platforms that expect OpenAI-compatible endpoints.

Databricks OpenAI Client (Recommended)

Use the Databricks OpenAI Client (the DatabricksOpenAI class) to query a deployed agent. Depending on the API your deployed agent implements, use either the responses client or the chat completions client.

Use the following example for agents hosted on Databricks Apps following the ResponsesAgent interface, which is the recommended approach for building agents. You must use a Databricks OAuth token to query agents hosted on Databricks Apps.

Python
from databricks.sdk import WorkspaceClient
from databricks_openai import DatabricksOpenAI

input_msgs = [{"role": "user", "content": "What does Databricks do?"}]
app_name = "<agent-app-name>" # TODO: update this with your app name

# The WorkspaceClient must be configured with OAuth authentication
# See: https://docs.databricks.com/aws/en/dev-tools/auth/oauth-u2m.html
w = WorkspaceClient()

client = DatabricksOpenAI(workspace_client=w)

# Run for non-streaming responses. Calls the "invoke" method
# Include the "apps/" prefix in the model name
response = client.responses.create(model=f"apps/{app_name}", input=input_msgs)
print(response)

# Include stream=True for streaming responses. Calls the "stream" method
# Include the "apps/" prefix in the model name
streaming_response = client.responses.create(
    model=f"apps/{app_name}", input=input_msgs, stream=True
)
for chunk in streaming_response:
    print(chunk)
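Printing each chunk is useful for debugging, but most applications want the assembled text. The following is a minimal sketch of one way to do that. It assumes the streamed events follow the OpenAI Responses streaming shape, where text arrives in events whose type is "response.output_text.delta" and whose delta attribute holds the text fragment. The helper name collect_streamed_text is hypothetical, and the stand-in events below exist only so the sketch runs without a deployed agent.

```python
from types import SimpleNamespace
from typing import Any, Iterable


def collect_streamed_text(events: Iterable[Any]) -> str:
    """Join the text fragments carried by output-text delta events.

    Assumption: text arrives in events with
    type == "response.output_text.delta" and a string `delta` attribute.
    Events of any other type are ignored.
    """
    parts = []
    for event in events:
        if getattr(event, "type", None) == "response.output_text.delta":
            parts.append(getattr(event, "delta", "") or "")
    return "".join(parts)


# Stand-in events for illustration only; a real stream would come from
# client.responses.create(..., stream=True).
fake_stream = [
    SimpleNamespace(type="response.output_text.delta", delta="Data"),
    SimpleNamespace(type="response.output_text.delta", delta="bricks"),
    SimpleNamespace(type="response.output_item.done", item=None),
]

text = collect_streamed_text(fake_stream)
print(text)
```

With a real client, pass streaming_response in place of fake_stream. If your agent emits different event types, adjust the type check accordingly.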

To pass custom_inputs, add them through the extra_body parameter:

Python
streaming_response = client.responses.create(
    model=f"apps/{app_name}",
    input=input_msgs,
    stream=True,
    extra_body={
        "custom_inputs": {"id": 5},
    },
)
for chunk in streaming_response:
    print(chunk)
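If several call sites query the same app, it can help to build the keyword arguments in one place so the "apps/" prefix and the extra_body wrapping are not repeated. The sketch below shows one possible way to do this. The helper name build_create_kwargs and the app name "my-agent" are hypothetical and are not part of any Databricks library.

```python
from typing import Any, Optional


def build_create_kwargs(
    app_name: str,
    input_msgs: list,
    custom_inputs: Optional[dict] = None,
    stream: bool = False,
) -> dict:
    """Build keyword arguments for client.responses.create.

    Adds the "apps/" prefix to the model name and nests custom_inputs
    under extra_body, mirroring the examples above.
    """
    kwargs: dict = {
        "model": f"apps/{app_name}",
        "input": input_msgs,
        "stream": stream,
    }
    if custom_inputs is not None:
        kwargs["extra_body"] = {"custom_inputs": custom_inputs}
    return kwargs


# "my-agent" is a placeholder app name used only for illustration.
kwargs = build_create_kwargs(
    "my-agent",
    [{"role": "user", "content": "What does Databricks do?"}],
    custom_inputs={"id": 5},
    stream=True,
)
print(kwargs)
```

With a configured client, the call becomes client.responses.create(**kwargs).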

REST API

The Databricks REST API provides OpenAI-compatible endpoints for querying deployed agents. This lets you use Databricks agents in applications that expect an OpenAI interface.

This approach is ideal for:

  • Language-agnostic applications that use HTTP requests
  • Integrating with third-party platforms that expect OpenAI-compatible APIs
  • Migrating from OpenAI to Databricks with minimal code changes

Authenticate with the REST API using a Databricks OAuth token. Refer to the Databricks Authentication Documentation for more options and information.

Use the following example for agents hosted on Databricks Apps following the ResponsesAgent interface, which is the recommended approach for building agents. You must use a Databricks OAuth token to query agents hosted on Databricks Apps.

Bash
curl --request POST \
  --url <app-url>.databricksapps.com/responses \
  --header 'Authorization: Bearer <OAuth token>' \
  --header 'content-type: application/json' \
  --data '{
    "input": [{ "role": "user", "content": "hi" }],
    "stream": true
  }'
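With "stream": true, the response arrives incrementally rather than as a single JSON document. The source does not describe the wire format, so the sketch below rests on an assumption: that the stream uses server-sent events, where each event is a line beginning with "data:" followed by a JSON payload, as OpenAI-compatible streaming APIs commonly do. The function name iter_sse_json and the sample lines are hypothetical and exist only for illustration.

```python
import json
from typing import Any, Dict, Iterable, Iterator


def iter_sse_json(lines: Iterable[str]) -> Iterator[Dict[str, Any]]:
    """Yield the JSON payload of each "data:" line in an SSE stream.

    Assumption: one JSON object per "data:" line. Blank separator lines,
    other SSE fields such as "event:", and a "[DONE]" sentinel are skipped.
    """
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        if not payload or payload == "[DONE]":
            continue
        yield json.loads(payload)


# Made-up sample lines for illustration; not captured from a real agent.
sample_lines = [
    "event: response.output_text.delta",
    'data: {"type": "response.output_text.delta", "delta": "Hel"}',
    "",
    'data: {"type": "response.output_text.delta", "delta": "lo"}',
    "",
    "data: [DONE]",
]

events = list(iter_sse_json(sample_lines))
print("".join(event["delta"] for event in events))
```

In practice, lines would come from the HTTP response body read line by line. Verify the actual framing against your app's output before relying on this parser.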

If you want to pass in custom_inputs, you can add them to the request body:

Bash
curl --request POST \
  --url <app-url>.databricksapps.com/responses \
  --header 'Authorization: Bearer <OAuth token>' \
  --header 'content-type: application/json' \
  --data '{
    "input": [{ "role": "user", "content": "hi" }],
    "stream": true,
    "custom_inputs": { "id": 5 }
  }'
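The same request can be assembled from Python using only the standard library. The sketch below mirrors the curl commands above: a POST to the app's /responses path with a bearer token and a JSON body. The helper name build_responses_request, the example URL, and the token value are placeholders, and the sketch assumes the app URL is supplied with its https:// scheme. It builds the request without sending it.

```python
import json
import urllib.request
from typing import Optional


def build_responses_request(
    app_url: str,
    oauth_token: str,
    input_msgs: list,
    custom_inputs: Optional[dict] = None,
    stream: bool = True,
) -> urllib.request.Request:
    """Build (but do not send) a POST request to <app-url>/responses."""
    body = {"input": input_msgs, "stream": stream}
    if custom_inputs is not None:
        body["custom_inputs"] = custom_inputs
    return urllib.request.Request(
        url=f"{app_url.rstrip('/')}/responses",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {oauth_token}",
            "content-type": "application/json",
        },
        method="POST",
    )


# Placeholder URL and token for illustration only.
req = build_responses_request(
    "https://example.databricksapps.com",
    "<OAuth token>",
    [{"role": "user", "content": "hi"}],
    custom_inputs={"id": 5},
)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

To send the request, pass req to urllib.request.urlopen and read the response body. For streaming responses, read the body incrementally instead of all at once.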

AI Functions: ai_query

You can use ai_query to query, from SQL, a deployed agent hosted on a Model Serving endpoint. See ai_query function for SQL syntax and parameter definitions.

SQL
SELECT ai_query(
  "<model name>", question
) FROM (VALUES ('what is MLflow?'), ('how does MLflow work?')) AS t(question);

Next steps

Monitor GenAI in production