Query an agent deployed on Databricks

Learn how to send requests to agents deployed to Databricks Apps or Model Serving endpoints. Databricks provides multiple query methods to fit different use cases and integration needs.

Select the query approach that best fits your use case:

Method	Key benefits
Databricks OpenAI Client (Recommended)	Native integration, full feature support, streaming capabilities
REST API	OpenAI-compatible, language-agnostic, works with existing tools
AI Functions: `ai_query`	OpenAI-compatible, query legacy agents hosted on Model Serving endpoints only

Databricks recommends the Databricks OpenAI Client for new applications. Choose the REST API when integrating with platforms that expect OpenAI-compatible endpoints.

Databricks OpenAI Client (Recommended)

Databricks recommends that you use the Databricks OpenAI Client to query a deployed agent. Depending on the API that your deployed agent implements, use either the responses client or the chat completions client:

Agents deployed to Apps
Agents on Model Serving

Use the following example for agents hosted on Databricks Apps following the ResponsesAgent interface, which is the recommended approach for building agents. You must use a Databricks OAuth token to query agents hosted on Databricks Apps.

Python
from databricks.sdk import WorkspaceClient
from databricks_openai import DatabricksOpenAI

input_msgs = [{"role": "user", "content": "What does Databricks do?"}]
app_name = "<agent-app-name>"  # TODO: update this with your app name

# The WorkspaceClient must be configured with OAuth authentication
# See: https://docs.databricks.com/aws/en/dev-tools/auth/oauth-u2m.html
w = WorkspaceClient()

client = DatabricksOpenAI(workspace_client=w)

# Run for non-streaming responses. Calls the "invoke" method
# Include the "apps/" prefix in the model name
response = client.responses.create(model=f"apps/{app_name}", input=input_msgs)
print(response)

# Include stream=True for streaming responses. Calls the "stream" method
# Include the "apps/" prefix in the model name
streaming_response = client.responses.create(
    model=f"apps/{app_name}", input=input_msgs, stream=True
)
for chunk in streaming_response:
    print(chunk)
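
Printing every chunk is useful for debugging, but applications usually want only the generated text. The following is a minimal sketch of one way to do that, assuming the agent emits OpenAI Responses-style streaming events in which text arrives as events of type `response.output_text.delta` with a `delta` attribute. The helper name `collect_output_text` and the stand-in events are hypothetical; an agent that emits only other event types (for example, completed output items) would need different handling.

```python
from types import SimpleNamespace


def collect_output_text(events):
    """Join the text deltas from a stream of Responses-style events."""
    parts = []
    for event in events:
        # Assumption: incremental text arrives in "response.output_text.delta"
        # events. All other event types are skipped.
        if getattr(event, "type", None) == "response.output_text.delta":
            parts.append(event.delta)
    return "".join(parts)


# Stand-in events so the sketch runs without a deployed agent.
fake_response_stream = [
    SimpleNamespace(type="response.created"),
    SimpleNamespace(type="response.output_text.delta", delta="Databricks is "),
    SimpleNamespace(type="response.output_text.delta", delta="a data platform."),
    SimpleNamespace(type="response.completed"),
]
print(collect_output_text(fake_response_stream))
```

With a real agent, you would pass `streaming_response` to the helper in place of the stand-in list.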

If you want to pass in custom_inputs, you can add them with the extra_body param:

Python
streaming_response = client.responses.create(
    model=f"apps/{app_name}",
    input=input_msgs,
    stream=True,
    extra_body={
        "custom_inputs": {"id": 5},
    },
)
for chunk in streaming_response:
    print(chunk)

Use the following example for legacy agents hosted on Model Serving following the ResponsesAgent interface. You can use either a Databricks OAuth token or a Personal Access Token (PAT) to query agents hosted on Model Serving.

Python
from databricks_openai import DatabricksOpenAI

input_msgs = [{"role": "user", "content": "What does Databricks do?"}]
endpoint = "<agent-endpoint-name>"  # TODO: update this with your endpoint name

client = DatabricksOpenAI()

# Run for non-streaming responses. Invokes `predict`
response = client.responses.create(model=endpoint, input=input_msgs)
print(response)

# Include stream=True for streaming responses. Invokes `predict_stream`
streaming_response = client.responses.create(
    model=endpoint, input=input_msgs, stream=True
)
for chunk in streaming_response:
    print(chunk)

If you want to pass in custom_inputs or databricks_options, you can add them with the extra_body param:

Python
streaming_response = client.responses.create(
    model=endpoint,
    input=input_msgs,
    stream=True,
    extra_body={
        "custom_inputs": {"id": 5},
        "databricks_options": {"return_trace": True},
    },
)
for chunk in streaming_response:
    print(chunk)
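
When only some of these options apply to a given request, it can help to assemble `extra_body` in one place. The sketch below is one possible way to do that; the helper `build_extra_body` is hypothetical and not part of any Databricks library. It includes only the keys shown above (`custom_inputs` and `databricks_options.return_trace`) and returns `None` when nothing is set.

```python
def build_extra_body(custom_inputs=None, return_trace=None):
    """Build an extra_body dict containing only the options that were supplied."""
    body = {}
    if custom_inputs is not None:
        body["custom_inputs"] = custom_inputs
    if return_trace is not None:
        body["databricks_options"] = {"return_trace": return_trace}
    # Return None rather than an empty dict when no option is set.
    return body or None


print(build_extra_body(custom_inputs={"id": 5}, return_trace=True))
print(build_extra_body())
```

The result can then be passed as `extra_body=build_extra_body(...)` in the `create` call.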

Use the following example for legacy agents hosted on Model Serving following the ChatAgent or ChatModel interfaces.

Python
from databricks.sdk import WorkspaceClient

messages = [{"role": "user", "content": "What does Databricks do?"}]
endpoint = "<agent-endpoint-name>"  # TODO: update this with your endpoint name

ws_client = WorkspaceClient()
client = ws_client.serving_endpoints.get_open_ai_client()

# Run for non-streaming responses. Invokes `predict`
response = client.chat.completions.create(model=endpoint, messages=messages)
print(response)

# Include stream=True for streaming responses. Invokes `predict_stream`
streaming_response = client.chat.completions.create(
    model=endpoint, messages=messages, stream=True
)
for chunk in streaming_response:
    print(chunk)

If you want to pass in custom_inputs or databricks_options, you can add them with the extra_body param:

Python
streaming_response = client.chat.completions.create(
    model=endpoint,
    messages=messages,
    stream=True,
    extra_body={
        "custom_inputs": {"id": 5},
        "databricks_options": {"return_trace": True},
    },
)
for chunk in streaming_response:
    print(chunk)
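
To turn a chat completions stream into a single string, you can accumulate the content of each chunk. This is a minimal sketch that assumes chunks follow the OpenAI chat completion chunk shape, where text is found at `chunk.choices[0].delta.content`. The helper `collect_chat_text` and the stand-in chunks are hypothetical and exist only so the example runs without an endpoint.

```python
from types import SimpleNamespace


def collect_chat_text(chunks):
    """Join the content deltas from a stream of chat completion chunks."""
    parts = []
    for chunk in chunks:
        # Some chunks may carry no choices or no text; skip them.
        if not chunk.choices:
            continue
        content = chunk.choices[0].delta.content
        if content:
            parts.append(content)
    return "".join(parts)


def _fake_chunk(text):
    # Stand-in for a streamed chunk; mirrors only the fields read above.
    delta = SimpleNamespace(content=text)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


fake_chat_stream = [
    _fake_chunk("Databricks "),
    _fake_chunk(None),
    _fake_chunk("unifies data and AI."),
]
print(collect_chat_text(fake_chat_stream))
```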

REST API

The Databricks REST API provides OpenAI-compatible endpoints for querying deployed agents. This lets you use Databricks agents with applications that require OpenAI interfaces.

This approach is ideal for:

Language-agnostic applications that use HTTP requests
Integrating with third-party platforms that expect OpenAI-compatible APIs
Migrating from OpenAI to Databricks with minimal code changes

Authenticate with the REST API using a Databricks OAuth token. Refer to the Databricks Authentication Documentation for more options and information.

Agents deployed to Apps
Agents on Model Serving

Bash
curl --request POST \
  --url <app-url>.databricksapps.com/responses \
  --header 'Authorization: Bearer <OAuth token>' \
  --header 'content-type: application/json' \
  --data '{
    "input": [{ "role": "user", "content": "hi" }],
    "stream": true
  }'

If you want to pass in custom_inputs, you can add them to the request body:

Bash
curl --request POST \
  --url <app-url>.databricksapps.com/responses \
  --header 'Authorization: Bearer <OAuth token>' \
  --header 'content-type: application/json' \
  --data '{
    "input": [{ "role": "user", "content": "hi" }],
    "stream": true,
    "custom_inputs": { "id": 5 }
  }'
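
The same request can be assembled from Python using only the standard library. The sketch below builds the request without sending it, so it runs without credentials. The helper `build_app_request` and the example app URL are hypothetical; the `/responses` path, headers, and body fields mirror the curl commands above.

```python
import json
import urllib.request


def build_app_request(app_url, token, input_msgs, custom_inputs=None, stream=False):
    """Build (but do not send) a POST request to an agent app's /responses route."""
    payload = {"input": input_msgs, "stream": stream}
    if custom_inputs is not None:
        payload["custom_inputs"] = custom_inputs
    return urllib.request.Request(
        url=f"{app_url.rstrip('/')}/responses",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_app_request(
    "https://my-agent.databricksapps.com",  # hypothetical app URL
    "<OAuth token>",
    [{"role": "user", "content": "hi"}],
    custom_inputs={"id": 5},
)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

To send the request, pass `req` to `urllib.request.urlopen` after substituting a real app URL and OAuth token.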

Use the following for legacy agents hosted on Model Serving that follow the ResponsesAgent interface. This is equivalent to:

Using the Databricks OpenAI Client with responses.create.
Sending a POST request to the specific endpoint's URL (for example, https://<host.databricks.com>/serving-endpoints/<model-name>/invocations). For more information, see your endpoint's Model Serving page and the Model Serving Documentation.

Bash
curl --request POST \
  --url https://<host.databricks.com>/serving-endpoints/responses \
  --header 'Authorization: Bearer <OAuth token>' \
  --header 'content-type: application/json' \
  --data '{
    "model": "<model-name>",
    "input": [{ "role": "user", "content": "hi" }],
    "stream": true
  }'

If you want to pass in custom_inputs or databricks_options, you can add them to the request body:

Bash
curl --request POST \
  --url https://<host.databricks.com>/serving-endpoints/responses \
  --header 'Authorization: Bearer <OAuth token>' \
  --header 'content-type: application/json' \
  --data '{
    "model": "<model-name>",
    "input": [{ "role": "user", "content": "hi" }],
    "stream": true,
    "custom_inputs": { "id": 5 },
    "databricks_options": { "return_trace": true }
  }'

Use the following for agents created with legacy ChatAgent or ChatModel interfaces. This is equivalent to:

Using the Databricks OpenAI Client with chat.completions.create.
Sending a POST request to the specific endpoint's URL (for example, https://<host.databricks.com>/serving-endpoints/<model-name>/invocations). For more information, see your endpoint's Model Serving page and the Model Serving Documentation.

Bash
curl --request POST \
  --url https://<host.databricks.com>/serving-endpoints/chat/completions \
  --header 'Authorization: Bearer <OAuth token>' \
  --header 'content-type: application/json' \
  --data '{
    "model": "<model-name>",
    "messages": [{ "role": "user", "content": "hi" }],
    "stream": true
  }'

If you want to pass in custom_inputs or databricks_options, you can add them to the request body:

Bash
curl --request POST \
  --url https://<host.databricks.com>/serving-endpoints/chat/completions \
  --header 'Authorization: Bearer <OAuth token>' \
  --header 'content-type: application/json' \
  --data '{
    "model": "<model-name>",
    "messages": [{ "role": "user", "content": "hi" }],
    "stream": true,
    "custom_inputs": { "id": 5 },
    "databricks_options": { "return_trace": true }
  }'
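
With "stream": true, an HTTP client receives the response incrementally rather than as one JSON document. The sketch below shows one way to parse such a stream, assuming it arrives as server-sent events in which each payload is a `data:` line containing JSON and the stream ends with a `data: [DONE]` sentinel, as is conventional for OpenAI-compatible APIs. Verify the exact framing against your endpoint. The helper `iter_sse_json` and the sample lines are hypothetical.

```python
import json


def iter_sse_json(lines):
    """Yield the parsed JSON payload of each `data:` line in an SSE stream."""
    for raw in lines:
        line = raw.decode("utf-8") if isinstance(raw, bytes) else raw
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and non-data fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":  # assumed end-of-stream sentinel
            break
        yield json.loads(data)


# Stand-in for the lines of a streamed chat completions response body.
sample_lines = [
    b'data: {"choices": [{"delta": {"content": "Hel"}}]}\n',
    b"\n",
    b'data: {"choices": [{"delta": {"content": "lo"}}]}\n',
    b"\n",
    b"data: [DONE]\n",
]
events = list(iter_sse_json(sample_lines))
text = "".join(e["choices"][0]["delta"]["content"] for e in events)
print(text)
```

Because the helper accepts any iterable of lines, it can also consume an open HTTP response object that yields lines as they arrive.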

AI Functions: `ai_query`

You can use ai_query to query a deployed agent hosted on Model Serving from SQL. See the ai_query function reference for SQL syntax and parameter definitions.

SQL
SELECT ai_query(
  '<model name>', question
) FROM (VALUES ('what is MLflow?'), ('how does MLflow work?')) AS t(question);

Next steps

Monitor GenAI in production
