
Supervisor API (Beta)


This feature is in Beta. Account admins can control access to this feature from the Previews page. See Manage Databricks previews.

The Supervisor API simplifies building custom agents on Databricks, with background mode support for long-running tasks. You define the model, tools, and instructions in one request to an OpenResponses-compatible endpoint (POST /mlflow/v1/responses), and Databricks runs the agent loop for you: repeatedly calling the model, selecting and executing tools, and synthesizing a final response.

There are three approaches to build a customized tool-calling agent on Databricks:

  • Agent Bricks Supervisor Agent (recommended): Fully declarative with human feedback optimization for highest quality.
  • Supervisor API: Build a custom agent programmatically when you need to choose models at runtime, control which tools are used per request, or iterate during development. It's also the right choice when you want control over model choice while offloading agent loop management to Databricks.
  • AI Gateway unified or native APIs: Write your own agent loop. Databricks provides only the LLM inference layer. Use unified APIs where possible to enable switching models, or provider-specific native APIs (/openai, /anthropic, /gemini) when porting existing code to Databricks or using provider-specific features.

Requirements

Step 1: Create a single-turn LLM call

Start with a basic call with no tools. The DatabricksOpenAI client automatically configures the base URL and authentication for your workspace:

Python
from databricks_openai import DatabricksOpenAI

client = DatabricksOpenAI(use_ai_gateway=True)

response = client.responses.create(
    model="databricks-claude-sonnet-4-5",
    input=[{"type": "message", "role": "user", "content": "Tell me about Databricks"}],
    stream=False,
)

print(response.output_text)

Step 2: Add hosted tools to run the agent loop

When you include tools in the request, Databricks manages a multi-turn loop on your behalf: the model decides which tools to call, Databricks executes them, feeds the results back to the model, and repeats until the model produces a final answer.

Python
response = client.responses.create(
    model="databricks-claude-sonnet-4-5",
    input=[{"type": "message", "role": "user", "content": "Summarize recent customer reviews and flag any urgent issues."}],
    tools=[
        {
            "type": "genie_space",
            "genie_space": {
                "id": "<genie-space-id>",
                "description": "Answers customer review questions using SQL"
            }
        },
        {
            "type": "uc_function",
            "uc_function": {
                "name": "<catalog>.<schema>.<function_name>",
                "description": "Flags a review as requiring urgent attention"
            }
        },
        {
            "type": "knowledge_assistant",
            "knowledge_assistant": {
                "knowledge_assistant_id": "<knowledge-assistant-id>",
                "description": "Answers questions from internal documentation"
            }
        },
        {
            "type": "app",
            "app": {
                "name": "<app-name>",
                "description": "Custom application endpoint"
            }
        },
        {
            "type": "uc_connection",
            "uc_connection": {
                "name": "<uc-connection-name>",
                "description": "Searches the web for current information"
            }
        },
    ],
    stream=True,
)

for event in response:
    print(event)

Step 3: Enable tracing

Pass a trace_destination in the request body to send traces from the agent loop to Unity Catalog tables. Each request generates a trace capturing the full sequence of model calls and tool executions. If you don't set trace_destination, no traces are written. For setup details, see Store MLflow traces in Unity Catalog.

Using the databricks-openai Python client, pass it via extra_body:

Python
response = client.responses.create(
    model="databricks-claude-sonnet-4-5",
    input=[{"type": "message", "role": "user", "content": "Tell me about Databricks"}],
    tools=[...],
    extra_body={
        "trace_destination": {
            "catalog_name": "<catalog>",
            "schema_name": "<schema>",
            "table_prefix": "<table-prefix>"
        }
    },
)

To also return the trace directly in the API response, pass "databricks_options": {"return_trace": True} in extra_body.
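For example, the two options can be combined in a single extra_body payload: one field tells the service where to persist traces, the other asks it to include the trace in the response itself. The catalog, schema, and prefix values below are placeholders for your own tables:

```python
# extra_body payload combining trace storage and inline trace return.
# <catalog>, <schema>, and <table-prefix> are placeholders.
extra_body = {
    "trace_destination": {
        "catalog_name": "<catalog>",
        "schema_name": "<schema>",
        "table_prefix": "<table-prefix>",
    },
    # Ask the API to also include the trace in the response body.
    "databricks_options": {"return_trace": True},
}

# Passed through the databricks-openai client as in the snippet above:
# response = client.responses.create(
#     model="databricks-claude-sonnet-4-5",
#     input=[{"type": "message", "role": "user", "content": "Tell me about Databricks"}],
#     extra_body=extra_body,
# )
```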

You can also use MLflow distributed tracing to combine traces from your application code and the Supervisor API agent loop into a single end-to-end trace. Propagate trace context headers using the extra_headers field:

Python
import mlflow
from mlflow.tracing import get_tracing_context_headers_for_http_request

with mlflow.start_span("client-root") as root_span:
    root_span.set_inputs({"input": "Tell me about Databricks"})

    trace_headers = get_tracing_context_headers_for_http_request()

    response = client.responses.create(
        model="databricks-claude-sonnet-4-5",
        input=[{"type": "message", "role": "user", "content": "Tell me about Databricks"}],
        tools=[...],
        extra_body={
            "trace_destination": {
                "catalog_name": "<catalog>",
                "schema_name": "<schema>",
                "table_prefix": "<table-prefix>"
            }
        },
        extra_headers=trace_headers,
    )

Background mode

Background mode enables you to run long-running agent workflows that involve multiple tool calls and complex reasoning without waiting for them to finish synchronously. Submit your request with background=True, receive a response ID immediately, and poll for the result when it's ready. This is especially useful for agents that query multiple data sources or chain several tools together in a single request.

Create a background request

Python
response = client.responses.create(
    model="databricks-claude-sonnet-4-5",
    input=[{"type": "message", "role": "user", "content": "Tell me about Databricks"}],
    tools=[...],
    background=True,
)

print(response.id)      # Use this ID to poll for the result
print(response.status)  # "queued" or "in_progress"

Poll for the result

Use responses.retrieve() to check the status until it reaches a terminal state:

Python
from time import sleep

while response.status in {"queued", "in_progress"}:
    sleep(2)
    response = client.responses.retrieve(response.id)

print(response.output_text)

Background mode with MCP

For security, the Supervisor API requires explicit user approval before executing any MCP tool call in background mode. When the agent loop selects an MCP tool, the response completes with an mcp_approval_request. You can review the tool name, server label, and arguments the model intends to pass:

JSON
{
    "type": "mcp_approval_request",
    "id": "<tool-call-id>",
    "arguments": "{\"query\": \"what is Databricks\", \"count\": 5}",
    "name": "you-search",
    "server_label": "<server-label>",
    "status": "completed"
}

To approve the tool call and continue the agent loop, pass an mcp_approval_response back in the input field with the full conversation history:

JSON
{
    "type": "mcp_approval_response",
    "id": "<tool-call-id>",
    "approval_request_id": "<tool-call-id>",
    "approve": true
}
Note: Background mode responses are retained for a maximum of 30 days.
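The approval round-trip can be sketched in Python. This is a minimal sketch, assuming you already have the output list from a completed background response containing an mcp_approval_request; the helper below (a hypothetical name, not part of any client library) only builds the follow-up input payload, mirroring the JSON shapes above:

```python
# Sketch: build the follow-up `input` that approves a pending MCP tool call.
# `previous_output` is the conversation history from the completed response;
# `request_id` is the id of the mcp_approval_request item within it.

def build_approval_input(previous_output, request_id, approve=True):
    """Return the full history plus an mcp_approval_response item."""
    approval = {
        "type": "mcp_approval_response",
        "id": request_id,
        "approval_request_id": request_id,
        "approve": approve,
    }
    return list(previous_output) + [approval]

# Example with a minimal stand-in history:
history = [{"type": "mcp_approval_request", "id": "call-123", "name": "you-search"}]
follow_up = build_approval_input(history, "call-123")
# Pass `follow_up` as the `input` field of the next
# client.responses.create(..., background=True) call to continue the loop.
```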

Supported tools

You define tools in the tools array of your request. Each entry specifies a type and a configuration object with the same key. For example, a Genie space tool has "type": "genie_space" and a "genie_space": {...} object. The API supports the following tool types:

| Tool type | Description | Scope |
| --- | --- | --- |
| genie_space | Queries a Genie space to answer questions about your data. Parameters: id, description. | genie |
| uc_function | Calls a Unity Catalog function as an agent tool. Parameters: name, description. | unity-catalog |
| uc_connection | Connects to an external MCP server through a Unity Catalog connection. Parameters: name, description. Note: custom MCP servers on Apps are not yet supported. | unity-catalog |
| app | Calls a Databricks App endpoint. Parameters: name, description. | apps |
| knowledge_assistant | Calls a Knowledge Assistant endpoint. Parameters: knowledge_assistant_id, description. | model-serving |

Supported parameters

Each request to the Supervisor API accepts the following parameters.

  • input: the conversation messages to send.
  • tools: hosted tool definitions (genie_space, uc_function, knowledge_assistant, app, uc_connection).
  • instructions: a system prompt to guide the supervisor's behavior.
  • stream: set to true to stream responses.
  • background: set to true to run the request asynchronously. Returns a response ID that you poll with responses.retrieve(). See Background mode.
  • trace_destination: optional object with catalog_name, schema_name, and table_prefix fields. When set, the Supervisor API writes a trace of the full agent loop to the specified Unity Catalog tables. Pass via extra_body in the Python client.

The API doesn't support inference parameters such as temperature. The server manages these internally.
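A request that exercises most of these parameters might look like the following sketch. All identifiers are placeholders, and with the Python client trace_destination travels via extra_body as shown earlier:

```python
# Sketch of a full Supervisor API request using the supported parameters.
# <genie-space-id>, <catalog>, <schema>, <table-prefix> are placeholders.
request_kwargs = {
    "model": "databricks-claude-sonnet-4-5",
    "instructions": "You are a support analyst. Cite the tool you used.",
    "input": [{"type": "message", "role": "user",
               "content": "Summarize open support issues."}],
    "tools": [
        {"type": "genie_space",
         "genie_space": {"id": "<genie-space-id>",
                         "description": "Answers data questions with SQL"}},
    ],
    "stream": False,        # can't combine streaming with background mode
    "background": True,     # run asynchronously; poll with responses.retrieve()
    "extra_body": {
        "trace_destination": {
            "catalog_name": "<catalog>",
            "schema_name": "<schema>",
            "table_prefix": "<table-prefix>",
        }
    },
}

# Note: no temperature or other inference parameters -- the server
# manages these internally.
# response = client.responses.create(**request_kwargs)
```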

Limitations

The Supervisor API has the following limitations:

  • Background mode runtime: Background mode requests have a maximum execution time of 30 minutes.
  • Client-side function calling: Only hosted tools are supported. You can't pass function tool definitions for the client to execute, and you can't mix hosted tools with client-side function tools in the same request.
  • Streaming in background mode: stream and background can't both be true in the same request.
  • Durable execution: Automatic recovery from failures or interruptions with exactly-once execution guarantees for the agent loop is not supported.
  • Databricks Apps OBO not supported: On-behalf-of-user authorization is not supported for the Supervisor API. To use the Supervisor API in Databricks Apps, use system authorization and grant permissions for your tools.

Next steps