
Supervisor API (Beta)


This feature is in Beta. Account admins can control access to this feature from the Previews page. See Manage Databricks previews.

The Supervisor API simplifies building custom agents on Databricks, with background mode support for long-running tasks. You define the model, tools, and instructions in one request to an OpenResponses-compatible endpoint (POST /mlflow/v1/responses), and Databricks runs the agent loop for you: repeatedly calling the model, selecting and executing tools, and synthesizing a final response.

There are three approaches to build a customized tool-calling agent on Databricks:

  • Agent Bricks Supervisor Agent (recommended): Fully declarative with human feedback optimization for highest quality.
  • Supervisor API: Build a custom agent programmatically when you need to choose models at runtime, control which tools are used per request, or iterate during development. It's also the right choice when you want control over model choice while offloading agent loop management to Databricks.
  • AI Gateway unified or native APIs: Write your own agent loop. Databricks provides only the LLM inference layer. Use unified APIs where possible to enable switching models, or provider-specific native APIs (/openai, /anthropic, /gemini) when porting existing code to Databricks or using provider-specific features.

Requirements

Step 1: Create a single-turn LLM call

Start with a basic call with no tools. The DatabricksOpenAI client automatically configures the base URL and authentication for your workspace:

Python
from databricks_openai import DatabricksOpenAI

client = DatabricksOpenAI(use_ai_gateway=True)

response = client.responses.create(
    model="databricks-claude-sonnet-4-5",
    input=[{"type": "message", "role": "user", "content": "Tell me about Databricks"}],
    stream=False,
)

print(response.output_text)

Step 2: Add hosted tools to run the agent loop

When you include tools in the request, Databricks manages a multi-turn loop on your behalf: the model decides which tools to call, Databricks executes them, feeds the results back to the model, and repeats until the model produces a final answer.

Python
response = client.responses.create(
    model="databricks-claude-sonnet-4-5",
    input=[{"type": "message", "role": "user", "content": "Summarize recent customer reviews and flag any urgent issues."}],
    tools=[
        {
            "type": "genie_space",
            "genie_space": {
                "id": "<genie-space-id>",
                "description": "Answers customer review questions using SQL"
            }
        },
        {
            "type": "uc_function",
            "uc_function": {
                "name": "<catalog>.<schema>.<function_name>",
                "description": "Flags a review as requiring urgent attention"
            }
        },
        {
            "type": "knowledge_assistant",
            "knowledge_assistant": {
                "knowledge_assistant_id": "<knowledge-assistant-id>",
                "description": "Answers questions from internal documentation"
            }
        },
        {
            "type": "app",
            "app": {
                "name": "<app-name>",
                "description": "Custom application endpoint"
            }
        },
        {
            "type": "uc_connection",
            "uc_connection": {
                "name": "<uc-connection-name>",
                "description": "Searches the web for current information"
            }
        },
    ],
    stream=True,
)

for event in response:
    print(event)

Step 3: Enable tracing

Pass a trace_destination in the request body to send traces from the agent loop to Unity Catalog tables. Each request generates a trace capturing the full sequence of model calls and tool executions. If you don't set trace_destination, no traces are written. For setup details, see Store MLflow traces in Unity Catalog.

Using the databricks-openai Python client, pass it via extra_body:

Python
response = client.responses.create(
    model="databricks-claude-sonnet-4-5",
    input=[{"type": "message", "role": "user", "content": "Tell me about Databricks"}],
    tools=[...],
    extra_body={
        "trace_destination": {
            "catalog_name": "<catalog>",
            "schema_name": "<schema>",
            "table_prefix": "<table-prefix>"
        }
    },
)

To also return the trace directly in the API response, pass "databricks_options": {"return_trace": True} in extra_body.
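For example, the two options can be combined in a single extra_body payload: one field tells the service where to persist traces, the other asks it to include the trace in the response itself. The catalog, schema, and prefix values below are placeholders for your own tables:

```python
# extra_body payload combining trace storage and inline trace return.
# <catalog>, <schema>, and <table-prefix> are placeholders.
extra_body = {
    "trace_destination": {
        "catalog_name": "<catalog>",
        "schema_name": "<schema>",
        "table_prefix": "<table-prefix>",
    },
    # Ask the API to also include the trace in the response body.
    "databricks_options": {"return_trace": True},
}

# Passed through the databricks-openai client as in the snippet above:
# response = client.responses.create(
#     model="databricks-claude-sonnet-4-5",
#     input=[{"type": "message", "role": "user", "content": "Tell me about Databricks"}],
#     extra_body=extra_body,
# )
```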

You can also use MLflow distributed tracing to combine traces from your application code and the Supervisor API agent loop into a single end-to-end trace. Propagate trace context headers using the extra_headers field:

Python
import mlflow
from mlflow.tracing import get_tracing_context_headers_for_http_request

with mlflow.start_span("client-root") as root_span:
    root_span.set_inputs({"input": "Tell me about Databricks"})

    trace_headers = get_tracing_context_headers_for_http_request()

    response = client.responses.create(
        model="databricks-claude-sonnet-4-5",
        input=[{"type": "message", "role": "user", "content": "Tell me about Databricks"}],
        tools=[...],
        extra_body={
            "trace_destination": {
                "catalog_name": "<catalog>",
                "schema_name": "<schema>",
                "table_prefix": "<table-prefix>"
            }
        },
        extra_headers=trace_headers,
    )

Background mode

Background mode enables you to run long-running agent workflows that involve multiple tool calls and complex reasoning without waiting for them to finish synchronously. Submit your request with background=True, receive a response ID immediately, and poll for the result when it's ready. This is especially useful for agents that query multiple data sources or chain several tools together in a single request.

Create a background request

Python
response = client.responses.create(
    model="databricks-claude-sonnet-4-5",
    input=[{"type": "message", "role": "user", "content": "Tell me about Databricks"}],
    tools=[...],
    background=True,
)

print(response.id)      # Use this ID to poll for the result
print(response.status)  # "queued" or "in_progress"

Poll for the result

Use responses.retrieve() to check the status until it reaches a terminal state:

Python
from time import sleep

while response.status in {"queued", "in_progress"}:
    sleep(2)
    response = client.responses.retrieve(response.id)

print(response.output_text)

Background mode with MCP

For security, the Supervisor API requires explicit user approval before executing any MCP tool call in background mode. When the agent loop selects an MCP tool, the response completes with an mcp_approval_request. You can review the tool name, server label, and arguments the model intends to pass:

JSON
{
    "type": "mcp_approval_request",
    "id": "<tool-call-id>",
    "arguments": "{\"query\": \"what is Databricks\", \"count\": 5}",
    "name": "you-search",
    "server_label": "<server-label>",
    "status": "completed"
}

To approve the tool call and continue the agent loop, pass an mcp_approval_response back in the input field with the full conversation history:

JSON
{
    "type": "mcp_approval_response",
    "id": "<tool-call-id>",
    "approval_request_id": "<tool-call-id>",
    "approve": true
}
Note: Background mode responses are retained for a maximum of 30 days.
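The approval round-trip can be sketched in Python. This is a minimal sketch, assuming you already have the output list from a completed background response containing an mcp_approval_request; the helper below (a hypothetical name, not part of any client library) only builds the follow-up input payload, mirroring the JSON shapes above:

```python
# Sketch: build the follow-up `input` that approves a pending MCP tool call.
# `previous_output` is the conversation history from the completed response;
# `request_id` is the id of the mcp_approval_request item within it.

def build_approval_input(previous_output, request_id, approve=True):
    """Return the full history plus an mcp_approval_response item."""
    approval = {
        "type": "mcp_approval_response",
        "id": request_id,
        "approval_request_id": request_id,
        "approve": approve,
    }
    return list(previous_output) + [approval]

# Example with a minimal stand-in history:
history = [{"type": "mcp_approval_request", "id": "call-123", "name": "you-search"}]
follow_up = build_approval_input(history, "call-123")
# Pass `follow_up` as the `input` field of the next
# client.responses.create(..., background=True) call to continue the loop.
```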

Supported tools

You define tools in the tools array of your request. Each entry specifies a type and a configuration object with the same key. For example, a Genie space tool has "type": "genie_space" and a "genie_space": {...} object. The API supports the following tool types:

| Tool type | Description | Scope |
| --- | --- | --- |
| genie_space | Queries a Genie space to answer questions about your data. Parameters: id, description. | genie |
| uc_function | Calls a Unity Catalog function as an agent tool. Parameters: name, description. | unity-catalog |
| uc_connection | Connects to an external MCP server through a Unity Catalog connection. Parameters: name, description. Note: custom MCP servers on Apps are not yet supported. | unity-catalog |
| app | Calls a Databricks App endpoint. Parameters: name, description. | apps |
| knowledge_assistant | Calls a Knowledge Assistant endpoint. Parameters: knowledge_assistant_id, description. | model-serving |

Supported parameters

Each request to the Supervisor API accepts the following parameters.

  • input: the conversation messages to send.
  • tools: hosted tool definitions (genie_space, uc_function, knowledge_assistant, app, uc_connection).
  • instructions: a system prompt to guide the supervisor's behavior.
  • stream: set to true to stream responses.
  • background: set to true to run the request asynchronously. Returns a response ID that you poll with responses.retrieve(). See Background mode.
  • trace_destination: optional object with catalog_name, schema_name, and table_prefix fields. When set, the Supervisor API writes a trace of the full agent loop to the specified Unity Catalog tables. Pass via extra_body in the Python client.

The API doesn't support inference parameters such as temperature. The server manages these internally.
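A request that exercises most of these parameters might look like the following sketch. All identifiers are placeholders, and with the Python client trace_destination travels via extra_body as shown earlier:

```python
# Sketch of a full Supervisor API request using the supported parameters.
# <genie-space-id>, <catalog>, <schema>, <table-prefix> are placeholders.
request_kwargs = {
    "model": "databricks-claude-sonnet-4-5",
    "instructions": "You are a support analyst. Cite the tool you used.",
    "input": [{"type": "message", "role": "user",
               "content": "Summarize open support issues."}],
    "tools": [
        {"type": "genie_space",
         "genie_space": {"id": "<genie-space-id>",
                         "description": "Answers data questions with SQL"}},
    ],
    "stream": False,        # can't combine streaming with background mode
    "background": True,     # run asynchronously; poll with responses.retrieve()
    "extra_body": {
        "trace_destination": {
            "catalog_name": "<catalog>",
            "schema_name": "<schema>",
            "table_prefix": "<table-prefix>",
        }
    },
}

# Note: no temperature or other inference parameters -- the server
# manages these internally.
# response = client.responses.create(**request_kwargs)
```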

Limitations

The Supervisor API has the following limitations:

  • Background mode runtime: Background mode requests have a maximum execution time of 30 minutes.
  • Client-side function calling: Only hosted tools are supported. You can't pass function tool definitions for the client to execute, and you can't mix hosted tools with client-side function tools in the same request.
  • Streaming in background mode: stream and background can't both be true in the same request.
  • Durable execution: Automatic recovery from failures or interruptions with exactly-once execution guarantees for the agent loop is not supported.
  • Databricks Apps OBO not supported: On-behalf-of-user authorization is not supported for the Supervisor API. To use the Supervisor API in Databricks Apps, use system authorization and grant permissions for your tools.

Next steps