Query model services

Beta

This feature is in Beta. Account admins can control access to this feature from the account console Previews page. See Manage Databricks previews.

This page describes how to query model services in Unity Catalog.

Requirements

Unity AI Gateway preview enabled for your account. See Manage Databricks previews.
A Databricks workspace in a Unity AI Gateway supported region.
The EXECUTE privilege on the model service, plus the USE CATALOG and USE SCHEMA privileges on its parent catalog and schema. System-provided model services in system.ai grant EXECUTE to all account users by default.

Identify a model service

To identify a model service, pass its fully qualified name as the model slug, for example system.ai.databricks-claude-opus-4-6. You can query a model service from any workspace attached to the same metastore, not only the workspace where it was created.

Each request must identify a workspace, which Databricks uses for pay-per-token billing. Provide the workspace in one of the following ways:

Workspace URL: Send the request to your workspace URL, which identifies the workspace. For example, https://<workspace-url>/ai-gateway/mlflow/v1.
Workspace header: If you send the request to a single account-level URL, add the x-databricks-workspace-id header to identify the workspace.
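The two options above can be sketched as client configuration. This is a minimal illustration, not an official helper: the function name `client_kwargs`, the `<account-level-url>` placeholder, and the sample workspace ID are assumptions made for the example. Only the `x-databricks-workspace-id` header name and the workspace URL path come from the text above. The returned dictionary uses the `api_key`, `base_url`, and `default_headers` parameters that the OpenAI Python client accepts.

```python
from typing import Optional


def client_kwargs(token: str, base_url: str, workspace_id: Optional[str] = None) -> dict:
    """Build keyword arguments for an OpenAI-compatible client (hypothetical helper).

    base_url: a workspace URL ending in /ai-gateway/mlflow/v1, or an
        account-level URL (placeholder here; use the one that applies to you).
    workspace_id: needed only when base_url does not identify a workspace.
    """
    kwargs = {"api_key": token, "base_url": base_url.rstrip("/")}
    if workspace_id is not None:
        # Header name taken from the documentation text above.
        kwargs["default_headers"] = {"x-databricks-workspace-id": str(workspace_id)}
    return kwargs


# Option 1: the workspace URL itself identifies the workspace.
by_url = client_kwargs("<token>", "https://<workspace-url>/ai-gateway/mlflow/v1")

# Option 2: an account-level URL plus the workspace header.
# The URL and the workspace ID below are placeholders.
by_header = client_kwargs("<token>", "https://<account-level-url>", workspace_id="1234567890")

print(by_url)
print(by_header)
```

Either dictionary could then be unpacked into the client constructor, for example `OpenAI(**by_header)`.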

Query with the OpenAI client

The following examples query a model service through the MLflow Chat Completions API, first with the OpenAI Python client and then with a direct REST API call using curl:

Python
from openai import OpenAI
import os

# To get a Databricks token, see https://docs.databricks.com/dev-tools/auth/pat
DATABRICKS_TOKEN = os.environ.get('DATABRICKS_TOKEN')

client = OpenAI(
  api_key=DATABRICKS_TOKEN,
  base_url="https://<workspace-url>/ai-gateway/mlflow/v1"
)

chat_completion = client.chat.completions.create(
  messages=[
    {"role": "user", "content": "Hello!"},
    {"role": "assistant", "content": "Hello! How can I assist you today?"},
    {"role": "user", "content": "What is Databricks?"},
  ],
  model="system.ai.databricks-claude-opus-4-6",
  max_tokens=256
)

print(chat_completion.choices[0].message.content)

Bash
curl \
  -u token:$DATABRICKS_TOKEN \
  -X POST \
  -H "Content-Type: application/json" \
  -d '{
    "model": "system.ai.databricks-claude-opus-4-6",
    "max_tokens": 256,
    "messages": [
      {"role": "user", "content": "Hello!"},
      {"role": "assistant", "content": "Hello! How can I assist you today?"},
      {"role": "user", "content": "What is Databricks?"}
    ]
  }' \
  https://<workspace-url>/ai-gateway/mlflow/v1/chat/completions

Replace <workspace-url> with your Databricks workspace URL.
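If installing the OpenAI package is not an option, the same REST request can be assembled with the Python standard library. The sketch below is an illustration under assumptions: `build_chat_request` is a hypothetical helper name, and it authenticates with a bearer token header, which is how the OpenAI client sends its `api_key`. The network call is left commented out so the snippet runs without a live workspace.

```python
import json
import os
import urllib.request


def build_chat_request(base_url, token, model, messages, max_tokens=256):
    """Assemble a POST request for the chat completions route (hypothetical helper)."""
    payload = {"model": model, "max_tokens": max_tokens, "messages": messages}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_chat_request(
    base_url="https://<workspace-url>/ai-gateway/mlflow/v1",
    token=os.environ.get("DATABRICKS_TOKEN", ""),
    model="system.ai.databricks-claude-opus-4-6",
    messages=[{"role": "user", "content": "What is Databricks?"}],
)

# Uncomment to send the request once <workspace-url> is replaced:
# with urllib.request.urlopen(request) as response:
#     body = json.load(response)
#     print(body["choices"][0]["message"]["content"])
```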

Model services support the same unified and native APIs as Unity AI Gateway endpoints, such as the MLflow Chat Completions API and the Anthropic Messages API. For the full list of supported APIs and more examples, see Query Unity AI Gateway endpoints (legacy).

Query with `ai_query`

Use the ai_query function to query Databricks-provided model services in system.ai from SQL or Python for batch inference:

SQL
SELECT ai_query(
  'system.ai.databricks-claude-opus-4-6',
  'Summarize the following text: ' || text_column
) AS summary
FROM my_table
LIMIT 10

For full ai_query syntax, see ai_query function.
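The SQL statement above can also be run from Python. The following is a minimal sketch that assumes a Databricks notebook where a SparkSession named `spark` is predefined; `summary_query` is a hypothetical helper, and the table and column names are the same placeholders used in the SQL example. Because the helper interpolates identifiers directly into the statement, it is only suitable for trusted input.

```python
def summary_query(model: str, table: str, column: str, limit: int = 10) -> str:
    """Return a SQL statement that summarizes `column` of `table` with ai_query."""
    return (
        "SELECT ai_query(\n"
        f"  '{model}',\n"
        f"  'Summarize the following text: ' || {column}\n"
        ") AS summary\n"
        f"FROM {table}\n"
        f"LIMIT {int(limit)}"
    )


query = summary_query("system.ai.databricks-claude-opus-4-6", "my_table", "text_column")
print(query)

# In a Databricks notebook (assumption: `spark` is the predefined SparkSession):
# summaries = spark.sql(query)
# summaries.show(truncate=False)
```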

Backward compatibility with workspace endpoint names

For backward compatibility, when a request specifies a Databricks-hosted model by its short name instead of a fully qualified name, Databricks resolves it to the system-provided model service in system.ai. For example, databricks-claude-opus-4-6 resolves to system.ai.databricks-claude-opus-4-6. Existing workloads therefore continue to run without code changes.
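Databricks performs this resolution on the server, so no client code is required. If you want logs or configuration to show the fully qualified name explicitly, the mapping can be mirrored locally. The sketch below is an illustration only: `to_fully_qualified` is a hypothetical helper, and it assumes that short names contain no dots, which may not hold for every model name.

```python
SYSTEM_PREFIX = "system.ai."


def to_fully_qualified(model: str) -> str:
    """Mirror the documented short-name mapping on the client (hypothetical helper).

    Assumption: a name without any dot is a short Databricks-hosted model name;
    a name that contains dots is treated as already fully qualified.
    """
    if "." in model:
        return model
    return SYSTEM_PREFIX + model


print(to_fully_qualified("databricks-claude-opus-4-6"))
print(to_fully_qualified("system.ai.databricks-claude-opus-4-6"))
```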
