Query model services
This feature is in Beta. Account admins can control access to this feature from the account console Previews page. See Manage Databricks previews.
This page describes how to query model services in Unity Catalog.
Requirements
- Unity AI Gateway preview enabled for your account. See Manage Databricks previews.
- A Databricks workspace in a Unity AI Gateway supported region.
- EXECUTE on the model service, and USE CATALOG and USE SCHEMA on its catalog and schema. System-provided model services in system.ai grant EXECUTE to all account users by default.
Identify a model service
Identify a model service by its fully qualified name (catalog.schema.name), which you pass as the model slug in requests. For example, system.ai.databricks-claude-opus-4-6. You can query a model service from any workspace attached to the same metastore, including across workspace boundaries.
Each request must identify a workspace, which Databricks uses for pay-per-token billing. Provide the workspace in one of the following ways:
- Workspace URL: Send the request to your workspace URL, which identifies the workspace. For example, https://<workspace-url>/ai-gateway/mlflow/v1.
- Workspace header: If you send the request to a single account-level URL, add the x-databricks-workspace-id header to identify the workspace.
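As a rough illustration of the workspace header option, the following Python sketch builds (but does not send) a chat completions request that carries the x-databricks-workspace-id header. The account-level base URL and its path, the workspace ID value, and the helper name build_chat_request are placeholders and assumptions, not values taken from this page. Bearer-token authentication is also assumed.

```python
import json
import urllib.request


def build_chat_request(base_url, token, payload, workspace_id=None):
    """Build a POST request for <base_url>/chat/completions (hypothetical helper)."""
    headers = {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    }
    # Only needed when the URL itself does not identify the workspace.
    if workspace_id is not None:
        headers["x-databricks-workspace-id"] = str(workspace_id)
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )


payload = {
    "model": "system.ai.databricks-claude-opus-4-6",
    "max_tokens": 256,
    "messages": [{"role": "user", "content": "What is Databricks?"}],
}

# Placeholder URL, token, and workspace ID: substitute real values before sending.
request = build_chat_request(
    "https://<account-level-url>/ai-gateway/mlflow/v1",
    token="<databricks-token>",
    payload=payload,
    workspace_id="<workspace-id>",
)

# urllib normalizes header capitalization, so compare names in lowercase.
sent_headers = {name.lower(): value for name, value in request.header_items()}
print(sent_headers["x-databricks-workspace-id"])
```

Sending the request (for example, with urllib.request.urlopen) is left out because it needs a real URL and credentials.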
Query with the OpenAI client
The following examples query a model service through the MLflow Chat Completions API, using either the OpenAI client (Python) or a direct REST call:
Python:

from openai import OpenAI
import os

# To get a Databricks token, see https://docs.databricks.com/dev-tools/auth/pat
DATABRICKS_TOKEN = os.environ.get('DATABRICKS_TOKEN')

client = OpenAI(
    api_key=DATABRICKS_TOKEN,
    base_url="https://<workspace-url>/ai-gateway/mlflow/v1"
)

chat_completion = client.chat.completions.create(
    messages=[
        {"role": "user", "content": "Hello!"},
        {"role": "assistant", "content": "Hello! How can I assist you today?"},
        {"role": "user", "content": "What is Databricks?"},
    ],
    model="system.ai.databricks-claude-opus-4-6",
    max_tokens=256
)

print(chat_completion.choices[0].message.content)

REST API:

curl \
  -u token:$DATABRICKS_TOKEN \
  -X POST \
  -H "Content-Type: application/json" \
  -d '{
    "model": "system.ai.databricks-claude-opus-4-6",
    "max_tokens": 256,
    "messages": [
      {"role": "user", "content": "Hello!"},
      {"role": "assistant", "content": "Hello! How can I assist you today?"},
      {"role": "user", "content": "What is Databricks?"}
    ]
  }' \
  https://<workspace-url>/ai-gateway/mlflow/v1/chat/completions
Replace <workspace-url> with your Databricks workspace URL.
Model services support the same unified and native APIs as Unity AI Gateway endpoints, such as the MLflow Chat Completions API and the Anthropic Messages API. For the full list of supported APIs and more examples, see Query Unity AI Gateway endpoints (legacy).
Query with ai_query
Use the ai_query function to query Databricks-provided model services in system.ai from SQL or Python for batch inference:
SELECT ai_query(
    'system.ai.databricks-claude-opus-4-6',
    'Summarize the following text: ' || text_column
  ) AS summary
FROM my_table
LIMIT 10
For full ai_query syntax, see ai_query function.
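One possible way to run the same query from Python is to assemble the SQL text and pass it to spark.sql. The sketch below only builds and prints the statement. The helper name build_ai_query_sql is hypothetical, and the table and column names are the ones from the SQL example above. The helper rejects quotes and backslashes in the literal arguments rather than escaping them, and it does not validate the table or column identifiers.

```python
def build_ai_query_sql(model, prompt_prefix, column, table, limit):
    """Build a SELECT statement that calls ai_query on one column (hypothetical helper)."""
    for value in (model, prompt_prefix):
        # Keep the sketch simple: refuse characters that would need escaping.
        if "'" in value or "\\" in value:
            raise ValueError("quotes and backslashes are not supported in this sketch")
    return (
        f"SELECT ai_query('{model}', '{prompt_prefix}' || {column}) AS summary "
        f"FROM {table} LIMIT {int(limit)}"
    )


query = build_ai_query_sql(
    model="system.ai.databricks-claude-opus-4-6",
    prompt_prefix="Summarize the following text: ",
    column="text_column",
    table="my_table",
    limit=10,
)
print(query)

# In a Databricks notebook, where a `spark` session is predefined,
# the statement could then be run with:
# summaries = spark.sql(query)
```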
Backward compatibility with workspace endpoint names
For backward compatibility, Databricks interprets requests that use a Databricks-hosted model name without a fully qualified name as a system-provided model service in system.ai. For example, Databricks interprets databricks-claude-opus-4-6 as system.ai.databricks-claude-opus-4-6. This behavior lets existing workloads continue to run without code changes.
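The mapping described above can be pictured with a small Python sketch. Databricks applies this interpretation on the service side, so client code does not need it. The function name resolve_model_service_name and the rule "a name without dots is a bare Databricks-hosted model name" are assumptions made for illustration only, and the sketch does not check that the name actually refers to a Databricks-hosted model.

```python
def resolve_model_service_name(name: str) -> str:
    """Return a fully qualified model service name (illustrative only)."""
    # Assumption: a name with no dots is a bare Databricks-hosted model name.
    if "." not in name:
        return f"system.ai.{name}"
    # Names that already contain dots are treated as fully qualified.
    return name


print(resolve_model_service_name("databricks-claude-opus-4-6"))
# prints: system.ai.databricks-claude-opus-4-6
```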