Query a chat model

Try the new Unity AI Gateway Beta

A new Unity AI Gateway experience is available in Beta. The new Unity AI Gateway is the enterprise control plane for governing LLM endpoints and coding agents with enhanced features. See Unity AI Gateway.

In this article, you learn how to write query requests for foundation models that are optimized for chat and general-purpose tasks and served by Unity AI Gateway.

Prompt

Genie Code (Agent mode) can do this for you. Try this example prompt:

Query the databricks-claude-sonnet-4-5 chat model using the OpenAI client. Send a system prompt and a user question, and print the response.

The examples in this article apply to querying foundation models that are made available using either:

Requirements

Query examples

Note

The following examples are based on Unity AI Gateway and model services. If you use model serving endpoints instead of model services, replace the model service name with an endpoint name. See Databricks-hosted foundation models available in Foundation Model APIs for a list of available foundation models and their model service and endpoint names.

The examples in this section show how to query a Foundation Model API pay-per-token model service using the different client options.

For a batch inference example, see Enrich data using AI Functions.

To use the OpenAI client, specify the model service name as the model input.

Python
from databricks_openai import DatabricksOpenAI

client = DatabricksOpenAI()

response = client.chat.completions.create(
    model="system.ai.claude-sonnet-4-5",
    messages=[
        {
            "role": "system",
            "content": "You are a helpful assistant."
        },
        {
            "role": "user",
            "content": "What is a mixture of experts model?",
        }
    ],
    max_tokens=256
)
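
The request above returns a completion object but does not display it. The following is a minimal sketch of reading the reply, assuming the response follows the OpenAI Python client's chat completion layout (`choices[0].message.content`). The helper name `first_reply` is hypothetical, and a stand-in object is used so the snippet runs without a workspace connection.

```python
from types import SimpleNamespace


def first_reply(response):
    """Return the text of the first choice, or None if there are no choices."""
    if not response.choices:
        return None
    return response.choices[0].message.content


# Stand-in with the same attribute layout as a chat completion response,
# used only to demonstrate the helper offline (not real model output).
fake_response = SimpleNamespace(
    choices=[SimpleNamespace(message=SimpleNamespace(content="Hello!"))]
)
print(first_reply(fake_response))
```

With a live client, passing the `response` returned by `client.chat.completions.create(...)` to `first_reply` should give the model's answer as a string.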

To query foundation models outside of your workspace, you must use the OpenAI client directly. You also need your Databricks workspace instance to connect the OpenAI client to Databricks. The following example assumes you have a Databricks API token and openai installed on your compute.

Python

from openai import OpenAI

# Replace the placeholder token and workspace URL with your own values.
client = OpenAI(
    api_key="dapi-your-databricks-token",
    base_url="https://example.staging.cloud.databricks.com/ai-gateway/mlflow/v1"
)

response = client.chat.completions.create(
    model="system.ai.claude-sonnet-4-5",
    messages=[
        {
            "role": "system",
            "content": "You are a helpful assistant."
        },
        {
            "role": "user",
            "content": "What is a mixture of experts model?",
        }
    ],
    max_tokens=256
)
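
Hard-coding a token in source code makes it easy to leak. The following is a minimal sketch of building the client arguments from environment variables instead. The variable names `DATABRICKS_HOST` and `DATABRICKS_TOKEN` and the helper `gateway_client_kwargs` are assumptions for illustration; the `/ai-gateway/mlflow/v1` path is taken from the example above.

```python
import os


def gateway_client_kwargs(env=os.environ):
    """Build keyword arguments for OpenAI(...) from environment variables.

    Assumes DATABRICKS_HOST holds the workspace URL and DATABRICKS_TOKEN
    holds the API token; both names are illustrative.
    """
    host = env["DATABRICKS_HOST"].rstrip("/")  # drop a trailing slash, if any
    token = env["DATABRICKS_TOKEN"]
    return {
        "api_key": token,
        "base_url": f"{host}/ai-gateway/mlflow/v1",
    }
```

With both variables set, the client can then be created as `OpenAI(**gateway_client_kwargs())`. A missing variable raises `KeyError`, which surfaces a configuration problem before any request is sent.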

As an example, the following is the expected request format for a chat model when using the REST API. For external models, you can include additional parameters that are valid for a given provider and endpoint configuration. See Additional query parameters.

JSON
{
  "messages": [
    {
      "role": "user",
      "content": "What is a mixture of experts model?"
    }
  ],
  "max_tokens": 100,
  "temperature": 0.1
}
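
A request body of this shape can also be assembled programmatically. The following sketch uses only the fields shown above (`messages`, `max_tokens`, `temperature`). The helper name `build_chat_payload` is hypothetical, and the snippet only builds and prints the body; it does not send a request.

```python
import json


def build_chat_payload(user_prompt, system_prompt=None, max_tokens=100, temperature=0.1):
    """Assemble a chat request body with the fields shown above."""
    messages = []
    if system_prompt is not None:
        # An optional system message goes before the user message.
        messages.append({"role": "system", "content": system_prompt})
    messages.append({"role": "user", "content": user_prompt})
    return {
        "messages": messages,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }


payload = build_chat_payload("What is a mixture of experts model?")
print(json.dumps(payload, indent=2))
```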

The following is an expected response format for a request made using the REST API:

JSON
{
  "model": "databricks-claude-sonnet-4-5",
  "choices": [
    {
      "message": {},
      "index": 0,
      "finish_reason": null
    }
  ],
  "usage": {
    "prompt_tokens": 7,
    "completion_tokens": 74,
    "total_tokens": 81
  },
  "object": "chat.completion",
  "id": null,
  "created": 1698824353
}
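
As a sketch of how a client might read such a response, the following parses the sample above and pulls out the fields most callers need. The helper name `summarize_response` is hypothetical. Because the sample shows an empty `message` object and a null `finish_reason`, the code treats both as possibly absent.

```python
import json

# The sample response shown above, reproduced as a string.
SAMPLE_RESPONSE = """
{
  "model": "databricks-claude-sonnet-4-5",
  "choices": [{"message": {}, "index": 0, "finish_reason": null}],
  "usage": {"prompt_tokens": 7, "completion_tokens": 74, "total_tokens": 81},
  "object": "chat.completion",
  "id": null,
  "created": 1698824353
}
"""


def summarize_response(body):
    """Extract the model name, reply text, finish reason, and token usage."""
    data = json.loads(body)
    choice = data["choices"][0]
    usage = data["usage"]
    return {
        "model": data["model"],
        # .get() returns None when the message has no content field.
        "content": choice["message"].get("content"),
        "finish_reason": choice["finish_reason"],
        "total_tokens": usage["total_tokens"],
        # Check that prompt and completion tokens add up to the reported total.
        "usage_consistent": usage["prompt_tokens"] + usage["completion_tokens"]
        == usage["total_tokens"],
    }


summary = summarize_response(SAMPLE_RESPONSE)
print(summary)
```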

Supported models

See Foundation model types for supported chat models.

Additional resources