Get started querying LLMs on Databricks
This feature is in Beta. Account admins can control access to this feature from the account console Previews page. See Manage Databricks previews.
This article describes how to get started querying LLMs with model services in Unity AI Gateway.
The easiest way to get started is by querying a system-provided model service in system.ai. System-provided model services are available to all account users by default, so you can start sending requests without any additional setup. See Discover foundation models.
You can also try out and chat with these models in AI Playground. See Chat with LLMs and prototype generative AI apps using AI Playground.
Requirements
- A Databricks workspace in a Unity AI Gateway supported region.
- A Databricks personal access token to query model services using the OpenAI client.
As a security best practice for production scenarios, Databricks recommends that you use machine-to-machine OAuth tokens for authentication.
For testing and development, Databricks recommends using personal access tokens that belong to service principals rather than to workspace users. To create tokens for service principals, see Manage tokens for a service principal.
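The OAuth recommendation can be sketched with the Python standard library. This is a minimal sketch, not an official client. It assumes the workspace-level OAuth token endpoint /oidc/v1/token, the client_credentials grant with scope all-apis, and a service principal client ID and OAuth secret that you have already created. The helper names build_m2m_token_request and fetch_access_token are hypothetical. Building the request is kept separate from sending it, so you can inspect the request without network access.

```python
import base64
import json
import urllib.parse
import urllib.request


def build_m2m_token_request(workspace_url: str, client_id: str, client_secret: str) -> urllib.request.Request:
    """Build (but do not send) a client-credentials token request for a service principal.

    Assumption: the token endpoint is <workspace-url>/oidc/v1/token with scope all-apis.
    """
    token_url = workspace_url.rstrip("/") + "/oidc/v1/token"
    body = urllib.parse.urlencode(
        {"grant_type": "client_credentials", "scope": "all-apis"}
    ).encode("ascii")
    # HTTP Basic auth using the service principal's client ID and OAuth secret.
    credentials = base64.b64encode(f"{client_id}:{client_secret}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        token_url,
        data=body,
        headers={
            "Authorization": f"Basic {credentials}",
            "Content-Type": "application/x-www-form-urlencoded",
        },
        method="POST",
    )


def fetch_access_token(request: urllib.request.Request) -> str:
    """Send the request and return the access_token field of the JSON response."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)["access_token"]
```

If the request succeeds, you could pass the returned token as api_key to the OpenAI client in place of a personal access token. OAuth access tokens are short-lived, so a long-running job would need to request a new token before the current one expires.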
Get started using model services
The following example is meant to be run in a Databricks notebook. The code example queries a Databricks-hosted model that's served by the system-provided model service system.ai.claude-sonnet-4-5. For other available models, see Discover foundation models.
In this example, you use the OpenAI client to query the model by setting the model field to the fully qualified name of the model service you want to query. The code reads your personal access token from the DATABRICKS_TOKEN environment variable. Replace <workspace-url> in base_url with your Databricks workspace instance so that the OpenAI client connects to Databricks.
```python
import os

from openai import OpenAI

# Read your personal access token from the DATABRICKS_TOKEN environment variable.
DATABRICKS_TOKEN = os.environ.get("DATABRICKS_TOKEN")

client = OpenAI(
    api_key=DATABRICKS_TOKEN,  # your personal access token
    base_url="https://<workspace-url>/ai-gateway/mlflow/v1",  # your Databricks workspace instance
)

chat_completion = client.chat.completions.create(
    messages=[
        {
            "role": "system",
            "content": "You are an AI assistant",
        },
        {
            "role": "user",
            "content": "What is a mixture of experts model?",
        },
    ],
    model="system.ai.claude-sonnet-4-5",  # fully qualified name of the model service
    max_tokens=256,
)

# Print only the text of the first response choice.
print(chat_completion.choices[0].message.content)
```
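The base URL and model name in the example follow fixed patterns. The following hypothetical helpers assemble them, assuming that the base URL is always the workspace URL followed by /ai-gateway/mlflow/v1 and that model service names are three-part catalog.schema.name identifiers such as system.ai.claude-sonnet-4-5. Neither helper is part of any Databricks or OpenAI library.

```python
def gateway_base_url(workspace_url: str) -> str:
    """Hypothetical helper: build the OpenAI-compatible base URL used in the example."""
    host = workspace_url.strip().rstrip("/")
    if "://" not in host:
        # Assume HTTPS when only a hostname is given.
        host = "https://" + host
    return host + "/ai-gateway/mlflow/v1"


def qualified_model_name(catalog: str, schema: str, name: str) -> str:
    """Hypothetical helper: join the three name parts, rejecting empty or dotted parts."""
    parts = (catalog, schema, name)
    for part in parts:
        if not part or "." in part:
            raise ValueError(f"invalid name component: {part!r}")
    return ".".join(parts)


print(gateway_base_url("my-workspace.cloud.databricks.com"))
# https://my-workspace.cloud.databricks.com/ai-gateway/mlflow/v1
print(qualified_model_name("system", "ai", "claude-sonnet-4-5"))
# system.ai.claude-sonnet-4-5
```

You could then pass the results as base_url to OpenAI(...) and as model to client.chat.completions.create(...).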
If you encounter the error ImportError: cannot import name 'OpenAI' from 'openai', upgrade your openai package by running !pip install -U openai. After the package installs, run dbutils.library.restartPython().
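This ImportError occurs because the OpenAI client class was introduced in version 1.0 of the openai package; older 0.x releases do not export it. The following minimal sketch checks the installed version with importlib.metadata from the Python standard library before you import the client. The function names are hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version


def openai_client_available(installed: str) -> bool:
    """Return True if the given openai version string is 1.0 or later."""
    major = installed.split(".", 1)[0]
    return major.isdigit() and int(major) >= 1


def check_openai() -> bool:
    """Return True if an openai release that provides the OpenAI class is installed."""
    try:
        return openai_client_available(version("openai"))
    except PackageNotFoundError:
        # The package is not installed at all.
        return False
```

If check_openai() returns False, upgrade the package and restart Python as described above.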
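Expected output: the print statement outputs only the content of the assistant message. The full chat completion response looks similar to the following: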
```json
{
  "id": "xxxxxxxxxxxxx",
  "object": "chat.completion",
  "created": "xxxxxxxxx",
  "model": "system.ai.claude-sonnet-4-5",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "A Mixture of Experts (MoE) model is a machine learning technique that combines the predictions of multiple expert models to improve overall performance. Each expert model specializes in a specific subset of the data, and the MoE model uses a gating network to determine which expert to use for a given input."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 123,
    "completion_tokens": 23,
    "total_tokens": 146
  }
}
```
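The print call in the example outputs only the assistant's message text. To inspect the full response, you could print chat_completion.model_dump_json(indent=2), because response objects in openai 1.x are Pydantic models. The following sketch, built around a hypothetical summarize_completion helper, parses such a JSON string and reads the fields shown in the response. It assumes the response contains at least one choice and a usage object.

```python
import json


def summarize_completion(raw: str) -> tuple[str, int]:
    """Return the first choice's message text and the total token count."""
    response = json.loads(raw)
    usage = response["usage"]
    # total_tokens is expected to equal prompt_tokens + completion_tokens.
    if usage["prompt_tokens"] + usage["completion_tokens"] != usage["total_tokens"]:
        raise ValueError("usage token counts are inconsistent")
    return response["choices"][0]["message"]["content"], usage["total_tokens"]
```

For the response shown here, the helper would return the mixture of experts answer together with a total of 146 tokens.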
Next steps
- Use AI Playground to try out different models in a familiar chat interface.
- Use model services.
- Discover foundation models.
- Create custom model services.
- Explore methods to monitor model quality and endpoint health.