Get started querying LLMs on Databricks
This feature is in Beta. Account admins can control access to this feature from the account console Previews page. See Manage Databricks previews.
This article describes how to get started querying LLMs with model services in Unity AI Gateway.
The easiest way to get started is by querying a system-provided model service in system.ai. System-provided model services are available to all account users by default, so you can start sending requests without any additional setup. See Model services in Unity Catalog.
You can also test out and chat with these models using the AI Playground. See Chat with LLMs and prototype generative AI apps using AI Playground.
Requirements
- A Databricks workspace in a Unity AI Gateway supported region.
- A Databricks personal access token to query model services using the OpenAI client.
Get started using model services
The following example is meant to be run in a Databricks notebook. The code example queries a Databricks-hosted model that's served by the system-provided model service system.ai.claude-sonnet-4-5. For other available models, see Model services in Unity Catalog.
In this example, you use the OpenAI client to query the model by setting the model field to the fully qualified name of the model service you want to query. Set DATABRICKS_TOKEN to your personal access token, and replace <workspace-url> in base_url with your Databricks workspace instance so that the OpenAI client connects to Databricks.
from openai import OpenAI
import os

# Read the personal access token from the environment.
DATABRICKS_TOKEN = os.environ.get("DATABRICKS_TOKEN")

client = OpenAI(
    api_key=DATABRICKS_TOKEN,  # your personal access token
    base_url="https://<workspace-url>/ai-gateway/mlflow/v1",  # your Databricks workspace instance
)

chat_completion = client.chat.completions.create(
    messages=[
        {
            "role": "system",
            "content": "You are an AI assistant",
        },
        {
            "role": "user",
            "content": "What is a mixture of experts model?",
        },
    ],
    model="system.ai.claude-sonnet-4-5",
    max_tokens=256,
)

# Print only the text of the assistant's reply.
print(chat_completion.choices[0].message.content)
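The example above leaves <workspace-url> as a placeholder and reads the token from an environment variable. The following is a minimal sketch of one way to assemble both values and fail early when either is missing. The helper name build_gateway_config is hypothetical and not part of any Databricks or OpenAI library. The only detail taken from the example is the /ai-gateway/mlflow/v1 path.

```python
def build_gateway_config(host, token):
    """Return (api_key, base_url) for the OpenAI client.

    Hypothetical helper: `host` is the workspace instance, with or without
    a scheme or trailing slash; `token` is the personal access token.
    """
    if not token:
        raise ValueError("No personal access token provided.")
    if not host:
        raise ValueError("No workspace host provided.")

    # Normalize inputs such as "https://my-workspace.example.com/".
    host = host.strip().rstrip("/")
    for scheme in ("https://", "http://"):
        if host.startswith(scheme):
            host = host[len(scheme):]

    # Path taken from the base_url used in the example above.
    return token, f"https://{host}/ai-gateway/mlflow/v1"
```

The returned pair could then be passed as OpenAI(api_key=api_key, base_url=base_url). Checking the values before you construct the client tends to surface a missing token as a clear error instead of an authentication failure at request time.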
If you encounter the error ImportError: cannot import name 'OpenAI' from 'openai', upgrade the openai package by running !pip install -U openai. After the installation completes, run dbutils.library.restartPython() to restart the Python process so that the notebook picks up the new version.
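To check whether an upgrade is needed before you import the client, you can inspect the installed package version. The sketch below assumes that the OpenAI client class is available starting with version 1.0 of the openai package. The function names are hypothetical and used only for illustration.

```python
from importlib import metadata


def has_openai_client(version):
    """Return True if a version string such as "1.40.0" has major version >= 1."""
    major = int(version.split(".")[0])
    return major >= 1


def installed_openai_version():
    """Return the installed openai version string, or None if it is not installed."""
    try:
        return metadata.version("openai")
    except metadata.PackageNotFoundError:
        return None


version = installed_openai_version()
if version is None:
    print("openai is not installed")
elif not has_openai_client(version):
    print(f"openai {version} predates the OpenAI client class; upgrade the package")
```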
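The print statement displays only the content of the assistant message. The full chat completion response has a structure similar to the following: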
{
  "id": "xxxxxxxxxxxxx",
  "object": "chat.completion",
  "created": "xxxxxxxxx",
  "model": "system.ai.claude-sonnet-4-5",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "A Mixture of Experts (MoE) model is a machine learning technique that combines the predictions of multiple expert models to improve overall performance. Each expert model specializes in a specific subset of the data, and the MoE model uses a gating network to determine which expert to use for a given input."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 123,
    "completion_tokens": 23,
    "total_tokens": 146
  }
}
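The fields you typically need from a response are the reply text, the finish reason, and the token usage. The following sketch pulls them out of a response in dictionary form, using a shortened copy of the sample above. The helper name summarize_response is hypothetical. Treating a finish_reason of "length" as a sign that the reply was cut off by max_tokens follows the OpenAI chat completions convention and is an assumption here.

```python
def summarize_response(response):
    """Extract the reply text, finish reason, and token usage from a response dict."""
    choice = response["choices"][0]
    usage = response.get("usage", {})
    finish_reason = choice.get("finish_reason")
    return {
        "content": choice["message"]["content"],
        "finish_reason": finish_reason,
        # Assumption: "length" means the reply stopped at the max_tokens limit.
        "truncated": finish_reason == "length",
        "total_tokens": usage.get("total_tokens"),
    }


# Shortened copy of the sample response shown above.
sample = {
    "model": "system.ai.claude-sonnet-4-5",
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": "A Mixture of Experts (MoE) model is a machine learning technique.",
            },
            "finish_reason": "stop",
        }
    ],
    "usage": {"prompt_tokens": 123, "completion_tokens": 23, "total_tokens": 146},
}

summary = summarize_response(sample)
print(summary["content"])
```

The OpenAI client returns an object with attribute access, not a plain dictionary, so in a notebook you would read chat_completion.choices[0].message.content directly, as in the example above. In recent versions of the openai package, calling model_dump() on the response should produce a comparable dictionary, but verify this against the version you have installed.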
Next steps
- Use the AI Playground to try out different models in a familiar chat interface.
- Query model services.
- Discover and govern access to model services.
- Create and manage model services.
- Explore methods to monitor model quality and endpoint health.