Get started querying LLMs on Databricks

Beta

This feature is in Beta. Account admins can control access to this feature from the account console Previews page. See Manage Databricks previews.

This article describes how to get started querying LLMs with model services in Unity AI Gateway.

The easiest way to get started is by querying a system-provided model service in system.ai. System-provided model services are available to all account users by default, so you can start sending requests without any additional setup. See Discover foundation models.

You can also test and chat with these models in AI Playground. See Chat with LLMs and prototype generative AI apps using AI Playground.

Requirements

Important

As a security best practice, Databricks recommends that you use machine-to-machine OAuth tokens for authentication in production.

For testing and development, Databricks recommends using a personal access token that belongs to a service principal rather than to a workspace user. To create tokens for service principals, see Manage tokens for a service principal.
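
Whichever token type you use, it helps to keep the token out of notebook source. The following is a minimal sketch, not an official Databricks helper: it assumes the token has been exported in an environment variable named DATABRICKS_TOKEN (the name used by the example in this article), and load_token is a hypothetical function introduced only for illustration. It fails fast with a clear message instead of passing an empty token to the client.

```python
import os


def load_token(env_var: str = "DATABRICKS_TOKEN") -> str:
    """Return the token stored in env_var, or raise if it is missing or blank.

    Hypothetical helper for illustration; the variable name is an assumption.
    """
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(
            f"Environment variable {env_var} is not set. "
            "Create a token for your service principal and export it first."
        )
    return token
```

The returned string can then be passed as the api_key argument when the client is created.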

Get started using model services

The following example is meant to be run in a Databricks notebook. The code example queries a Databricks-hosted model that's served by the system-provided model service system.ai.claude-sonnet-4-5. For other available models, see Discover foundation models.

In this example, you use the OpenAI client to query the model by setting the model field to the fully qualified name of the model service you want to query. Set the DATABRICKS_TOKEN environment variable to your personal access token, and replace <workspace-url> in base_url with your Databricks workspace instance so that the OpenAI client connects to Databricks.

Python
from openai import OpenAI
import os

DATABRICKS_TOKEN = os.environ.get("DATABRICKS_TOKEN")

client = OpenAI(
    api_key=DATABRICKS_TOKEN,  # your personal access token
    base_url="https://<workspace-url>/ai-gateway/mlflow/v1",  # your Databricks workspace instance
)

chat_completion = client.chat.completions.create(
    messages=[
        {
            "role": "system",
            "content": "You are an AI assistant",
        },
        {
            "role": "user",
            "content": "What is a mixture of experts model?",
        },
    ],
    model="system.ai.claude-sonnet-4-5",
    max_tokens=256,
)

print(chat_completion.choices[0].message.content)

Note

If you encounter the error ImportError: cannot import name 'OpenAI' from 'openai', upgrade the openai package by running !pip install -U openai. After the package installs, run dbutils.library.restartPython().

The example prints only the text in the content field of the reply. The full response returned by the model service has the following structure:

JSON

{
  "id": "xxxxxxxxxxxxx",
  "object": "chat.completion",
  "created": "xxxxxxxxx",
  "model": "system.ai.claude-sonnet-4-5",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "A Mixture of Experts (MoE) model is a machine learning technique that combines the predictions of multiple expert models to improve overall performance. Each expert model specializes in a specific subset of the data, and the MoE model uses a gating network to determine which expert to use for a given input."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 123,
    "completion_tokens": 23,
    "total_tokens": 146
  }
}
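
To show how the fields in the response structure above relate to each other, here is a small sketch that parses a response of that shape and pulls out the reply text, the finish reason, and the token counts. SAMPLE_RESPONSE and summarize_response are hypothetical names used only for illustration, the sample values are placeholders, and the check for a "length" finish reason assumes the service follows the OpenAI chat completions convention for replies cut off by max_tokens.

```python
import json

# Hypothetical sample shaped like the response above; values are placeholders.
SAMPLE_RESPONSE = """
{
  "id": "xxxxxxxxxxxxx",
  "object": "chat.completion",
  "model": "system.ai.claude-sonnet-4-5",
  "choices": [
    {
      "index": 0,
      "message": {"role": "assistant", "content": "A Mixture of Experts (MoE) model is ..."},
      "finish_reason": "stop"
    }
  ],
  "usage": {"prompt_tokens": 123, "completion_tokens": 23, "total_tokens": 146}
}
"""


def summarize_response(raw: str) -> dict:
    """Extract the reply text, finish reason, and token usage from a JSON response."""
    data = json.loads(raw)
    choice = data["choices"][0]  # the example requests a single completion
    usage = data["usage"]
    return {
        "text": choice["message"]["content"],
        "finish_reason": choice["finish_reason"],
        # Assumption: "length" marks a reply truncated by max_tokens (OpenAI convention).
        "truncated": choice["finish_reason"] == "length",
        "total_tokens": usage["total_tokens"],
        "tokens_add_up": usage["prompt_tokens"] + usage["completion_tokens"]
        == usage["total_tokens"],
    }


summary = summarize_response(SAMPLE_RESPONSE)
print(summary["finish_reason"], summary["total_tokens"])
```

When you use the OpenAI client directly, the same fields are available as attributes on the returned object, for example chat_completion.choices[0].finish_reason and chat_completion.usage.total_tokens.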

Next steps