Query AI Gateway endpoints
This feature is in Beta. Account admins can control access to this feature from the account console Previews page. See Manage Databricks previews.
This page describes how to query AI Gateway (Beta) endpoints using supported APIs.
Requirements
- AI Gateway (Beta) preview enabled for your account. See Manage Databricks previews.
- A Databricks workspace in a AI Gateway (Beta) supported region.
- Unity Catalog enabled for your workspace. See Enable a workspace for Unity Catalog.
Supported APIs and integrations
AI Gateway supports the following APIs and integrations:
- Unified APIs: OpenAI-compatible interfaces to query models on Databricks. Seamlessly switch between models from different providers without changing how you query each model.
- Native APIs: Provider-specific interfaces to access the latest model and provider-specific features.
- Coding agents: Integrate your coding agents with AI Gateway to add centralized governance and monitoring to your AI-assisted development workflows. See Integrate with coding agents.
Query endpoints with unified APIs
Unified APIs offer an OpenAI-compatible interface to query models on Databricks. Use unified APIs to seamlessly switch between models from different providers without changing your code.
MLflow Chat Completions API
- Python
- REST API
from openai import OpenAI
import os
DATABRICKS_TOKEN = os.environ.get('DATABRICKS_TOKEN')
client = OpenAI(
api_key=DATABRICKS_TOKEN,
base_url="https://<ai-gateway-url>/mlflow/v1"
)
chat_completion = client.chat.completions.create(
messages=[
{"role": "user", "content": "Hello!"},
{"role": "assistant", "content": "Hello! How can I assist you today?"},
{"role": "user", "content": "What is Databricks?"},
],
model="<ai-gateway-endpoint>",
max_tokens=256
)
print(chat_completion.choices[0].message.content)
curl \
-u token:$DATABRICKS_TOKEN \
-X POST \
-H "Content-Type: application/json" \
-d '{
"model": "<ai-gateway-endpoint>",
"max_tokens": 256,
"messages": [
{"role": "user", "content": "Hello!"},
{"role": "assistant", "content": "Hello! How can I assist you today?"},
{"role": "user", "content": "What is Databricks?"}
]
}' \
https://<ai-gateway-url>/mlflow/v1/chat/completions
Replace <ai-gateway-url> with your AI Gateway URL and <ai-gateway-endpoint> with your AI Gateway endpoint name.
MLflow Embeddings API
- Python
- REST API
from openai import OpenAI
import os
DATABRICKS_TOKEN = os.environ.get('DATABRICKS_TOKEN')
client = OpenAI(
api_key=DATABRICKS_TOKEN,
base_url="https://<ai-gateway-url>/mlflow/v1"
)
embeddings = client.embeddings.create(
input="What is Databricks?",
model="<ai-gateway-endpoint>"
)
print(embeddings.data[0].embedding)
curl \
-u token:$DATABRICKS_TOKEN \
-X POST \
-H "Content-Type: application/json" \
-d '{
"model": "<ai-gateway-endpoint>",
"input": "What is Databricks?"
}' \
https://<ai-gateway-url>/mlflow/v1/embeddings
Replace <ai-gateway-url> with your AI Gateway URL and <ai-gateway-endpoint> with your AI Gateway endpoint name.
Query endpoints with native APIs
Native APIs offer provider-specific interfaces to query models on Databricks. Use native APIs to access the latest provider-specific features.
OpenAI Responses API
- Python
- REST API
from openai import OpenAI
import os
DATABRICKS_TOKEN = os.environ.get('DATABRICKS_TOKEN')
client = OpenAI(
api_key=DATABRICKS_TOKEN,
base_url="https://<ai-gateway-url>/openai/v1"
)
response = client.responses.create(
model="<ai-gateway-endpoint>",
max_output_tokens=256,
input=[
{
"role": "user",
"content": [{"type": "input_text", "text": "Hello!"}]
},
{
"role": "assistant",
"content": [{"type": "output_text", "text": "Hello! How can I assist you today?"}]
},
{
"role": "user",
"content": [{"type": "input_text", "text": "What is Databricks?"}]
}
]
)
print(response.output)
curl \
-u token:$DATABRICKS_TOKEN \
-X POST \
-H "Content-Type: application/json" \
-d '{
"model": "<ai-gateway-endpoint>",
"max_output_tokens": 256,
"input": [
{
"role": "user",
"content": [{"type": "input_text", "text": "Hello!"}]
},
{
"role": "assistant",
"content": [{"type": "output_text", "text": "Hello! How can I assist you today?"}]
},
{
"role": "user",
"content": [{"type": "input_text", "text": "What is Databricks?"}]
}
]
}' \
https://<ai-gateway-url>/openai/v1/responses
Replace <ai-gateway-url> with your AI Gateway URL and <ai-gateway-endpoint> with your AI Gateway endpoint name.
Anthropic Messages API
- Python
- REST API
import anthropic
import os
DATABRICKS_TOKEN = os.environ.get('DATABRICKS_TOKEN')
client = anthropic.Anthropic(
api_key="unused",
base_url="https://<ai-gateway-url>/anthropic",
default_headers={
"Authorization": f"Bearer {DATABRICKS_TOKEN}",
},
)
message = client.messages.create(
model="<ai-gateway-endpoint>",
max_tokens=256,
messages=[
{"role": "user", "content": "Hello!"},
{"role": "assistant", "content": "Hello! How can I assist you today?"},
{"role": "user", "content": "What is Databricks?"},
],
)
print(message.content[0].text)
curl \
-u token:$DATABRICKS_TOKEN \
-X POST \
-H "Content-Type: application/json" \
-d '{
"model": "<ai-gateway-endpoint>",
"max_tokens": 256,
"messages": [
{"role": "user", "content": "Hello!"},
{"role": "assistant", "content": "Hello! How can I assist you today?"},
{"role": "user", "content": "What is Databricks?"}
]
}' \
https://<ai-gateway-url>/anthropic/v1/messages
Replace <ai-gateway-url> with your AI Gateway URL and <ai-gateway-endpoint> with your AI Gateway endpoint name.
Gemini API
- Python
- REST API
from google import genai
from google.genai import types
import os
DATABRICKS_TOKEN = os.environ.get('DATABRICKS_TOKEN')
client = genai.Client(
api_key="databricks",
http_options=types.HttpOptions(
base_url="https://<ai-gateway-url>/gemini",
headers={
"Authorization": f"Bearer {DATABRICKS_TOKEN}",
},
),
)
response = client.models.generate_content(
model="<ai-gateway-endpoint>",
contents=[
types.Content(
role="user",
parts=[types.Part(text="Hello!")],
),
types.Content(
role="model",
parts=[types.Part(text="Hello! How can I assist you today?")],
),
types.Content(
role="user",
parts=[types.Part(text="What is Databricks?")],
),
],
config=types.GenerateContentConfig(
max_output_tokens=256,
),
)
print(response.text)
curl \
-u token:$DATABRICKS_TOKEN \
-X POST \
-H "Content-Type: application/json" \
-d '{
"contents": [
{
"role": "user",
"parts": [{"text": "Hello!"}]
},
{
"role": "model",
"parts": [{"text": "Hello! How can I assist you today?"}]
},
{
"role": "user",
"parts": [{"text": "What is Databricks?"}]
}
],
"generationConfig": {
"maxOutputTokens": 256
}
}' \
https://<ai-gateway-url>/gemini/v1beta/models/<ai-gateway-endpoint>:generateContent
Replace <ai-gateway-url> with your AI Gateway URL and <ai-gateway-endpoint> with your AI Gateway endpoint name.