Query AI Gateway endpoints

Beta

This feature is in Beta. Account admins can control access to this feature from the account console Previews page. See Manage Databricks previews.

This page describes how to query AI Gateway (Beta) endpoints using supported APIs.

Requirements

AI Gateway (Beta) preview enabled for your account. See Manage Databricks previews.
A Databricks workspace in a AI Gateway (Beta) supported region.
Unity Catalog enabled for your workspace. See Enable a workspace for Unity Catalog.

Supported APIs and integrations

AI Gateway supports the following APIs and integrations:

Unified APIs: OpenAI-compatible interfaces to query models on Databricks. Seamlessly switch between models from different providers without changing how you query each model.
- MLflow Chat Completions API
- MLflow Embeddings API
Native APIs: Provider-specific interfaces to access the latest model and provider-specific features.
Coding agents: Integrate your coding agents with AI Gateway to add centralized governance and monitoring to your AI-assisted development workflows. See Integrate with coding agents.

Query endpoints with unified APIs

Unified APIs offer an OpenAI-compatible interface to query models on Databricks. Use unified APIs to seamlessly switch between models from different providers without changing your code.

MLflow Chat Completions API

Python
REST API

Python
from openai import OpenAI
import os

DATABRICKS_TOKEN = os.environ.get('DATABRICKS_TOKEN')

client = OpenAI(
  api_key=DATABRICKS_TOKEN,
  base_url="https://<ai-gateway-url>/mlflow/v1"
)

chat_completion = client.chat.completions.create(
  messages=[
    {"role": "user", "content": "Hello!"},
    {"role": "assistant", "content": "Hello! How can I assist you today?"},
    {"role": "user", "content": "What is Databricks?"},
  ],
  model="<ai-gateway-endpoint>",
  max_tokens=256
)

print(chat_completion.choices[0].message.content)

Bash
curl \
  -u token:$DATABRICKS_TOKEN \
  -X POST \
  -H "Content-Type: application/json" \
  -d '{
    "model": "<ai-gateway-endpoint>",
    "max_tokens": 256,
    "messages": [
      {"role": "user", "content": "Hello!"},
      {"role": "assistant", "content": "Hello! How can I assist you today?"},
      {"role": "user", "content": "What is Databricks?"}
    ]
  }' \
  https://<ai-gateway-url>/mlflow/v1/chat/completions

Replace <ai-gateway-url> with your AI Gateway URL and <ai-gateway-endpoint> with your AI Gateway endpoint name.

MLflow Embeddings API

Python
REST API

Python
from openai import OpenAI
import os

DATABRICKS_TOKEN = os.environ.get('DATABRICKS_TOKEN')

client = OpenAI(
  api_key=DATABRICKS_TOKEN,
  base_url="https://<ai-gateway-url>/mlflow/v1"
)

embeddings = client.embeddings.create(
  input="What is Databricks?",
  model="<ai-gateway-endpoint>"
)

print(embeddings.data[0].embedding)

Bash
curl \
  -u token:$DATABRICKS_TOKEN \
  -X POST \
  -H "Content-Type: application/json" \
  -d '{
    "model": "<ai-gateway-endpoint>",
    "input": "What is Databricks?"
  }' \
  https://<ai-gateway-url>/mlflow/v1/embeddings

Replace <ai-gateway-url> with your AI Gateway URL and <ai-gateway-endpoint> with your AI Gateway endpoint name.

Query endpoints with native APIs

Native APIs offer provider-specific interfaces to query models on Databricks. Use native APIs to access the latest provider-specific features.

OpenAI Responses API

Python
REST API

Python
from openai import OpenAI
import os

DATABRICKS_TOKEN = os.environ.get('DATABRICKS_TOKEN')

client = OpenAI(
  api_key=DATABRICKS_TOKEN,
  base_url="https://<ai-gateway-url>/openai/v1"
)

response = client.responses.create(
  model="<ai-gateway-endpoint>",
  max_output_tokens=256,
  input=[
    {
      "role": "user",
      "content": [{"type": "input_text", "text": "Hello!"}]
    },
    {
      "role": "assistant",
      "content": [{"type": "output_text", "text": "Hello! How can I assist you today?"}]
    },
    {
      "role": "user",
      "content": [{"type": "input_text", "text": "What is Databricks?"}]
    }
  ]
)

print(response.output)

Bash
curl \
  -u token:$DATABRICKS_TOKEN \
  -X POST \
  -H "Content-Type: application/json" \
  -d '{
    "model": "<ai-gateway-endpoint>",
    "max_output_tokens": 256,
    "input": [
      {
        "role": "user",
        "content": [{"type": "input_text", "text": "Hello!"}]
      },
      {
        "role": "assistant",
        "content": [{"type": "output_text", "text": "Hello! How can I assist you today?"}]
      },
      {
        "role": "user",
        "content": [{"type": "input_text", "text": "What is Databricks?"}]
      }
    ]
  }' \
  https://<ai-gateway-url>/openai/v1/responses

Replace <ai-gateway-url> with your AI Gateway URL and <ai-gateway-endpoint> with your AI Gateway endpoint name.

Anthropic Messages API

Python
REST API

Python
import anthropic
import os

DATABRICKS_TOKEN = os.environ.get('DATABRICKS_TOKEN')

client = anthropic.Anthropic(
  api_key="unused",
  base_url="https://<ai-gateway-url>/anthropic",
  default_headers={
    "Authorization": f"Bearer {DATABRICKS_TOKEN}",
  },
)

message = client.messages.create(
  model="<ai-gateway-endpoint>",
  max_tokens=256,
  messages=[
    {"role": "user", "content": "Hello!"},
    {"role": "assistant", "content": "Hello! How can I assist you today?"},
    {"role": "user", "content": "What is Databricks?"},
  ],
)

print(message.content[0].text)

Bash
curl \
  -u token:$DATABRICKS_TOKEN \
  -X POST \
  -H "Content-Type: application/json" \
  -d '{
    "model": "<ai-gateway-endpoint>",
    "max_tokens": 256,
    "messages": [
      {"role": "user", "content": "Hello!"},
      {"role": "assistant", "content": "Hello! How can I assist you today?"},
      {"role": "user", "content": "What is Databricks?"}
    ]
  }' \
  https://<ai-gateway-url>/anthropic/v1/messages

Replace <ai-gateway-url> with your AI Gateway URL and <ai-gateway-endpoint> with your AI Gateway endpoint name.

Gemini API

Python
REST API

Python
from google import genai
from google.genai import types
import os

DATABRICKS_TOKEN = os.environ.get('DATABRICKS_TOKEN')

client = genai.Client(
  api_key="databricks",
  http_options=types.HttpOptions(
    base_url="https://<ai-gateway-url>/gemini",
    headers={
      "Authorization": f"Bearer {DATABRICKS_TOKEN}",
    },
  ),
)

response = client.models.generate_content(
  model="<ai-gateway-endpoint>",
  contents=[
    types.Content(
      role="user",
      parts=[types.Part(text="Hello!")],
    ),
    types.Content(
      role="model",
      parts=[types.Part(text="Hello! How can I assist you today?")],
    ),
    types.Content(
      role="user",
      parts=[types.Part(text="What is Databricks?")],
    ),
  ],
  config=types.GenerateContentConfig(
    max_output_tokens=256,
  ),
)

print(response.text)

Bash
curl \
  -u token:$DATABRICKS_TOKEN \
  -X POST \
  -H "Content-Type: application/json" \
  -d '{
    "contents": [
      {
        "role": "user",
        "parts": [{"text": "Hello!"}]
      },
      {
        "role": "model",
        "parts": [{"text": "Hello! How can I assist you today?"}]
      },
      {
        "role": "user",
        "parts": [{"text": "What is Databricks?"}]
      }
    ],
    "generationConfig": {
      "maxOutputTokens": 256
    }
  }' \
  https://<ai-gateway-url>/gemini/v1beta/models/<ai-gateway-endpoint>:generateContent

Replace <ai-gateway-url> with your AI Gateway URL and <ai-gateway-endpoint> with your AI Gateway endpoint name.

Requirements​

Supported APIs and integrations​

Query endpoints with unified APIs​

MLflow Chat Completions API​

MLflow Embeddings API​

Query endpoints with native APIs​

OpenAI Responses API​

Anthropic Messages API​

Gemini API​

Next steps​

Requirements

Supported APIs and integrations

Query endpoints with unified APIs

MLflow Chat Completions API

MLflow Embeddings API

Query endpoints with native APIs

OpenAI Responses API

Anthropic Messages API

Gemini API

Next steps