Use foundation models

In this article, you learn how to write query requests for foundation models and send them to your model serving endpoint. You can query foundation models that are hosted by Databricks and foundation models hosted outside of Databricks.

For traditional ML or Python models query requests, see Query serving endpoints for custom models.

Mosaic AI Model Serving supports Foundation Models APIs and external models for accessing foundation models. Model Serving uses a unified OpenAI-compatible API and SDK for querying them. This makes it possible to experiment with and customize foundation models for production across supported clouds and providers.

Mosaic AI Model Serving provides the following options for sending scoring requests to endpoints that serve foundation models or external models:

Method	Details
OpenAI client	Query a model hosted by a Mosaic AI Model Serving endpoint using the OpenAI client. Specify the model serving endpoint name as the `model` input. Supported for chat, embeddings, and completions models made available by Foundation Model APIs or external models.
SQL function	Invoke model inference directly from SQL using the `ai_query` SQL function. See Example: Query a foundation model.
Serving UI	Select Query endpoint from the Serving endpoint page. Insert JSON format model input data and click Send Request. If the model has an input example logged, use Show Example to load it.
REST API	Call and query the model using the REST API. See POST /serving-endpoints/{name}/invocations for details. For scoring requests to endpoints serving multiple models, see Query individual models behind an endpoint.
MLflow Deployments SDK	Use MLflow Deployments SDK's predict() function to query the model.
Databricks Python SDK	Databricks Python SDK is a layer on top of the REST API. It handles low-level details, such as authentication, making it easier to interact with the models.

Requirements

A model serving endpoint.
A Databricks workspace in a supported region.
- Foundation Model APIs regions
- External models regions
To send a scoring request through the OpenAI client, REST API or MLflow Deployment SDK, you must have a Databricks API token.

important

As a security best practice for production scenarios, Databricks recommends that you use machine-to-machine OAuth tokens for authentication during production.

For testing and development, Databricks recommends using a personal access token belonging to service principals instead of workspace users. To create tokens for service principals, see Manage tokens for a service principal.

Install packages

After you have selected a querying method, you must first install the appropriate package to your cluster.

OpenAI client
REST API
MLflow Deployments SDK
Databricks Python SDK

To use the OpenAI client, the databricks-sdk[openai] package needs to be installed on your cluster. Databricks SDK provides a wrapper for constructing the OpenAI client with authorization automatically configured to query generative AI models. Run the following in your notebook or your local terminal:

!pip install databricks-sdk[openai]>=0.35.0

The following is only required when installing the package on a Databricks Notebook

Python
dbutils.library.restartPython()

!pip install mlflow

The following is only required when installing the package on a Databricks Notebook

Python
dbutils.library.restartPython()

Text completion model query

OpenAI client
SQL
REST API
MLflow Deployments SDK
Databricks Python SDK

important

Querying text completion models made available using Foundation Model APIs pay-per-token using the OpenAI client is not supported. Only querying external models using the OpenAI client is supported as demonstrated in this section.

To use the OpenAI client, specify the model serving endpoint name as the model input. The following example queries the claude-2 completions model hosted by Anthropic using the OpenAI client. To use the OpenAI client, populate the model field with the name of the model serving endpoint that hosts the model you want to query.

This example uses a previously created endpoint, anthropic-completions-endpoint, configured for accessing external models from the Anthropic model provider. See how to create external model endpoints.

See Supported models for additional models you can query and their providers.

Python

from databricks.sdk import WorkspaceClient

w = WorkspaceClient()
openai_client = w.serving_endpoints.get_open_ai_client()

completion = openai_client.completions.create(
model="anthropic-completions-endpoint",
prompt="what is databricks",
temperature=1.0
)
print(completion)

important

The following example uses the built-in SQL function, ai_query. This function is Public Preview and the definition might change.

SQL
SELECT ai_query(
    "<completions-model-endpoint>",
    "Can you explain AI in ten words?"
  )

The following is a completions request for querying a completions model made available using external models.

important

The following example uses REST API parameters for querying serving endpoints that serve external models. These parameters are in Public Preview and the definition might change. See POST /serving-endpoints/{name}/invocations.

Bash

curl \
-u token:$DATABRICKS_TOKEN \
-X POST \
-H "Content-Type: application/json" \
-d '{"prompt": "What is a quoll?", "max_tokens": 64}' \
https://<workspace_host>.databricks.com/serving-endpoints/<completions-model-endpoint>/invocations

The following is a completions request for querying a completions model made available using external models.

important

The following example uses the predict() API from the MLflow Deployments SDK.

Python

import os
import mlflow.deployments

# Only required when running this example outside of a Databricks Notebook

os.environ['DATABRICKS_HOST'] = "https://<workspace_host>.databricks.com"
os.environ['DATABRICKS_TOKEN'] = "dapi-your-databricks-token"

client = mlflow.deployments.get_deploy_client("databricks")

completions_response = client.predict(
    endpoint="<completions-model-endpoint>",
    inputs={
        "prompt": "What is the capital of France?",
        "temperature": 0.1,
        "max_tokens": 10,
        "n": 2
    }
)

# Print the response
print(completions_response)

TThe following is a completions request for querying a completions model made available using external models.

Python
from databricks.sdk import WorkspaceClient
from databricks.sdk.service.serving import ChatMessage, ChatMessageRole

w = WorkspaceClient()
response = w.serving_endpoints.query(
    name="<completions-model-endpoint>",
    prompt="Write 3 reasons why you should train an AI model on domain specific data sets."
)
print(response.choices[0].text)

The following is the expected request format for a completions model. For external models, you can include additional parameters that are valid for a given provider and endpoint configuration. See Additional query parameters.

Bash
{
  "prompt": "What is mlflow?",
  "max_tokens": 100,
  "temperature": 0.1,
  "stop": [
    "Human:"
  ],
  "n": 1,
  "stream": false,
  "extra_params":
  {
    "top_p": 0.9
  }
}

The following is the expected response format:

JSON
{
  "id": "cmpl-8FwDGc22M13XMnRuessZ15dG622BH",
  "object": "text_completion",
  "created": 1698809382,
  "model": "gpt-3.5-turbo-instruct",
  "choices": [
    {
      "text": "MLflow is an open-source platform for managing the end-to-end machine learning lifecycle. It provides tools for tracking experiments, managing and deploying models, and collaborating on projects. MLflow also supports various machine learning frameworks and languages, making it easier to work with different tools and environments. It is designed to help data scientists and machine learning engineers streamline their workflows and improve the reproducibility and scalability of their models.",
      "index": 0,
      "logprobs": null,
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 5,
    "completion_tokens": 83,
    "total_tokens": 88
  }
}

Chat completion model query

The following are examples for querying a chat model. The example applies to querying a chat model made available using either of the Model Serving capabilities: Foundation Model APIs or External models.

For a batch inference example, see Perform batch LLM inference using AI Functions.

The following is a chat request for the Meta Llama 3.3 70B Instruct model made available by the Foundation Model APIs pay-per-token endpoint, databricks-meta-llama-3-3-70b-instruct in your workspace.

To use the OpenAI client, specify the model serving endpoint name as the model input.

Python

from databricks.sdk import WorkspaceClient

w = WorkspaceClient()
openai_client = w.serving_endpoints.get_open_ai_client()

response = openai_client.chat.completions.create(
    model="databricks-meta-llama-3-3-70b-instruct",
    messages=[
      {
        "role": "system",
        "content": "You are a helpful assistant."
      },
      {
        "role": "user",
        "content": "What is a mixture of experts model?",
      }
    ],
    max_tokens=256
)

To query foundation models outside of your workspace, you must use the OpenAI client directly. You also need your Databricks workspace instance to connect the OpenAI client to Databricks. The following example assumes you have a Databricks API token and openai installed on your compute.

Python

import os
import openai
from openai import OpenAI

client = OpenAI(
    api_key="dapi-your-databricks-token",
    base_url="https://example.staging.cloud.databricks.com/serving-endpoints"
)

response = client.chat.completions.create(
    model="databricks-meta-llama-3-3-70b-instruct",
    messages=[
      {
        "role": "system",
        "content": "You are a helpful assistant."
      },
      {
        "role": "user",
        "content": "What is a mixture of experts model?",
      }
    ],
    max_tokens=256
)

important

The following example uses the built-in SQL function, ai_query. This function is Public Preview and the definition might change.

The following is a chat request for meta-llama-3-3-70b-instruct made available by the Foundation Model APIs pay-per-token endpoint, databricks-meta-llama-3-3-70b-instruct in your workspace.

note

The ai_query() function does not support query endpoints that serve the DBRX or the DBRX Instruct model.

SQL
SELECT ai_query(
    "databricks-meta-llama-3-3-70b-instruct",
    "Can you explain AI in ten words?"
  )

important

The following example uses REST API parameters for querying serving endpoints that serve foundation models. These parameters are in Public Preview and the definition might change. See POST /serving-endpoints/{name}/invocations.

Bash
curl \
-u token:$DATABRICKS_TOKEN \
-X POST \
-H "Content-Type: application/json" \
-d '{
  "messages": [
    {
      "role": "system",
      "content": "You are a helpful assistant."
    },
    {
      "role": "user",
      "content": " What is a mixture of experts model?"
    }
  ]
}' \
https://<workspace_host>.databricks.com/serving-endpoints/databricks-meta-llama-3-3-70b-instruct/invocations \

important

The following example uses the predict() API from the MLflow Deployments SDK.

Python

import mlflow.deployments

# Only required when running this example outside of a Databricks Notebook
export DATABRICKS_HOST="https://<workspace_host>.databricks.com"
export DATABRICKS_TOKEN="dapi-your-databricks-token"

client = mlflow.deployments.get_deploy_client("databricks")

chat_response = client.predict(
    endpoint="databricks-meta-llama-3-3-70b-instruct",
    inputs={
        "messages": [
            {
              "role": "user",
              "content": "Hello!"
            },
            {
              "role": "assistant",
              "content": "Hello! How can I assist you today?"
            },
            {
              "role": "user",
              "content": "What is a mixture of experts model??"
            }
        ],
        "temperature": 0.1,
        "max_tokens": 20
    }
)

This code must be run in a notebook in your workspace. See Use the Databricks SDK for Python from a Databricks notebook.

Python
from databricks.sdk import WorkspaceClient
from databricks.sdk.service.serving import ChatMessage, ChatMessageRole

w = WorkspaceClient()
response = w.serving_endpoints.query(
    name="databricks-meta-llama-3-3-70b-instruct",
    messages=[
        ChatMessage(
            role=ChatMessageRole.SYSTEM, content="You are a helpful assistant."
        ),
        ChatMessage(
            role=ChatMessageRole.USER, content="What is a mixture of experts model?"
        ),
    ],
    max_tokens=128,
)
print(f"RESPONSE:\n{response.choices[0].message.content}")

To query a foundation model endpoint using LangChain, you can use the ChatDatabricks ChatModel class and specify the endpoint.

The following example uses the ChatDatabricks ChatModel class in LangChain to query the Foundation Model APIs pay-per-token endpoint, databricks-meta-llama-3-3-70b-instruct.

Bash
%pip install databricks-langchain

Python
from langchain_core.messages import HumanMessage, SystemMessage
from databricks_langchain import ChatDatabricks

messages = [
    SystemMessage(content="You're a helpful assistant"),
    HumanMessage(content="What is a mixture of experts model?"),
]

llm = ChatDatabricks(endpoint_name="databricks-meta-llama-3-3-70b-instruct")
llm.invoke(messages)

As an example, the following is the expected request format for a chat model when using the REST API. For external models, you can include additional parameters that are valid for a given provider and endpoint configuration. See Additional query parameters.

Bash
{
  "messages": [
    {
      "role": "user",
      "content": "What is a mixture of experts model?"
    }
  ],
  "max_tokens": 100,
  "temperature": 0.1
}

The following is an expected response format for a request made using the REST API:

JSON
{
  "model": "databricks-meta-llama-3-3-70b-instruct",
  "choices": [
    {
      "message": {},
      "index": 0,
      "finish_reason": null
    }
  ],
  "usage": {
    "prompt_tokens": 7,
    "completion_tokens": 74,
    "total_tokens": 81
  },
  "object": "chat.completion",
  "id": null,
  "created": 1698824353
}

Reasoning models

Mosaic AI Model Serving provides a unified API to interact with reasoning models. Reasoning gives foundation models enhanced capabilities to tackle complex tasks. Some models also provide transparency by revealing their step-by-step thought process before delivering a final answer.

There are two types of models: reasoning-only and hybrid. Reasoning-only models like the OpenAI o series always use internal reasoning in their responses. Hybrid models, such as databricks-claude-3-7-sonnet, support both fast, instant replies and deeper reasoning when needed.

To enable reasoning in hybrid models, include the thinking parameter and set a budget_tokens value that controls how many tokens the model can use for internal thought. Higher budgets can improve quality for complex tasks, though usage above 32K may vary. budget_tokens must be less than max_tokens.

All reasoning models are accessed through the chat completions endpoint.

Python
from openai import OpenAI
import base64
import httpx

client = OpenAI(
  api_key=os.environ.get('YOUR_DATABRICKS_TOKEN'),
  base_url=os.environ.get('YOUR_DATABRICKS_BASE_URL')
  )

response = client.chat.completions.create(
    model="databricks-claude-3-7-sonnet",
    messages=[{"role": "user", "content": "Why is the sky blue?"}],
    max_tokens=20480,
    extra_body={
        "thinking": {
            "type": "enabled",
            "budget_tokens": 10240
        }
    }
)

msg = response.choices[0].message
reasoning = msg.content[0]["summary"][0]["text"]
answer = msg.content[1]["text"]

print("Reasoning:", reasoning)
print("Answer:", answer)

The API response includes both thinking and text content blocks:

Python
ChatCompletionMessage(
    role="assistant",
    content=[
        {
            "type": "reasoning",
            "summary": [
                {
                    "type": "summary_text",
                    "text": ("The question is asking about the scientific explanation for why the sky appears blue... "),
                    "signature": ("EqoBCkgIARABGAIiQAhCWRmlaLuPiHaF357JzGmloqLqkeBm3cHG9NFTxKMyC/9bBdBInUsE3IZk6RxWge...")
                }
            ]
        },
        {
            "type": "text",
            "text": (
                "# Why the Sky Is Blue\n\n"
                "The sky appears blue because of a phenomenon called Rayleigh scattering. Here's how it works..."
            )
        }
    ],
    refusal=None,
    annotations=None,
    audio=None,
    function_call=None,
    tool_calls=None
)

Manage reasoning across multiple turns

This section is specific to the databricks-claude-3-7-sonnet model.

In multi-turn conversations, only the reasoning blocks associated with the last assistant turn or tool-use session are visible to the model and counted as input tokens.

If you don't want to pass reasoning tokens back to the model (for example, you don't need it to reason over its prior steps), you can omit the reasoning block entirely. For example:

Python
response = client.chat.completions.create(
    model="databricks-claude-3-7-sonnet",
    messages=[
        {"role": "user", "content": "Why is the sky blue?"},
        {"role": "assistant", "content": text_content},
        {"role": "user", "content": "Can you explain in a way that a 5-year-old child can understand?"}
    ],
    max_tokens=20480,
    extra_body={
        "thinking": {
            "type": "enabled",
            "budget_tokens": 10240
        }
    }
)

answer = response.choices[0].message.content[1]["text"]
print("Answer:", answer)

However, if you do need the model to reason over its previous reasoning process - for instance, if you're building experiences that surface its intermediate reasoning - you must include the full, unmodified assistant message, including the reasoning block from the previous turn. Here's how to continue a thread with the full assistant message:

Python
assistant_message = response.choices[0].message

response = client.chat.completions.create(
    model="databricks-claude-3-7-sonnet",
    messages=[
        {"role": "user", "content": "Why is the sky blue?"},
        {"role": "assistant", "content": text_content},
        {"role": "user", "content": "Can you explain in a way that a 5-year-old child can understand?"},
        assistant_message,
        {"role": "user", "content": "Can you simplify the previous answer?"}
    ],
    max_tokens=20480,
    extra_body={
        "thinking": {
            "type": "enabled",
            "budget_tokens": 10240
        }
    }
)

answer = response.choices[0].message.content[1]["text"]
print("Answer:", answer)

How does a reasoning model work?

Reasoning models introduce special reasoning tokens in addition to the standard input and output tokens. These tokens let the model "think" through the prompt, breaking it down and considering different ways to respond. After this internal reasoning process, the model generates its final answer as visible output tokens. Some models, like databricks-claude-3-7-sonnet, display these reasoning tokens to users, while others, such as the OpenAI o series, discard them and do not expose them in the final output.

Supported models

Foundation Model API (Databricks-hosted)

databricks-claude-3-7-sonnet

External models

OpenAI models with reasoning capabilities
Anthropic Claude models with reasoning capabilities
Google Gemini models with reasoning capabilities

Vision models

Mosaic AI Model Serving provides a unified API to understand and analyze images using a variety of foundation models, unlocking powerful multimodal capabilities. This functionality is available through select Databricks-hosted models as part of Foundation Model APIs and serving endpoints that serve external models.

Code examples

Python

from openai import OpenAI
import base64
import httpx

client = OpenAI(
    api_key="dapi-your-databricks-token",
    base_url="https://example.staging.cloud.databricks.com/serving-endpoints"
)

# encode image
image_url = "https://upload.wikimedia.org/wikipedia/commons/a/a7/Camponotus_flavomarginatus_ant.jpg"
image_data = base64.standard_b64encode(httpx.get(image_url).content).decode("utf-8")

# OpenAI request
completion = client.chat.completions.create(
    model="databricks-claude-3-7-sonnet",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "what's in this image?"},
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/jpeg;base64,{image_data}"},
                },
            ],
        }
    ],
)

print(completion.choices[0].message.content)

The Chat Completions API supports multiple image inputs, allowing the model to analyze each image and synthesize information from all inputs to generate a response to the prompt.

Python

from openai import OpenAI
import base64
import httpx

client = OpenAI(
    api_key="dapi-your-databricks-token",
    base_url="https://example.staging.cloud.databricks.com/serving-endpoints"
)

# Encode multiple images

image1_url = "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"
image1_data = base64.standard_b64encode(httpx.get(image1_url).content).decode("utf-8")

image2_url = "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"
image2_data = base64.standard_b64encode(httpx.get(image1_url).content).decode("utf-8")

# OpenAI request

completion = client.chat.completions.create(
model="databricks-claude-3-7-sonnet",
messages=[
{
"role": "user",
"content": [
{"type": "text", "text": "What are in these images? Is there any difference between them?"},
{
"type": "image_url",
"image_url": {"url": f"data:image/jpeg;base64,{image1_data}"},
},
{
"type": "image_url",
"image_url": {"url": f"data:image/jpeg;base64,{image2_data}"},
},
],
}
],
)

print(completion.choices[0].message.content)

Supported models

Foundation Model API (Databricks-hosted)

databricks-claude-3-7-sonnet

External Models

OpenAI GPT and o series models with vision capabilities
Anthropic Claude models with vision capabilities
Google Gemini models with vision capabilities
Other external foundation models with vision capabilities that are OpenAI API compatible are also supported.

Input image Requirements

This section applies only to Foundation Model APIs. For external models, refer to the provider's documentation.

Multiple images per request

Up to 20 images for Claude.ai
Up to 100 images for API requests
All provided images are processed in a request, which is useful for comparing or contrasting them.

Size limitations

Images larger than 8000x8000 px will be rejected.
If more than 20 images are submitted in one API request, the maximum allowed size per image is 2000 x 2000 px.

Image resizing recommendations

For optimal performance, resize images before uploading if they are too large.
If an image's long edge exceeds 1568 pixels or its size exceeds ~1,600 tokens, it will be automatically scaled down while preserving aspect ratio.
Very small images (under 200 pixels on any edge) may degrade performance.
To reduce latency, keep images within 1.15 megapixels and at most 1568 pixels in both dimensions.

Image quality considerations

Supported formats: JPEG, PNG, GIF, WebP.
Clarity: Avoid blurry or pixelated images.
Text in images:
- Ensure text is legible and not too small.
- Avoid cropping out key visual context just to enlarge the text.

Calculating costs

This section applies only to Foundation Model APIs. For external models, refer to the provider's documentation.

Each image in a request to foundation model adds to your token usage.

Token counts and estimates

If no resizing is needed, estimate tokens with:
tokens = (width px × height px) / 750

Approximate token counts for different image sizes:

Image Size	Tokens
200×200 px (0.04 MP)	~54
1000×1000 px (1 MP)	~1334
1092×1092 px (1.19 MP)	~1590

Limitations of image understanding

This section applies only to Foundation Model APIs. For external models, refer to the provider's documentation.

There are limitations to the advanced image understanding of the Claude model on Databricks:

People identification: Cannot identify or name people in images.
Accuracy: May misinterpret low-quality, rotated, or very small images (<200 px).
Spatial reasoning: Struggles with precise layouts, such as reading analog clocks or chess positions.
Counting: Provides approximate counts, but may be inaccurate for many small objects.
AI-generated images: Cannot reliably detect synthetic or fake images.
Inappropriate content: Blocks explicit or policy-violating images.
Healthcare: Not suited for complex medical scans (for example, CTs and MRIs). It's not a diagnostic tool.

Review all outputs carefully, especially for high-stakes use cases. Avoid using Claude for tasks requiring perfect precision or sensitive analysis without human oversight.

Embedding model query

The following is an embeddings request for the gte-large-en model made available by Foundation Model APIs. The example applies to querying an embedding model made available using either of the Model Serving capabilities: Foundation Model APIs or external models.

To use the OpenAI client, specify the model serving endpoint name as the model input.

Python

from databricks.sdk import WorkspaceClient

w = WorkspaceClient()
openai_client = w.serving_endpoints.get_open_ai_client()

response = openai_client.embeddings.create(
  model="databricks-gte-large-en",
  input="what is databricks"
)

To query foundation models outside your workspace, you must use the OpenAI client directly, as demonstrated below. The following example assumes you have a Databricks API token and openai installed on your compute. You also need your Databricks workspace instance to connect the OpenAI client to Databricks.

Python

import os
import openai
from openai import OpenAI

client = OpenAI(
    api_key="dapi-your-databricks-token",
    base_url="https://example.staging.cloud.databricks.com/serving-endpoints"
)

response = client.embeddings.create(
  model="databricks-gte-large-en",
  input="what is databricks"
)

important

The following example uses the built-in SQL function, ai_query. This function is Public Preview and the definition might change.

SQL

SELECT ai_query(
    "databricks-gte-large-en",
    "Can you explain AI in ten words?"
  )

important

The following example uses REST API parameters for querying serving endpoints that serve foundation models or external models. These parameters are in Public Preview and the definition might change. See POST /serving-endpoints/{name}/invocations.

Bash

curl \
-u token:$DATABRICKS_TOKEN \
-X POST \
-H "Content-Type: application/json" \
-d  '{ "input": "Embed this sentence!"}' \
https://<workspace_host>.databricks.com/serving-endpoints/databricks-gte-large-en/invocations

important

The following example uses the predict() API from the MLflow Deployments SDK.

Python

import mlflow.deployments

export DATABRICKS_HOST="https://<workspace_host>.databricks.com"
export DATABRICKS_TOKEN="dapi-your-databricks-token"

client = mlflow.deployments.get_deploy_client("databricks")

embeddings_response = client.predict(
    endpoint="databricks-gte-large-en",
    inputs={
        "input": "Here is some text to embed"
    }
)

Python

from databricks.sdk import WorkspaceClient
from databricks.sdk.service.serving import ChatMessage, ChatMessageRole

w = WorkspaceClient()
response = w.serving_endpoints.query(
    name="databricks-gte-large-en",
    input="Embed this sentence!"
)
print(response.data[0].embedding)

To use a Databricks Foundation Model APIs model in LangChain as an embedding model, import the DatabricksEmbeddings class and specify the endpoint parameter as follows:

Bash
%pip install databricks-langchain

Python
from databricks_langchain import DatabricksEmbeddings

embeddings = DatabricksEmbeddings(endpoint="databricks-gte-large-en")
embeddings.embed_query("Can you explain AI in ten words?")

The following is the expected request format for an embeddings model. For external models, you can include additional parameters that are valid for a given provider and endpoint configuration. See Additional query parameters.

Bash

{
  "input": [
    "embedding text"
  ]
}

The following is the expected response format:

JSON
{
  "object": "list",
  "data": [
    {
      "object": "embedding",
      "index": 0,
      "embedding": []
    }
  ],
  "model": "text-embedding-ada-002-v2",
  "usage": {
    "prompt_tokens": 2,
    "total_tokens": 2
  }
}

Check whether embeddings are normalized

Use the following to check if the embeddings generated by your model are normalized.

Python

  import numpy as np

  def is_normalized(vector: list[float], tol=1e-3) -> bool:
      magnitude = np.linalg.norm(vector)
      return abs(magnitude - 1) < tol

Function calling

Databricks Function Calling is OpenAI-compatible and is only available during model serving as part of Foundation Model APIs and serving endpoints that serve external models. For details, see Function calling on Databricks.

Structured outputs

Structured outputs is OpenAI-compatible and is only available during model serving as part of Foundation Model APIs. For details, see Structured outputs on Databricks.

Chat with supported LLMs using AI Playground

You can interact with supported large language models using the AI Playground. The AI Playground is a chat-like environment where you can test, prompt, and compare LLMs from your Databricks workspace.

AI playground

Requirements​

Install packages​

Text completion model query​

Chat completion model query​

Reasoning models​

Manage reasoning across multiple turns​

How does a reasoning model work?​

Supported models​

Foundation Model API (Databricks-hosted)​

External models​

Vision models​

Code examples​

Supported models​

Foundation Model API (Databricks-hosted)​

External Models​

Input image Requirements​

Multiple images per request​

Size limitations​

Image resizing recommendations​

Image quality considerations​

Calculating costs​

Limitations of image understanding​

Embedding model query​

Check whether embeddings are normalized​

Function calling​

Structured outputs​

Chat with supported LLMs using AI Playground​

Additional resources​

Requirements

Install packages

Text completion model query

Chat completion model query

Reasoning models

Manage reasoning across multiple turns

How does a reasoning model work?

Supported models

Foundation Model API (Databricks-hosted)

External models

Vision models

Code examples

Supported models

Foundation Model API (Databricks-hosted)

External Models

Input image Requirements

Multiple images per request

Size limitations

Image resizing recommendations

Image quality considerations

Calculating costs

Limitations of image understanding

Embedding model query

Check whether embeddings are normalized

Function calling

Structured outputs

Chat with supported LLMs using AI Playground

Additional resources