Query vision models

This article shows how to write query requests for foundation models optimized for vision tasks and send them to your model serving endpoint.

Mosaic AI Model Serving provides a unified API to understand and analyze images using a variety of foundation models, unlocking powerful multimodal capabilities. This functionality is available through select Databricks-hosted models as part of Foundation Model APIs and serving endpoints that serve external models.

Requirements

See Requirements.
Install the appropriate package on your cluster based on the querying client you choose.

Query examples

OpenAI client
SQL

To use the OpenAI client, specify the model serving endpoint name as the model input.

Python

from openai import OpenAI
import base64
import httpx

client = OpenAI(
    api_key="dapi-your-databricks-token",
    base_url="https://example.staging.cloud.databricks.com/serving-endpoints"
)

# encode image
image_url = "https://upload.wikimedia.org/wikipedia/commons/a/a7/Camponotus_flavomarginatus_ant.jpg"
image_data = base64.standard_b64encode(httpx.get(image_url).content).decode("utf-8")

# OpenAI request
completion = client.chat.completions.create(
    model="databricks-claude-3-7-sonnet",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "what's in this image?"},
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/jpeg;base64,{image_data}"},
                },
            ],
        }
    ],
)

print(completion.choices[0].message.content)
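The same data URL pattern can be applied to an image stored as a local file instead of one fetched over HTTP. The following is a minimal sketch, assuming the file extension reflects the real image format. The helper name `to_data_url` is hypothetical and is not part of the OpenAI client or any Databricks API.

```python
import base64
import mimetypes
from pathlib import Path


def to_data_url(path: str) -> str:
    """Return a base64 data URL for a local image file (hypothetical helper)."""
    # Guess the MIME type from the file extension, for example .png -> image/png.
    mime_type, _ = mimetypes.guess_type(path)
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"Cannot determine an image MIME type for {path!r}")
    # Encode the raw bytes the same way the URL-based example does.
    encoded = base64.standard_b64encode(Path(path).read_bytes()).decode("utf-8")
    return f"data:{mime_type};base64,{encoded}"
```

The returned string can be passed as the "url" value of an "image_url" content part, in place of the f-string used in the example above.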

The Chat Completions API supports multiple image inputs, allowing the model to analyze each image and synthesize information from all inputs to generate a response to the prompt.

Python

from openai import OpenAI
import base64
import httpx

client = OpenAI(
    api_key="dapi-your-databricks-token",
    base_url="https://example.staging.cloud.databricks.com/serving-endpoints"
)

# Encode multiple images

image1_url = "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"
image1_data = base64.standard_b64encode(httpx.get(image1_url).content).decode("utf-8")

image2_url = "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"
image2_data = base64.standard_b64encode(httpx.get(image2_url).content).decode("utf-8")

# OpenAI request

completion = client.chat.completions.create(
    model="databricks-claude-3-7-sonnet",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What is in these images? Is there any difference between them?"},
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/jpeg;base64,{image1_data}"},
                },
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/jpeg;base64,{image2_data}"},
                },
            ],
        }
    ],
)

print(completion.choices[0].message.content)
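For more than two images, the content list can be assembled in a loop rather than written out by hand. The following is a minimal sketch of that idea. The function name `build_vision_messages` is hypothetical, and the data URLs in the usage lines are placeholders, not real image data.

```python
from typing import Dict, List


def build_vision_messages(prompt: str, image_data_urls: List[str]) -> List[Dict]:
    """Build one user message with a text part followed by one part per image."""
    content = [{"type": "text", "text": prompt}]
    for url in image_data_urls:
        # Each image becomes its own "image_url" content part.
        content.append({"type": "image_url", "image_url": {"url": url}})
    return [{"role": "user", "content": content}]


# Placeholder data URLs, used only to show the resulting structure.
messages = build_vision_messages(
    "What is in these images?",
    ["data:image/jpeg;base64,AAAA", "data:image/jpeg;base64,BBBB"],
)
print(len(messages[0]["content"]))  # one text part plus two image parts
```

The resulting list can be passed as the `messages` argument of `client.chat.completions.create`, subject to the per-model image limits listed under Input image requirements.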

Important

The following example uses the built-in SQL function, ai_query. This function is in Public Preview and the definition might change.

The following queries a foundation model supported by Databricks Foundation Model APIs for multimodal input using the AI Function ai_query().

SQL

SELECT *, ai_query(
  'databricks-llama-4-maverick',
  'what is this image about?',
  files => content
) AS output
FROM READ_FILES("/Volumes/main/multimodal/unstructured/image.jpeg");

Supported models

See Foundation model types for supported vision models.

Input image requirements

Model	Supported formats	Multiple images per request	Image size limitations	Image resizing recommendations	Image quality considerations
`databricks-gpt-5-1`	`JPEG` `PNG` `WebP` `GIF` (Non-animated `GIF`)	Up to 500 individual image inputs per request	File size limit: Up to 10 MB total payload size per request	N/A	No watermarks or logos Clear enough for a human to understand
`databricks-gpt-5`	`JPEG` `PNG` `WebP` `GIF` (Non-animated `GIF`)	Up to 500 individual image inputs per request	File size limit: Up to 10 MB total payload size per request	N/A	No watermarks or logos Clear enough for a human to understand
`databricks-gpt-5-mini`	`JPEG` `PNG` `WebP` `GIF` (Non-animated `GIF`)	Up to 500 individual image inputs per request	File size limit: Up to 10 MB total payload size per request	N/A	No watermarks or logos Clear enough for a human to understand
`databricks-gpt-5-nano`	`JPEG` `PNG` `WebP` `GIF` (Non-animated `GIF`)	Up to 500 individual image inputs per request	File size limit: Up to 10 MB total payload size per request	N/A	No watermarks or logos Clear enough for a human to understand
`databricks-gemini-3-pro`	`JPEG` `PNG` `WebP`	Up to 50 images for API requests. All provided images are processed in a request.	File size limit: 7 MB each image	N/A	N/A
`databricks-gemini-2.5-pro`	`JPEG` `PNG` `WebP`	Up to 50 images for API requests. All provided images are processed in a request.	File size limit: 7 MB each image	N/A	N/A
`databricks-gemini-2.5-flash`	`JPEG` `PNG` `WebP`	Up to 50 images for API requests. All provided images are processed in a request.	File size limit: 7 MB each image	N/A	N/A
`databricks-gemma-3-12b`	`JPEG` `PNG` `WebP` `GIF`	Up to 5 images for API requests. All provided images are processed in a request.	File size limit: 10 MB total across all images per API request	N/A	N/A
`databricks-llama-4-maverick`	`JPEG` `PNG` `WebP` `GIF`	Up to 5 images for API requests. All provided images are processed in a request.	File size limit: 10 MB total across all images per API request	N/A	N/A
`databricks-claude-sonnet-4-5` `databricks-claude-opus-4-1` `databricks-claude-sonnet-4` `databricks-claude-3-7-sonnet`	`JPEG` `PNG` `GIF` `WebP`	Up to 20 images for Claude.ai. Up to 100 images for API requests. All provided images are processed in a request, which is useful for comparing or contrasting them.	Images larger than 8000x8000 px are rejected. If more than 20 images are submitted in one API request, the maximum allowed size per image is 2000 x 2000 px.	For optimal performance, resize images before uploading if they are too large. If an image's long edge exceeds 1568 pixels or its size exceeds ~1,600 tokens, it is automatically scaled down while preserving aspect ratio. Very small images (under 200 pixels on any edge) may degrade performance. To reduce latency, keep images within 1.15 megapixels and at most 1568 pixels in both dimensions.	Clarity: Avoid blurry or pixelated images. Text in images: Ensure text is legible and not too small. Avoid cropping out key visual context just to enlarge the text.
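The resizing guidance in the Claude row (at most 1568 pixels on either edge and within 1.15 megapixels) can be turned into a target-size calculation before uploading. The following is a sketch under those stated limits only. The function name `fit_within` is hypothetical, the default values are taken from the table row above, and the actual resizing would be done with an image library of your choice.

```python
import math
from typing import Tuple


def fit_within(width: int, height: int,
               max_edge: int = 1568, max_pixels: int = 1_150_000) -> Tuple[int, int]:
    """Return target dimensions that respect both limits and keep the aspect ratio."""
    # Scale factor required by each limit; never upscale (cap at 1.0).
    edge_scale = max_edge / max(width, height)
    pixel_scale = math.sqrt(max_pixels / (width * height))
    scale = min(1.0, edge_scale, pixel_scale)
    if scale == 1.0:
        # The image already satisfies both limits.
        return width, height
    # Round down so the result never exceeds either limit.
    return max(1, math.floor(width * scale)), max(1, math.floor(height * scale))
```

For a 4000 x 3000 image, the megapixel limit is the tighter of the two constraints, so the result is smaller than a resize based on the 1568-pixel edge limit alone.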

Image to token conversion

This section applies only to Foundation Model APIs. For external models, refer to the provider's documentation.

Each image in a request to a foundation model adds to your token usage. See the pricing calculator to estimate image pricing based on the token usage and model you are using.
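One way to see how much an image request actually consumed is to read the token counts reported on the response. The following is a sketch that assumes the serving endpoint returns a `usage` object with `prompt_tokens`, `completion_tokens`, and `total_tokens`, as the OpenAI Chat Completions response format does. The source does not confirm this for every model, so the helper returns an empty dictionary when the field is missing. The name `summarize_usage` is hypothetical, and the numbers in the stand-in object are made up.

```python
from types import SimpleNamespace


def summarize_usage(completion) -> dict:
    """Extract token counts from a chat completion response, if present."""
    usage = getattr(completion, "usage", None)
    if usage is None:
        return {}
    return {
        "prompt_tokens": usage.prompt_tokens,
        "completion_tokens": usage.completion_tokens,
        "total_tokens": usage.total_tokens,
    }


# Stand-in object with made-up numbers, used here instead of a live response.
fake_completion = SimpleNamespace(
    usage=SimpleNamespace(prompt_tokens=1200, completion_tokens=80, total_tokens=1280)
)
print(summarize_usage(fake_completion))
```

Comparing the reported prompt tokens for the same text prompt with and without an image gives a rough, empirical view of the image's contribution.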

Limitations of image understanding

This section applies only to Foundation Model APIs. For external models, refer to the provider's documentation.

The following are image understanding limitations for the supported Databricks-hosted foundation models:

Model	Limitations
The following Claude models are supported: `databricks-claude-sonnet-4-5` `databricks-claude-opus-4-1` `databricks-claude-sonnet-4` `databricks-claude-3-7-sonnet`	The following are the limits for Claude models on Databricks: Avoid using Claude for tasks requiring perfect precision or sensitive analysis without human oversight. People identification: Cannot identify or name people in images. Accuracy: May misinterpret low-quality, rotated, or very small images (under 200 px). Spatial reasoning: Struggles with precise layouts, such as reading analog clocks or chess positions. Counting: Provides approximate counts, but may be inaccurate for many small objects. AI-generated images: Cannot reliably detect synthetic or fake images. Inappropriate content: Blocks explicit or policy-violating images. Healthcare: Not suited for complex medical scans (for example, CTs and MRIs). It's not a diagnostic tool.

