Query vision models
In this article, you learn how to write query requests for foundation models optimized for vision tasks, and send them to your model serving endpoint.
Mosaic AI Model Serving provides a unified API to understand and analyze images using a variety of foundation models, unlocking powerful multimodal capabilities. This functionality is available through select Databricks-hosted models as part of Foundation Model APIs and serving endpoints that serve external models.
Requirements
- See Requirements.
- Install the appropriate package to your cluster based on the querying client option you choose.
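For example, the OpenAI client examples in this article depend on the openai and httpx packages. The following sketch (not part of the original article; the helper name is illustrative) checks that the required packages are importable before you run a query:

```python
import importlib.util

def missing_packages(packages):
    """Return the subset of packages that are not importable in this environment."""
    return [pkg for pkg in packages if importlib.util.find_spec(pkg) is None]

# Packages used by the OpenAI client examples in this article.
# Install any missing ones on your cluster, e.g. %pip install openai httpx
print(missing_packages(["openai", "httpx"]))
```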
Query examples
- OpenAI client
- SQL
To use the OpenAI client, specify the model serving endpoint name as the model input.
from openai import OpenAI
import base64
import httpx
client = OpenAI(
    api_key="dapi-your-databricks-token",
    base_url="https://<workspace-url>/serving-endpoints"
)
# encode image
image_url = "https://upload.wikimedia.org/wikipedia/commons/a/a7/Camponotus_flavomarginatus_ant.jpg"
image_data = base64.standard_b64encode(httpx.get(image_url).content).decode("utf-8")
# OpenAI request
completion = client.chat.completions.create(
    model="databricks-claude-3-7-sonnet",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What's in this image?"},
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/jpeg;base64,{image_data}"},
                },
            ],
        }
    ],
)
print(completion.choices[0].message.content)
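The encoding step above can be factored into a small helper. The sketch below is illustrative, not part of the Databricks API: it builds the data: URL string that the image_url content block expects, from raw bytes or from a local file.

```python
import base64

def to_data_url(image_bytes: bytes, media_type: str = "image/jpeg") -> str:
    """Base64-encode raw image bytes into a data: URL for an image_url block."""
    encoded = base64.standard_b64encode(image_bytes).decode("utf-8")
    return f"data:{media_type};base64,{encoded}"

def file_to_data_url(path: str, media_type: str = "image/jpeg") -> str:
    """Read a local image file and convert it to a data: URL."""
    with open(path, "rb") as f:
        return to_data_url(f.read(), media_type)

# Example with placeholder bytes; real use passes actual JPEG/PNG content.
print(to_data_url(b"abc"))
```

The resulting string can be passed directly as the "url" value in an image_url content block.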
The Chat Completions API supports multiple image inputs, allowing the model to analyze each image and synthesize information from all inputs to generate a response to the prompt.
from openai import OpenAI
import base64
import httpx
client = OpenAI(
    api_key="dapi-your-databricks-token",
    base_url="https://<workspace-url>/serving-endpoints"
)
# Encode multiple images
image1_url = "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"
image1_data = base64.standard_b64encode(httpx.get(image1_url).content).decode("utf-8")
image2_url = "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"
image2_data = base64.standard_b64encode(httpx.get(image2_url).content).decode("utf-8")
# OpenAI request
completion = client.chat.completions.create(
    model="databricks-claude-3-7-sonnet",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What are in these images? Is there any difference between them?"},
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/jpeg;base64,{image1_data}"},
                },
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/jpeg;base64,{image2_data}"},
                },
            ],
        }
    ],
)
print(completion.choices[0].message.content)
The following example uses the built-in SQL function, ai_query. This function is in Public Preview and its definition might change.
The following queries a foundation model supported by Databricks Foundation Model APIs with multimodal input, using the AI function ai_query().
> SELECT *,
    ai_query(
      'databricks-llama-4-maverick',
      'what is this image about?',
      files => content) AS output
  FROM READ_FILES("/Volumes/main/multimodal/unstructured/image.jpeg");
Supported models
See Foundation model types for supported vision models.
Input image requirements
Model | Supported formats | Multiple images per request | Image size limitations | Image resizing recommendations | Image quality considerations
---|---|---|---|---|---
 | | Up to 5 images for API requests | File size limit: 10 MB total across all images per API request | N/A | N/A
 | | Up to 5 images for API requests | File size limit: 10 MB total across all images per API request | N/A | N/A
 | | Up to 500 individual image inputs per request | File size limit: Up to 10 MB total payload size per request | N/A |
 | | Up to 500 individual image inputs per request | File size limit: Up to 10 MB total payload size per request | N/A |
 | | Up to 500 individual image inputs per request | File size limit: Up to 10 MB total payload size per request | N/A |
 | | | | For optimal performance, resize images before uploading if they are too large. |
Image to token conversion
This section applies only to Foundation Model APIs. For external models, refer to the provider's documentation.
Each image in a request to a foundation model adds to your token usage. See the pricing calculator to estimate image pricing based on the token usage and model you are using.
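As a rough illustration of how image size drives token usage: Anthropic's documentation for its Claude models gives the approximation tokens ≈ (width × height) / 750 for images within the size limits. The sketch below applies that rule of thumb; it is specific to Claude models, and other hosted models may count image tokens differently.

```python
def estimate_claude_image_tokens(width_px: int, height_px: int) -> int:
    """Approximate token cost of one image for Claude models:
    tokens ~= (width * height) / 750, per Anthropic's documentation."""
    return round((width_px * height_px) / 750)

# A 1092x1092 image comes out to roughly 1590 tokens under this approximation.
print(estimate_claude_image_tokens(1092, 1092))
```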
Limitations of image understanding
This section applies only to Foundation Model APIs. For external models, refer to the provider's documentation.
The following are image understanding limitations for the supported Databricks-hosted foundation models:
Model | Limitations
---|---
Claude models | The following are the limits for Claude models on Databricks: