
Use ai_query

Preview

This feature is in Public Preview.

ai_query is a general-purpose AI Function that lets you query any supported AI model directly from SQL or Python. Unlike task-specific AI Functions, which are purpose-built and optimized for a single task, ai_query gives you full control over the model, prompt, and parameters.

For full syntax and parameter reference, see ai_query function.

When to use ai_query

Databricks recommends starting with a task-specific AI Function when one matches your objective. Use ai_query when a task-specific function doesn't meet your needs. For example, when you need to:

  • Control the prompt, model parameters, or output format more precisely
  • Query a custom, fine-tuned, or external model
  • Further optimize for throughput or quality

Decision tree for task-specific AI functions and ai_query

Best practices

  • Use pre-deployed foundation models. Use Databricks-hosted foundation model endpoints (prefixed with databricks-) instead of provisioned throughput endpoints. These endpoints are fully managed and scale automatically without provisioning or configuration.
  • Select a model optimized for batch inference. Databricks optimizes specific models for high-throughput batch workloads. Using a non-optimized model can result in reduced throughput and longer job completion times. See Supported models for the full list of batch-optimized models.
  • Submit your full dataset in a single query. AI Functions automatically handle parallelization, retries, and scaling. Manually splitting data into small batches can reduce throughput.
  • Set failOnError to false for large workloads. This allows the job to complete and return error messages for failed rows, so you retain successful results without reprocessing the entire dataset.
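To make the last recommendation concrete, here is a minimal Python sketch of handling per-row errors when failOnError is false. It assumes each output row carries a struct with the model result and an error field; the field names ("result", "errorMessage") and the sample rows are illustrative assumptions, not a guaranteed schema.

```python
# Sketch: separating successful rows from failed ones when failOnError is
# false. Field names ("result", "errorMessage") are illustrative assumptions
# about the per-row output struct.
rows = [
    {"id": 1, "summary": {"result": "A short summary.", "errorMessage": None}},
    {"id": 2, "summary": {"result": None, "errorMessage": "timeout"}},
    {"id": 3, "summary": {"result": "Another summary.", "errorMessage": None}},
]

# Keep successful rows; collect the IDs of failed rows for a targeted retry
# instead of reprocessing the entire dataset.
successes = [r for r in rows if r["summary"]["errorMessage"] is None]
to_retry = [r["id"] for r in rows if r["summary"]["errorMessage"] is not None]
```

With this split, a follow-up job can rerun ai_query only on the rows listed in to_retry.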

Supported models

ai_query supports pre-deployed Databricks models, models you bring yourself, and external models.

The following table summarizes the supported model types, the associated models, and model serving endpoint configuration requirements for each.

Pre-deployed models

Description: Databricks hosts these foundation models and offers preconfigured endpoints that you can query using ai_query. See Supported foundation models on Mosaic AI Model Serving for which models are supported for each Model Serving feature and their region availability.

Supported models: The following models are supported and optimized for getting started with batch inference and production workflows:

  • databricks-gpt-5-2
  • databricks-gpt-5-1
  • databricks-gpt-5
  • databricks-gpt-5-mini
  • databricks-gpt-5-nano
  • databricks-gemini-3-1-pro
  • databricks-gemini-3-1-flash-lite
  • databricks-gemini-3-pro
  • databricks-gemini-3-flash
  • databricks-gemini-2-5-pro
  • databricks-gemini-2-5-flash
  • databricks-claude-sonnet-4
  • databricks-gpt-oss-20b
  • databricks-gpt-oss-120b
  • databricks-gemma-3-12b
  • databricks-llama-4-maverick
  • databricks-meta-llama-3-3-70b-instruct
  • databricks-meta-llama-3-1-8b-instruct
  • databricks-qwen3-embedding-0-6b
  • databricks-gte-large-en

Other Databricks-hosted models are available for use with AI Functions, but are not recommended for batch inference production workflows at scale. These other models are made available for real-time inference using Foundation Model APIs pay-per-token.

Requirements: Databricks Runtime 15.4 LTS or above. No endpoint provisioning or configuration is required. Your use of these models is subject to the Applicable model developer terms and AI Functions region availability.

Bring your own model

Description: You can bring your own models and query them using AI Functions. AI Functions offers flexibility so you can query models for real-time inference or batch inference scenarios.

Use ai_query with foundation models

The following example demonstrates how to use ai_query with a foundation model hosted by Databricks.

SQL

SELECT
  text,
  ai_query(
    "databricks-meta-llama-3-3-70b-instruct",
    "Summarize the given text comprehensively, covering key points and main ideas concisely while retaining relevant details and examples. Ensure clarity and accuracy without unnecessary repetition or omissions: " || text
  ) AS summary
FROM uc_catalog.schema.table;

Example notebook: Batch inference and structured data extraction

The following example notebook demonstrates how to perform basic structured data extraction, using ai_query to transform raw, unstructured data into organized, usable information through automated extraction techniques. It also shows how to use Agent Evaluation to evaluate extraction accuracy against ground truth data.

Batch inference and structured data extraction notebook

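As a rough illustration of the post-processing step in structured extraction (not the notebook's actual code), the model is typically prompted to return JSON, which you then parse and validate before using. The schema below ("product", "price") is a hypothetical example:

```python
import json

# Hypothetical sketch of validating a structured-extraction response.
# The required fields are illustrative, not from the notebook.
REQUIRED_KEYS = {"product", "price"}

def parse_extraction(raw: str):
    """Parse the model's JSON output; return None when it is malformed or
    missing required fields, so bad rows can be flagged for review."""
    try:
        record = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not REQUIRED_KEYS.issubset(record):
        return None
    return record

good = parse_extraction('{"product": "widget", "price": 19.99}')
bad = parse_extraction('not json at all')
```

Validating every row this way makes it straightforward to route malformed extractions into a review or retry path.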

Use ai_query with traditional ML models

ai_query supports traditional ML models, including fully custom ones. These models must be deployed on Model Serving endpoints. For syntax details and parameters, see ai_query function.

SQL
SELECT
  text,
  ai_query(
    endpoint => "spam-classification",
    request => named_struct(
      "timestamp", timestamp,
      "sender", from_number,
      "text", text
    ),
    returnType => "BOOLEAN"
  ) AS is_spam
FROM catalog.schema.inbox_messages
LIMIT 10;

Example notebook: Batch inference using BERT for named entity recognition

The following notebook shows a traditional ML model batch inference example using BERT.

Batch inference using BERT for named entity recognition notebook


Use ai_query in Python workflows

ai_query can be integrated into existing Python workflows.

The following example writes the output of ai_query to a table:

Python

# Summarize each row's text column, then persist the results to a table
df_out = df.selectExpr(
    "ai_query("
    "'databricks-meta-llama-3-3-70b-instruct', "
    "CONCAT('Please provide a summary of the following text: ', text), "
    "modelParameters => named_struct('max_tokens', 100, 'temperature', 0.7)"
    ") AS summary"
)
df_out.write.mode("overwrite").saveAsTable("output_table")
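Because selectExpr takes the ai_query call as a SQL string, it can help to assemble that string in one place when several jobs share the same pattern. The helper below is a hypothetical convenience, not part of any Databricks API; it reproduces the expression used above:

```python
# Hypothetical helper (not a Databricks API) that assembles the ai_query(...)
# SQL expression string passed to selectExpr. Defaults mirror the example
# above; all names here are assumptions for illustration.
def ai_query_expr(endpoint: str, prompt_prefix: str, column: str,
                  max_tokens: int = 100, temperature: float = 0.7) -> str:
    return (
        f"ai_query('{endpoint}', "
        f"CONCAT('{prompt_prefix}', {column}), "
        f"modelParameters => named_struct("
        f"'max_tokens', {max_tokens}, 'temperature', {temperature})) as summary"
    )

expr = ai_query_expr(
    "databricks-meta-llama-3-3-70b-instruct",
    "Please provide a summary of the following text: ",
    "text",
)
```

The resulting string can then be passed directly to df.selectExpr(expr).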