Supported models for Databricks Foundation Model APIs
This feature is in Public Preview and is supported in the us-east1 and us-central1 regions for both Foundation Model APIs pay-per-token and provisioned throughput endpoints.
This article describes the state-of-the-art open models that are supported by the Databricks Foundation Model APIs in pay-per-token mode.
See Foundation Model APIs limits for the pay-per-token models that are supported only in US regions.
You can send query requests to these models using the pay-per-token endpoints available in your Databricks workspace. See Use foundation models and the pay-per-token supported models table for the names of the model endpoints to use.
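As a sketch of querying one of these pay-per-token endpoints over its REST interface using only the Python standard library: the workspace URL and token below are placeholders, and the `databricks-meta-llama-3-3-70b-instruct` endpoint name is an assumption to verify against the serving endpoints listed in your workspace.

```python
import json
import urllib.request


def build_chat_payload(messages, max_tokens=256):
    """Assemble the JSON body for an OpenAI-compatible chat completions request."""
    return {"messages": messages, "max_tokens": max_tokens}


def query_chat_endpoint(workspace_url, token, endpoint_name, messages):
    """POST a chat request to a pay-per-token serving endpoint and return the reply text."""
    req = urllib.request.Request(
        f"{workspace_url}/serving-endpoints/{endpoint_name}/invocations",
        data=json.dumps(build_chat_payload(messages)).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",  # Databricks personal access token
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    # Responses follow the chat completions shape: choices -> message -> content.
    return body["choices"][0]["message"]["content"]


# Example call (all values are placeholders -- substitute your own):
# query_chat_endpoint(
#     "https://<workspace>.cloud.databricks.com",
#     "<personal-access-token>",
#     "databricks-meta-llama-3-3-70b-instruct",
#     [{"role": "user", "content": "What is a mixture-of-experts model?"}],
# )
```

The same pattern works for any pay-per-token chat endpoint; only the endpoint name in the URL changes.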
In addition to supporting models in pay-per-token mode, Foundation Model APIs also offers provisioned throughput mode. Databricks recommends provisioned throughput for production workloads. This mode supports all models of a model architecture family (for example, DBRX models), including the fine-tuned and custom pre-trained models supported in pay-per-token mode. See Provisioned throughput Foundation Model APIs for the list of supported architectures.
You can interact with these supported models using the AI Playground.
Meta Llama 4 Maverick
See Applicable model developer licenses and terms for the Llama 4 Community License and Acceptable Use Policy.
Llama 4 Maverick is a state-of-the-art large language model built and trained by Meta. It is the first model in the Llama family to use a mixture-of-experts architecture for compute efficiency. Llama 4 Maverick supports multiple languages and is optimized for precise image and text understanding use cases. Currently, Databricks' support for Llama 4 Maverick is limited to text understanding use cases. Learn more about Llama 4 Maverick.
As with other large language models, Llama 4 output may omit some facts and occasionally produce false information. Databricks recommends using retrieval augmented generation (RAG) in scenarios where accuracy is especially important.
Meta Llama 3.3 70B Instruct
Meta-Llama-3.3-70B-Instruct is a state-of-the-art large language model with a context window of 128,000 tokens that was built and trained by Meta. The model supports multiple languages and is optimized for dialogue use cases. Learn more about Meta Llama 3.3.
As with other large language models, Llama 3.3 output may omit some facts and occasionally produce false information. Databricks recommends using retrieval augmented generation (RAG) in scenarios where accuracy is especially important.
Anthropic Claude 3.7 Sonnet
Customers are responsible for ensuring their compliance with the terms of Anthropic's Acceptable Use Policy.
Claude 3.7 Sonnet is a state-of-the-art, hybrid reasoning model built and trained by Anthropic. It is a large language model and reasoning model that can respond rapidly or extend its reasoning depending on the complexity of the task. When in extended thinking mode, Claude 3.7 Sonnet's reasoning steps are visible to the user. Claude 3.7 Sonnet is optimized for tasks such as code generation, mathematical reasoning, and instruction following.
As with other large language models, Claude 3.7 output may omit some facts and occasionally produce false information. Databricks recommends using retrieval augmented generation (RAG) in scenarios where accuracy is especially important.
This endpoint is hosted by Databricks Inc. in AWS within the Databricks security perimeter.
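As a sketch of how a request might opt into extended thinking mode: the `thinking` block below follows Anthropic's Messages API convention, where `budget_tokens` caps the reasoning tokens and must stay below `max_tokens`. Whether your serving endpoint accepts this field verbatim is an assumption to verify against the endpoint's documentation.

```python
def build_extended_thinking_payload(prompt, thinking_budget=4096, max_tokens=8192):
    """Build a request body that enables Claude's extended thinking mode.

    The `thinking` block follows Anthropic's Messages API convention; the exact
    pass-through shape on a given serving endpoint is an assumption to verify.
    `budget_tokens` must be smaller than `max_tokens`.
    """
    if thinking_budget >= max_tokens:
        raise ValueError("thinking_budget must be smaller than max_tokens")
    return {
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "thinking": {"type": "enabled", "budget_tokens": thinking_budget},
    }
```

With extended thinking enabled, the response interleaves visible reasoning content with the final answer, so clients should expect more than one content block.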
GTE Large (En)
GTE Large (En) is provided under and subject to the Apache 2.0 License, Copyright (c) The Apache Software Foundation, All rights reserved. Customers are responsible for ensuring compliance with applicable model licenses.
General Text Embedding (GTE) is a text embedding model that maps text to a 1024-dimension embedding vector and has an embedding window of 8192 tokens. These vectors can be used in vector indexes for LLMs and for tasks like retrieval, classification, question-answering, clustering, or semantic search. This endpoint serves the English version of the model and does not generate normalized embeddings.
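Because the endpoint does not normalize its output, cosine-similarity comparisons require client-side normalization. A minimal sketch, assuming the embeddings arrive as plain Python lists of floats:

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit L2 norm; GTE Large (En) embeddings are not pre-normalized."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def cosine_similarity(a, b):
    """Cosine similarity computed as the dot product of the normalized vectors."""
    ua, ub = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(ua, ub))
```

Normalizing once at indexing time also lets a vector index use a plain dot product instead of recomputing norms on every query.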
Embedding models are especially effective when used in tandem with LLMs for retrieval augmented generation (RAG) use cases. GTE can be used to find relevant text snippets in large chunks of documents that can be used in the context of an LLM.