Supported models for Databricks Foundation Models APIs

This article describes the state-of-the-art open models that are supported by the Databricks Foundation Model APIs.

note

See Supported foundation models on Mosaic AI Model Serving for region availability of these models and the supported feature areas.

You can send query requests to these models using the pay-per-token endpoints available in your Databricks workspace. See Use foundation models and pay-per-token supported models table for the names of the model endpoints to use.

In addition to supporting models in pay-per-token mode, Foundation Model APIs also offers provisioned throughput mode. Databricks recommends provisioned throughput for production workloads. This mode supports all models of a model architecture family (for example, DBRX models), including the fine-tuned and custom pre-trained models supported in pay-per-token mode. See Provisioned throughput Foundation Model APIs for the list of supported architectures.

You can interact with these supported models using the AI Playground.

Google Gemma 3 12B

important

See Applicable model developer licenses and terms for the Gemma 3 Community License and Acceptable Use Policy.

Gemma 3 12B is a 12-billion parameter language model developed by Google as part of the Gemma 3 family. Gemma 3 has up to a 128K token context and provides multilingual support for over 140 languages. This model is designed to handle text inputs and generate text outputs, and is optimized for dialogue use cases and text generation tasks, including question answering.

As with other large language models, Gemma 3 output may omit some facts and occasionally produce false information. Databricks recommends using retrieval augmented generation (RAG) in scenarios where accuracy is especially important.

Meta Llama 4 Maverick

important

See Applicable model developer licenses and terms for the Llama 4 Community License and Acceptable Use Policy.

Llama 4 Maverick is a state-of-the-art large language model built and trained by Meta. It is the first of the Llama model family to use a mixture of experts architecture for compute efficiency. Llama 4 Maverick supports multiple languages and is optimized for precise image and text understanding use cases. Currently, Databricks support of Llama 4 Maverick is limited to text understanding use cases. Learn more about Llama 4 Maverick.

As with other large language models, Llama 4 output may omit some facts and occasionally produce false information. Databricks recommends using retrieval augmented generation (RAG) in scenarios where accuracy is especially important.

Meta Llama 3.3 70B Instruct

important

Starting December 11, 2024, Meta-Llama-3.3-70B-Instruct replaces support for Meta-Llama-3.1-70B-Instruct in Foundation Model APIs pay-per-token endpoints.

See Applicable model developer licenses and terms for the LLama 3.3 Community License and Acceptable Use Policy.

Meta-Llama-3.3-70B-Instruct is a state-of-the-art large language model with a context of 128,000 tokens that was built and trained by Meta. The model supports multiple languages and is optimized for dialogue use cases. Learn more about the Meta Llama 3.3.

Similar to other large language models, Llama-3's output may omit some facts and occasionally produce false information. Databricks recommends using retrieval augmented generation (RAG) in scenarios where accuracy is especially important.

Meta Llama 3.1 405B Instruct

Preview

The use of this model with Foundation Model APIs is in Public Preview. Reach out to your Databricks account team if you encounter endpoint failures or stabilization errors when using this model.

See Applicable model developer licenses and terms for the Llama 3.1 Community License and Acceptable Use Policy.

Meta-Llama-3.1-405B-Instruct is the largest openly available state-of-the-art large language model, built and trained by Meta. The use of this model enables customers to unlock new capabilities, such as advanced, multi-step reasoning and high-quality synthetic data generation. This model is competitive with GPT-4-Turbo in terms of quality.

Like Meta-Llama-3.1-70B-Instruct, this model has a context of 128,000 tokens and support across ten languages. It aligns with human preferences for helpfulness and safety, and is optimized for dialogue use cases. Learn more about the Meta Llama 3.1 models.

Similar to other large language models, Llama-3.1's output may omit some facts and occasionally produce false information. Databricks recommends using retrieval augmented generation (RAG) in scenarios where accuracy is especially important.

Meta Llama 3.1 8B Instruct

important

See Applicable model developer licenses and terms for the LLama 3.1 Community License and Acceptable Use Policy.

Meta-Llama-3.1-8B-Instruct is a state-of-the-art large language model with a context of 128,000 tokens that was built and trained by Meta. The model supports multiple languages and is optimized for dialogue use cases. Learn more about the Meta Llama 3.1.

Anthropic Claude Sonnet 4

important

Customers are responsible for ensuring their compliance with the terms of Anthropic's Acceptable Use Policy.

Claude Sonnet 4 is a state-of-the-art, hybrid reasoning model built and trained by Anthropic. This model offers two modes: near-instant responses and extended thinking for deeper reasoning based on the complexity of the task. Claude Sonnet 4 is optimized for various tasks such as code development, large-scale content analysis, and agent application development.

As with other large language models, Claude Sonnet 4 output may omit some facts and occasionally produce false information. Databricks recommends using retrieval augmented generation (RAG) in scenarios where accuracy is especially important.

This endpoint is hosted by Databricks Inc. in AWS within the Databricks security perimeter.

Anthropic Claude Opus 4

important

Customers are responsible for ensuring their compliance with the terms of Anthropic's Acceptable Use Policy.

Claude Opus 4 is a state-of-the-art, hybrid reasoning model built and trained by Anthropic. This model offers two modes: near-instant responses and extended thinking for deeper reasoning based on the complexity of the task. Claude Opus 4 is optimized for various tasks such as advanced code generation, agent orchestration, cross-source research, content creation, and summarization using context retention.

As with other large language models, Claude Opus 4 output may omit some facts and occasionally produce false information. Databricks recommends using retrieval augmented generation (RAG) in scenarios where accuracy is especially important.

This endpoint is hosted by Databricks Inc. in AWS within the Databricks security perimeter.

Anthropic Claude 3.7 Sonnet

important

Customers are responsible for ensuring their compliance with the terms of Anthropic's Acceptable Use Policy.

Claude 3.7 Sonnet is a state-of-the-art, hybrid reasoning model built and trained by Anthropic. It is a large language model and reasoning model that is able to rapidly respond or extend its reasoning based on the complexity of the task. When in extended thinking mode, Claude 3.7 Sonnet's reasoning steps are visible to the user. Claude 3.7 Sonnet is optimized for various tasks such as code generation, mathematical reasoning and instruction following.

As with other large language models, Claude 3.7 output may omit some facts and occasionally produce false information. Databricks recommends using retrieval augmented generation (RAG) in scenarios where accuracy is especially important.

This endpoint is hosted by Databricks Inc. in AWS within the Databricks security perimeter.

GTE Large (En)

important

GTE Large (En) is provided under and subject to the Apache 2.0 License, Copyright (c) The Apache Software Foundation, All rights reserved. Customers are responsible for ensuring compliance with applicable model licenses.

General Text Embedding (GTE) is a text embedding model that can map any text to a 1024-dimension embedding vector and an embedding window of 8192 tokens. These vectors can be used in vector indexes for LLMs, and for tasks like retrieval, classification, question-answering, clustering, or semantic search. This endpoint serves the English version of the model and does not generate normalized embeddings.

Embedding models are especially effective when used in tandem with LLMs for retrieval augmented generation (RAG) use cases. GTE can be used to find relevant text snippets in large chunks of documents that can be used in the context of an LLM.

BGE Large (En)

BAAI General Embedding (BGE) is a text embedding model that can map any text to a 1024-dimension embedding vector and an embedding window of 512 tokens. These vectors can be used in vector indexes for LLMs, and for tasks like retrieval, classification, question-answering, clustering, or semantic search. This endpoint serves the English version of the model and generates normalized embeddings.

Embedding models are especially effective when used in tandem with LLMs for retrieval augmented generation (RAG) use cases. BGE can be used to find relevant text snippets in large chunks of documents that can be used in the context of an LLM.

In RAG applications, you may be able to improve the performance of your retrieval system by including an instruction parameter. The BGE authors recommend trying the instruction "Represent this sentence for searching relevant passages:" for query embeddings, though its performance impact is domain dependent.

Google Gemma 3 12B​

Meta Llama 4 Maverick​

Meta Llama 3.3 70B Instruct​

Meta Llama 3.1 405B Instruct​

Meta Llama 3.1 8B Instruct​

Anthropic Claude Sonnet 4​

Anthropic Claude Opus 4​

Anthropic Claude 3.7 Sonnet​

GTE Large (En)​

BGE Large (En)​

Additional resources​

Google Gemma 3 12B

Meta Llama 4 Maverick

Meta Llama 3.3 70B Instruct

Meta Llama 3.1 405B Instruct

Meta Llama 3.1 8B Instruct

Anthropic Claude Sonnet 4

Anthropic Claude Opus 4

Anthropic Claude 3.7 Sonnet

GTE Large (En)

BGE Large (En)

Additional resources