Supported models for Databricks Foundation Model APIs
This feature is in Public Preview and is supported in the us-east1 and us-central1 regions for both Foundation Model APIs pay-per-token and provisioned throughput endpoints.
This article describes the state-of-the-art open models that are supported by the Databricks Foundation Model APIs in pay-per-token mode.
See Foundation Model APIs limits for the pay-per-token models that are supported only in US regions.
You can send query requests to these models using the pay-per-token endpoints available in your Databricks workspace. See Use foundation models and the pay-per-token supported models table for the names of the model endpoints to use.
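As a sketch of querying one of these pay-per-token endpoints over its REST interface using only the Python standard library: the workspace URL and token below are placeholders, and the `databricks-meta-llama-3-3-70b-instruct` endpoint name is an assumption to verify against the serving endpoints listed in your workspace.

```python
import json
import urllib.request


def build_chat_payload(messages, max_tokens=256):
    """Assemble the JSON body for an OpenAI-compatible chat completions request."""
    return {"messages": messages, "max_tokens": max_tokens}


def query_chat_endpoint(workspace_url, token, endpoint_name, messages):
    """POST a chat request to a pay-per-token serving endpoint and return the reply text."""
    req = urllib.request.Request(
        f"{workspace_url}/serving-endpoints/{endpoint_name}/invocations",
        data=json.dumps(build_chat_payload(messages)).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",  # Databricks personal access token
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    # Responses follow the chat completions shape: choices -> message -> content.
    return body["choices"][0]["message"]["content"]


# Example call (all values are placeholders -- substitute your own):
# query_chat_endpoint(
#     "https://<workspace>.cloud.databricks.com",
#     "<personal-access-token>",
#     "databricks-meta-llama-3-3-70b-instruct",
#     [{"role": "user", "content": "What is a mixture-of-experts model?"}],
# )
```

The same pattern works for any pay-per-token chat endpoint; only the endpoint name in the URL changes.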
In addition to supporting models in pay-per-token mode, Foundation Model APIs also offers provisioned throughput mode. Databricks recommends provisioned throughput for production workloads. This mode supports all models of a model architecture family (for example, DBRX models), including the fine-tuned and custom pre-trained models supported in pay-per-token mode. See Provisioned throughput Foundation Model APIs for the list of supported architectures.
You can interact with these supported models using the AI Playground.
Meta Llama 4 Maverick
See Applicable model developer licenses and terms for the Llama 4 Community License and Acceptable Use Policy.
Llama 4 Maverick is a state-of-the-art large language model built and trained by Meta. It is the first model in the Llama family to use a mixture-of-experts architecture for compute efficiency. Llama 4 Maverick supports multiple languages and is optimized for precise image and text understanding use cases. Currently, Databricks' support for Llama 4 Maverick is limited to text understanding use cases. Learn more about Llama 4 Maverick.
As with other large language models, Llama 4 output may omit some facts and occasionally produce false information. Databricks recommends using retrieval augmented generation (RAG) in scenarios where accuracy is especially important.
Meta Llama 3.3 70B Instruct
Meta-Llama-3.3-70B-Instruct is a state-of-the-art large language model with a context window of 128,000 tokens that was built and trained by Meta. The model supports multiple languages and is optimized for dialogue use cases. Learn more about Meta Llama 3.3.
As with other large language models, Llama 3.3 output may omit some facts and occasionally produce false information. Databricks recommends using retrieval augmented generation (RAG) in scenarios where accuracy is especially important.
Anthropic Claude 3.7 Sonnet
Customers are responsible for ensuring their compliance with the terms of Anthropic's Acceptable Use Policy.
Claude 3.7 Sonnet is a state-of-the-art, hybrid reasoning model built and trained by Anthropic. It is a large language model and reasoning model that can respond rapidly or extend its reasoning depending on the complexity of the task. When in extended thinking mode, Claude 3.7 Sonnet's reasoning steps are visible to the user. Claude 3.7 Sonnet is optimized for tasks such as code generation, mathematical reasoning, and instruction following.
As with other large language models, Claude 3.7 output may omit some facts and occasionally produce false information. Databricks recommends using retrieval augmented generation (RAG) in scenarios where accuracy is especially important.
This endpoint is hosted by Databricks Inc. in AWS within the Databricks security perimeter.
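As a sketch of how a request might opt into extended thinking mode: the `thinking` block below follows Anthropic's Messages API convention, where `budget_tokens` caps the reasoning tokens and must stay below `max_tokens`. Whether your serving endpoint accepts this field verbatim is an assumption to verify against the endpoint's documentation.

```python
def build_extended_thinking_payload(prompt, thinking_budget=4096, max_tokens=8192):
    """Build a request body that enables Claude's extended thinking mode.

    The `thinking` block follows Anthropic's Messages API convention; the exact
    pass-through shape on a given serving endpoint is an assumption to verify.
    `budget_tokens` must be smaller than `max_tokens`.
    """
    if thinking_budget >= max_tokens:
        raise ValueError("thinking_budget must be smaller than max_tokens")
    return {
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "thinking": {"type": "enabled", "budget_tokens": thinking_budget},
    }
```

With extended thinking enabled, the response interleaves visible reasoning content with the final answer, so clients should expect more than one content block.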
GTE Large (En)
GTE Large (En) is provided under and subject to the Apache 2.0 License, Copyright (c) The Apache Software Foundation, All rights reserved. Customers are responsible for ensuring compliance with applicable model licenses.
General Text Embedding (GTE) is a text embedding model that maps text to a 1024-dimension embedding vector and has an embedding window of 8192 tokens. These vectors can be used in vector indexes for LLMs and for tasks like retrieval, classification, question-answering, clustering, or semantic search. This endpoint serves the English version of the model and does not generate normalized embeddings.
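Because the endpoint does not normalize its output, cosine-similarity comparisons require client-side normalization. A minimal sketch, assuming the embeddings arrive as plain Python lists of floats:

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit L2 norm; GTE Large (En) embeddings are not pre-normalized."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def cosine_similarity(a, b):
    """Cosine similarity computed as the dot product of the normalized vectors."""
    ua, ub = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(ua, ub))
```

Normalizing once at indexing time also lets a vector index use a plain dot product instead of recomputing norms on every query.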
Embedding models are especially effective when used in tandem with LLMs for retrieval augmented generation (RAG) use cases. GTE can be used to find relevant text snippets in large chunks of documents that can be used in the context of an LLM.