Supported foundation models on Mosaic AI Model Serving

This article describes the foundation models you can serve using Mosaic AI Model Serving.

Foundation models are large, pre-trained neural networks that are trained on both large and broad ranges of data. These models are designed to learn general patterns in language, images, or other data types, and can be fine-tuned for specific tasks with additional training. Your use of certain foundation models is subject to the model developer's license and acceptable use policy. See Applicable model developer licenses and terms.

Model Serving offers flexible options for hosting and querying foundation models based on your needs:

Pay-per-token: Ideal for experimentation and quick exploration. This option allows you to query pre-configured endpoints in your Databricks workspace without upfront infrastructure commitments.
AI Functions (batch inference): A subset of Databricks-hosted models are optimized for AI Functions. You can apply AI to your data and run batch inference production workloads at scale using these functions and their supported models.
Provisioned throughput: Recommended for production use cases requiring performance guarantees. This option enables the deployment of fine-tuned foundation models with optimized serving endpoints.
External models: This option enables access to foundation models hosted outside of Databricks, such as those provided by OpenAI or Anthropic. These models can be centrally managed within Databricks for streamlined governance.

Foundation models hosted on Databricks

Databricks hosts state-of-the-art open foundation models, like Meta Llama. These models are made available using Foundation Model APIs.

The following table summarizes which Databricks-hosted models and model families are supported in each region based on the Model Serving feature.

important

Google Gemini 3 Flash and Google Gemini 3 Pro are hosted on global endpoints and require cross geography routing to be enabled for every region.
Anthropic Claude 3.7 Sonnet will be retired on April 12, 2026. See Retired models for the recommended replacement model and guidance for how to migrate during deprecation.
Meta Llama 4 Maverick is available for Foundation Model APIs provisioned throughput workloads in Public Preview.
Starting December 11, 2024, Meta-Llama-3.3-70B-Instruct replaces support for Meta-Llama-3.1-70B-Instruct in Foundation Model APIs pay-per-token endpoints.
Meta-Llama-3.1-405B-Instruct will be retired as noted below. See Retired models for the recommended replacement model and guidance for how to migrate during deprecation.
- Starting February 15, 2026, this model is not available for pay-per-token workloads.
- Starting May 15, 2026, this model is not available for provisioned throughput workloads.
Starting February 15, 2026, the models associated with the following model families are retired and no longer available for use on any of the Model Serving feature areas. See Retired models for recommended replacement models and guidance for how to migrate during deprecation.
- Llama 3 70B
- Llama 3 8B
- Llama 2 70B
- Llama 2 13B
- Mistral 8x7B
- Mixtral 8x7B
Starting December 19, 2025, the models associated with the following model families are retired and no longer available for use on any of the Model Serving feature areas. See Retired models for recommended replacement models and guidance for how to migrate during deprecation.
- DBRX
- MPT 30B
- MPT 7B

* This model is supported based on GPU availability and requires cross geography routing to be enabled.

Access foundation models hosted outside of Databricks

Foundation models created by LLM providers, such as OpenAI and Anthropic, are also accessible on Databricks using External models. These models are hosted outside of Databricks and you can create an endpoint to query them. These endpoints can be centrally governed from Databricks, which streamlines the use and management of various LLM providers within your organization.

The following table presents a non-exhaustive list of supported models and corresponding endpoint types. You can use the listed model associations to help you configure your an endpoint for any newly released model types as they become available with a given provider. Customers are responsible for ensuring compliance with applicable model licenses.

note

With the rapid development of LLMs, there is no guarantee that this list is up to date at all times. New model versions from the same provider are typically supported even if they are not on the list.

Model provider	llm/v1/completions	llm/v1/chat	llm/v1/embeddings
OpenAI**	gpt-3.5-turbo-instruct babbage-002 davinci-002	o1 o1-mini o1-mini-2024-09-12 gpt-3.5-turbo gpt-4 gpt-4-turbo gpt-4-turbo-2024-04 gpt-4o gpt-4o-2024-05-13 gpt-4o-mini	text-embedding-ada-002 text-embedding-3-large text-embedding-3-small
Azure OpenAI**	text-davinci-003 gpt-35-turbo-instruct	o1 o1-mini gpt-35-turbo gpt-35-turbo-16k gpt-4 gpt-4-turbo gpt-4-32k gpt-4o gpt-4o-mini	text-embedding-ada-002 text-embedding-3-large text-embedding-3-small
Anthropic	claude-1 claude-1.3-100k claude-2 claude-2.1 claude-2.0 claude-instant-1.2	claude-3-5-sonnet-latest claude-3-5-haiku-latest claude-3-5-opus-latest claude-3-5-sonnet-20241022 claude-3-5-haiku-20241022 claude-3-5-sonnet-20240620 claude-3-haiku-20240307 claude-3-opus-20240229 claude-3-sonnet-20240229
Cohere**	command command-light	command-r7b-12-2024 command-r-plus-08-2024 command-r-08-2024 command-r-plus command-r command command-light-nightly command-light command-nightly	embed-english-v2.0 embed-multilingual-v2.0 embed-english-light-v2.0 embed-english-v3.0 embed-english-light-v3.0 embed-multilingual-v3.0 embed-multilingual-light-v3.0
Mosaic AI Model Serving	Databricks serving endpoint	Databricks serving endpoint	Databricks serving endpoint
Amazon Bedrock	Anthropic: claude-instant-v1 claude-v2 Cohere: command-text-v14 command-light-text-v14 AI21 Labs: j2-grande-instruct j2-jumbo-instruct j2-mid j2-mid-v1 j2-ultra j2-ultra-v1	Anthropic: claude-3-5-sonnet-20241022-v2:0 claude-3-5-haiku-20241022-v1:0 claude-3-opus-20240229-v1:0 claude-3-sonnet-20240229-v1:0 claude-3-5-sonnet-20240620-v1:0 Cohere: command-r-plus-v1:0 command-r-v1:0 Amazon: nova-lite-v1:0 nova-micro-v1:0 nova-pro-v1:0	Amazon: titan-embed-text-v2:0 titan-embed-text-v1 titan-embed-g1-text-02 Cohere: embed-english-v3 embed-multilingual-v3
AI21 Labs`†`	j2-mid j2-light j2-ultra
Google Cloud Vertex AI	text-bison	chat-bison gemini-pro gemini-1.0-pro gemini-1.5-pro gemini-1.5-flash gemini-2.0-flash	text-embedding-004 text-embedding-005 textembedding-gecko

** Model provider supports fine-tuned completion and chat models. To query a fine-tuned model, populate the name field of the external model configuration with the name of your fine-tuned model.

† Model provider supports custom completion models.

Create foundation model serving endpoints

To query and use foundation models in your AI applications, you must first create a model serving endpoint. Model Serving uses a unified API and UI for creating and updating foundation model serving endpoints.

To create an endpoint that serves fine-tuned variants of foundation models made available using Foundation Model APIs provisioned throughput, see Create your provisioned throughput endpoint using the REST API.
For creating serving endpoints that access foundation models made available using the External models offering, see Create an external model serving endpoint.

Query foundation model serving endpoints

After you create your serving endpoint you are able to query your foundation model. Model Serving uses a unified OpenAI-compatible API and SDK for querying foundation models. This unified experience simplifies how you experiment with and customize foundation models for production across supported clouds and providers.

See Use foundation models.

Foundation models hosted on Databricks​

Access foundation models hosted outside of Databricks​

Create foundation model serving endpoints​

Query foundation model serving endpoints​

Foundation models hosted on Databricks

Access foundation models hosted outside of Databricks

Create foundation model serving endpoints

Query foundation model serving endpoints