Supported foundation models on Mosaic AI Model Serving

This article describes the foundation models you can serve using Mosaic AI Model Serving.

Foundation models are large, pre-trained neural networks that are trained on both large and broad ranges of data. These models are designed to learn general patterns in language, images, or other data types, and can be fine-tuned for specific tasks with additional training.

Model Serving offers flexible options for hosting and querying foundation models based on your needs:

  • Pay-per-token: Ideal for experimentation and quick exploration. This option allows you to query pre-configured endpoints in your Databricks workspace without upfront infrastructure commitments.

  • Provisioned throughput: Recommended for production use cases requiring performance guarantees. This option enables the deployment of fine-tuned foundation models with optimized serving endpoints.

  • External models: This option enables access to foundation models hosted outside of Databricks, such as those provided by OpenAI or Anthropic. These models can be centrally managed within Databricks for streamlined governance.

Foundation models hosted on Databricks

Databricks hosts state-of-the-art open foundation models, like Meta Llama. These models are made available using Foundation Model APIs and are accessible using either pay-per-token or provisioned throughput.

Pay-per-token

Foundation Model APIs pay-per-token is recommended for getting started and quick exploration. Each model that is supported using Foundation Model APIs pay-per-token has a preconfigured endpoint in your Databricks workspace that you can test and query. You can also interact and chat with these models using the AI Playground.

The following table summarizes the supported models for pay-per-token. See Foundation Model APIs limits for model specific region availability.

Important

  • Starting December 11, 2024, Meta-Llama-3.3-70B-Instruct replaces support for Meta-Llama-3.1-70B-Instruct in Foundation Model APIs pay-per-token endpoints.

  • The following models are now retired. See Retired models for recommended replacement models.

    • Llama 2 70B Chat

    • MPT 7B Instruct

    • MPT 30B Instruct

Model

Task type

Endpoint

Notes

GTE Large (English)

Embedding

databricks-gte-large-en

Does not generate normalized embeddings.

Meta-Llama-3.3-70B-Instruct

Chat

databricks-meta-llama-3-3-70b-instruct

Meta-Llama-3.1-405B-Instruct*

Chat

databricks-meta-llama-3-1-405b-instruct

DBRX Instruct

Chat

databricks-dbrx-instruct

Mixtral-8x7B Instruct

Chat

databricks-mixtral-8x7b-instruct

BGE Large (English)

Embedding

databricks-bge-large-en

* Reach out to your Databricks account team if you encounter endpoint failures or stabilization errors when using this model.

Provisioned throughput

Foundation Model APIs provisioned throughput is recommended for production cases. You can create an endpoint that uses provisioned throughput to deploy fine-tuned foundation model architectures. When you use provisioned throughput the serving endpoint is optimized for foundation model workloads that require performance guarantees.

The following table summarizes the supported model architectures for provisioned throughput. Databricks recommends using the pretrained foundation models in Unity Catalog because these models are specifically optimized for provisioned throughput workloads. See Provisioned throughput limits for supported model variants and region availability.

Important

Meta Llama 3.3 is licensed under the LLAMA 3.3 Community License, Copyright © Meta Platforms, Inc. All Rights Reserved. Customers are responsible for ensuring their compliance with the terms of this license and the Llama 3.3 Acceptable Use Policy.

Meta Llama 3.2 is licensed under the LLAMA 3.2 Community License, Copyright © Meta Platforms, Inc. All Rights Reserved. Customers are responsible for ensuring their compliance with the terms of this license and the Llama 3.2 Acceptable Use Policy.

Meta Llama 3.1 are licensed under the LLAMA 3.1 Community License, Copyright © Meta Platforms, Inc. All Rights Reserved. Customers are responsible for ensuring compliance with applicable model licenses.

Model architecture

Task types

Notes

Meta Llama 3.3

Chat or Completion

Meta Llama 3.2 3B

Chat or Completion

Meta Llama 3.2 1B

Chat or Completion

Meta Llama 3.1

Chat or Completion

Meta Llama 3

Chat or Completion

Meta Llama 2

Chat or Completion

DBRX

Chat or Completion

Mistral

Chat or Completion

Mixtral

Chat or Completion

MPT

Chat or Completion

GTE v1.5 (English)

Embedding

Does not generate normalized embeddings.

BGE v1.5 (English)

Embedding

Access foundation models hosted outside of Databricks

Foundation models created by LLM providers, such as OpenAI and Anthropic, are also accessible on Databricks using External models. These models are hosted outside of Databricks and you can create an endpoint to query them. These endpoints can be centrally governed from Databricks, which streamlines the use and management of various LLM providers within your organization.

The following table presents a non-exhaustive list of supported models and corresponding endpoint types. You can use the listed model associations to help you configure your an endpoint for any newly released model types as they become available with a given provider. Customers are responsible for ensuring compliance with applicable model licenses.

Note

With the rapid development of LLMs, there is no guarantee that this list is up to date at all times. New model versions from the same provider are typically supported even if they are not on the list.

Model provider

llm/v1/completions

llm/v1/chat

llm/v1/embeddings

OpenAI**

  • gpt-3.5-turbo-instruct

  • babbage-002

  • davinci-002

  • o1

  • o1-mini

  • o1-mini-2024-09-12

  • gpt-3.5-turbo

  • gpt-4

  • gpt-4-turbo

  • gpt-4-turbo-2024-04

  • gpt-4o

  • gpt-4o-2024-05-13

  • gpt-4o-mini

  • text-embedding-ada-002

  • text-embedding-3-large

  • text-embedding-3-small

Azure OpenAI**

  • text-davinci-003

  • gpt-35-turbo-instruct

  • o1

  • o1-mini

  • gpt-35-turbo

  • gpt-35-turbo-16k

  • gpt-4

  • gpt-4-turbo

  • gpt-4-32k

  • gpt-4o

  • gpt-4o-mini

  • text-embedding-ada-002

  • text-embedding-3-large

  • text-embedding-3-small

Anthropic

  • claude-1

  • claude-1.3-100k

  • claude-2

  • claude-2.1

  • claude-2.0

  • claude-instant-1.2

  • claude-3-5-sonnet-latest

  • claude-3-5-haiku-latest

  • claude-3-5-opus-latest

  • claude-3-5-sonnet-20241022

  • claude-3-5-haiku-20241022

  • claude-3-5-sonnet-20240620

  • claude-3-haiku-20240307

  • claude-3-opus-20240229

  • claude-3-sonnet-20240229

Cohere**

  • command

  • command-light

  • command-r7b-12-2024

  • command-r-plus-08-2024

  • command-r-08-2024

  • command-r-plus

  • command-r

  • command

  • command-light-nightly

  • command-light

  • command-nightly

  • embed-english-v2.0

  • embed-multilingual-v2.0

  • embed-english-light-v2.0

  • embed-english-v3.0

  • embed-english-light-v3.0

  • embed-multilingual-v3.0

  • embed-multilingual-light-v3.0

Mosaic AI Model Serving

Databricks serving endpoint

Databricks serving endpoint

Databricks serving endpoint

Amazon Bedrock

Anthropic:

  • claude-instant-v1

  • claude-v2

Cohere:

  • command-text-v14

  • command-light-text-v14

AI21 Labs:

  • j2-grande-instruct

  • j2-jumbo-instruct

  • j2-mid

  • j2-mid-v1

  • j2-ultra

  • j2-ultra-v1

Anthropic:

  • claude-3-5-sonnet-20241022-v2:0

  • claude-3-5-haiku-20241022-v1:0

  • claude-3-opus-20240229-v1:0

  • claude-3-sonnet-20240229-v1:0

  • claude-3-5-sonnet-20240620-v1:0

Cohere:

  • command-r-plus-v1:0

  • command-r-v1:0

Amazon:

  • titan-embed-text-v1

  • titan-embed-g1-text-02

Cohere:

  • embed-english-v3

  • embed-multilingual-v3

AI21 Labs†

  • j2-mid

  • j2-light

  • j2-ultra

Google Cloud Vertex AI

text-bison

  • chat-bison

  • gemini-pro

  • gemini-1.0-pro

  • gemini-1.5-pro

  • gemini-1.5-flash

  • gemini-2.0-flash

  • text-embedding-004

  • text-embedding-005

  • textembedding-gecko

** Model provider supports fine-tuned completion and chat models. To query a fine-tuned model, populate the name field of the external model configuration with the name of your fine-tuned model.

† Model provider supports custom completion models.

Create foundation model serving endpoints

To query and use foundation models in your AI applications, you must first create a model serving endpoint. Model Serving uses a unified API and UI for creating and updating foundation model serving endpoints.

Query foundation model serving endpoints

After you create your serving endpoint you are able to query your foundation model. Model Serving uses a unified OpenAI-compatible API and SDK for querying foundation models. This unified experience simplifies how you experiment with and customize foundation models for production across supported clouds and providers.

See Query foundation models.