Deploy generative AI foundation models

This article describes support for serving and querying generative AI and LLM foundation models using Databricks Model Serving.


For a getting started tutorial on how to query a foundation model on Databricks, see Get started querying LLMs on Databricks.

What are foundation models?

Foundation models are large machine learning models that are pre-trained on broad data and intended to be fine-tuned for more specific language understanding and generation tasks. These models discern patterns in the input data for generative AI and LLM workloads.

Databricks Model Serving supports serving and querying foundation models using the following capabilities:

  • Foundation Model APIs. This functionality makes state-of-the-art open models available from your model serving endpoint. These models are curated foundation model architectures that support optimized inference. Base models, like DBRX Instruct, Llama-2-70B-chat, BGE-Large, and Mistral-7B, are available for immediate use with pay-per-token pricing. Workloads that require performance guarantees, as well as fine-tuned model variants, can be deployed with provisioned throughput.

  • External models. These are models that are hosted outside of Databricks. Endpoints that serve external models can be centrally governed, and customers can establish rate limits and access control for them. Examples include foundation models like OpenAI’s GPT-4 and Anthropic’s Claude.
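Endpoints for both capabilities are queried through the same serving-endpoints invocations API. The sketch below builds such a request in Python; the workspace URL, token placeholder, and the `databricks-dbrx-instruct` endpoint name are assumptions for illustration, and the actual request requires network access and valid credentials.

```python
import json

# Assumed placeholder values; replace with your workspace URL and token.
DATABRICKS_HOST = "https://<workspace-url>"
API_TOKEN = "<personal-access-token>"

def build_chat_request(endpoint_name, prompt, max_tokens=128):
    """Build the URL, headers, and JSON body for querying a chat-style
    foundation model endpoint via the serving-endpoints invocations API."""
    url = f"{DATABRICKS_HOST}/serving-endpoints/{endpoint_name}/invocations"
    headers = {
        "Authorization": f"Bearer {API_TOKEN}",
        "Content-Type": "application/json",
    }
    body = {
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return url, headers, json.dumps(body)

url, headers, payload = build_chat_request(
    "databricks-dbrx-instruct", "What is a foundation model?"
)
# Sending the request (requires network access and a valid token):
# resp = requests.post(url, headers=headers, data=payload)
```

The same request shape works whether the endpoint serves a Foundation Model API model or an external model, since both sit behind a model serving endpoint.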


To access and query foundation models using Databricks Model Serving, review the requirements for each functionality.

Create a foundation model serving endpoint

See Create foundation model serving endpoints.
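As a rough illustration of what such an endpoint definition looks like, the sketch below builds a configuration for an endpoint that proxies an external model. The endpoint name, secret scope, and field names follow the serving-endpoints REST API as an assumption; consult the linked article for the authoritative schema.

```python
import json

# A sketch of an endpoint configuration that serves an external model
# (here, OpenAI's GPT-4). The endpoint name and secret reference are
# hypothetical placeholders.
endpoint_config = {
    "name": "my-gpt4-endpoint",
    "config": {
        "served_entities": [
            {
                "name": "gpt-4",
                "external_model": {
                    "name": "gpt-4",
                    "provider": "openai",
                    "task": "llm/v1/chat",
                    "openai_config": {
                        # Reference a Databricks secret rather than a raw key.
                        "openai_api_key": "{{secrets/my_scope/openai_key}}",
                    },
                },
            }
        ]
    },
}

# POST this body to the serving-endpoints REST API with an authorized client.
print(json.dumps(endpoint_config, indent=2))
```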