
Use foundation models

In this article, you learn which options are available to write query requests for foundation models and how to send them to your model serving endpoint. You can query foundation models that are hosted by Databricks and foundation models hosted outside of Databricks.

For query requests for traditional ML or Python models, see Query serving endpoints for custom models.

Mosaic AI Model Serving supports Foundation Models APIs and external models for accessing foundation models. Model Serving uses a unified OpenAI-compatible API and SDK for querying them. This makes it possible to experiment with and customize foundation models for production across supported clouds and providers.

Query options

Mosaic AI Model Serving provides the following options for sending query requests to endpoints that serve foundation models:

  • OpenAI client: Query a model hosted by a Mosaic AI Model Serving endpoint using the OpenAI client. Specify the model serving endpoint name as the model input. Supported for chat, embeddings, and completions models made available by external models.
  • Serving UI: Select Query endpoint from the Serving endpoint page. Insert model input data in JSON format and click Send Request. If the model has a logged input example, use Show Example to load it.
  • REST API: Call and query the model using the REST API. See POST /serving-endpoints/{name}/invocations for details. For scoring requests to endpoints serving multiple models, see Query individual models behind an endpoint.
  • MLflow Deployments SDK: Use the MLflow Deployments SDK's predict() function to query the model.
  • Databricks Python SDK: The Databricks Python SDK is a layer on top of the REST API. It handles low-level details, such as authentication, making it easier to interact with the models.
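As a concrete sketch of the MLflow Deployments SDK option: the payload below follows the OpenAI-compatible chat format, the endpoint name is a placeholder you would replace with one from your workspace, and the query itself is kept inside a function because it needs workspace credentials.

```python
def build_chat_payload(prompt: str, max_tokens: int = 128) -> dict:
    """Build an OpenAI-compatible chat payload for predict()."""
    return {
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }


def query_with_deployments_sdk(
    prompt: str,
    endpoint: str = "databricks-meta-llama-3-3-70b-instruct",  # placeholder name
) -> str:
    """Query a foundation model endpoint with the MLflow Deployments SDK."""
    from mlflow.deployments import get_deploy_client  # lazy import; needs mlflow

    client = get_deploy_client("databricks")
    response = client.predict(endpoint=endpoint, inputs=build_chat_payload(prompt))
    return response["choices"][0]["message"]["content"]
```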

Requirements

Install packages

After you select a querying method, install the appropriate package on your cluster.

To use the OpenAI client, install the databricks-sdk[openai] package on your cluster. The Databricks SDK provides a wrapper for constructing the OpenAI client with authorization automatically configured to query generative AI models. Run the following in your notebook or local terminal (the quotes prevent your shell from interpreting the brackets and version specifier):

!pip install "databricks-sdk[openai]>=0.35.0"

The following command is only required when installing the package in a Databricks notebook:

dbutils.library.restartPython()
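With the package installed, a minimal sketch of querying through the OpenAI client follows. The endpoint name is a placeholder, and the call requires Databricks authentication (for example, a configured profile or notebook context), so it is wrapped in a function:

```python
def query_chat_model(
    prompt: str,
    endpoint: str = "databricks-meta-llama-3-3-70b-instruct",  # placeholder name
) -> str:
    """Query a chat model served by a Mosaic AI Model Serving endpoint."""
    from databricks.sdk import WorkspaceClient  # needs databricks-sdk[openai]

    # The SDK returns an OpenAI client preconfigured with workspace auth.
    client = WorkspaceClient().serving_endpoints.get_open_ai_client()
    response = client.chat.completions.create(
        model=endpoint,  # the serving endpoint name is passed as the model
        messages=[{"role": "user", "content": prompt}],
        max_tokens=256,
    )
    return response.choices[0].message.content
```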

Foundation model types

The following summarizes the supported foundation models by task type, including which models are supported for each type and recommended use cases.

Chat

Models designed to understand and engage in natural, multi-turn conversations. They are fine-tuned on large datasets of human dialogue, which enables them to generate contextually relevant responses, track conversational history, and provide coherent, human-like interactions across various topics.

The following are supported Databricks-hosted foundation models:

The following are supported external models:

  • OpenAI GPT and o series models
  • Anthropic Claude models
  • Google Gemini models

Recommended for scenarios where natural, multi-turn dialogue and contextual understanding are needed:

  • Virtual assistants
  • Customer support bots
  • Interactive tutoring systems
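To make multi-turn conversation concrete, the sketch below keeps the full message history and resends it on every turn, which is how OpenAI-compatible chat models track context. The endpoint name is a placeholder, and the network call stays inside a function because it requires workspace credentials:

```python
def add_turn(history: list, role: str, content: str) -> list:
    """Append one turn to an OpenAI-style chat history."""
    history.append({"role": role, "content": content})
    return history


def chat_turn(history: list, endpoint: str = "databricks-meta-llama-3-3-70b-instruct") -> str:
    """Send the whole history so the model sees prior turns, then record the reply."""
    from databricks.sdk import WorkspaceClient  # needs databricks-sdk[openai]

    client = WorkspaceClient().serving_endpoints.get_open_ai_client()
    response = client.chat.completions.create(model=endpoint, messages=history)
    reply = response.choices[0].message.content
    add_turn(history, "assistant", reply)
    return reply


history = add_turn([], "system", "You are a concise tutoring assistant.")
add_turn(history, "user", "What is an embedding?")
# chat_turn(history) would return the reply and append it as an assistant turn.
```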

Embeddings

Embedding models are machine learning systems that transform complex data—such as text, images, or audio—into compact numerical vectors called embeddings. These vectors capture the essential features and relationships within the data, allowing for efficient comparison, clustering, and semantic search.

The following are supported Databricks-hosted foundation models:

The following are supported external models:

  • OpenAI text embedding models
  • Cohere text embedding models
  • Google text embedding models

Recommended for applications where semantic understanding, similarity comparison, and efficient retrieval or clustering of complex data are essential:

  • Semantic search
  • Retrieval augmented generation (RAG)
  • Topic clustering
  • Sentiment analysis and text analytics
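As an example of the comparison step these vectors enable, the sketch below computes cosine similarity locally and wraps an embeddings request in a function; the endpoint name is a placeholder for an embeddings endpoint in your workspace:

```python
import math


def cosine_similarity(a: list, b: list) -> float:
    """Compare two embedding vectors; 1.0 means the same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def embed_texts(texts: list, endpoint: str = "databricks-gte-large-en") -> list:
    """Embed texts through an OpenAI-compatible endpoint (placeholder name)."""
    from databricks.sdk import WorkspaceClient  # needs databricks-sdk[openai]

    client = WorkspaceClient().serving_endpoints.get_open_ai_client()
    response = client.embeddings.create(model=endpoint, input=texts)
    return [item.embedding for item in response.data]
```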

Vision

Models designed to process, interpret, and analyze visual data, such as images and videos, so machines can "see" and understand the visual world.

The following are supported Databricks-hosted foundation models:

The following are supported external models:

  • OpenAI GPT and o series models with vision capabilities
  • Anthropic Claude models with vision capabilities
  • Google Gemini models with vision capabilities
  • Other external foundation models with vision capabilities that are OpenAI API-compatible

Recommended wherever automated, accurate, and scalable analysis of visual information is needed:

  • Object detection and recognition
  • Image classification
  • Image segmentation
  • Document understanding
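For OpenAI-compatible vision models, one common request shape pairs a text prompt with an image sent inline as a base64 data URL. A sketch of building that message (the MIME type is an assumption about your image format):

```python
import base64


def image_message(prompt: str, image_bytes: bytes, mime: str = "image/png") -> dict:
    """Build an OpenAI-style user message combining text and an inline image."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url", "image_url": {"url": f"data:{mime};base64,{encoded}"}},
        ],
    }
```

The resulting message can then be passed in the messages list of a chat request to a vision-capable endpoint.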

Reasoning

Advanced AI systems designed to simulate human-like logical thinking. Reasoning models integrate techniques such as symbolic logic, probabilistic reasoning, and neural networks to analyze context, break down tasks, and explain their decision-making.

The following are supported Databricks-hosted foundation models:

The following are supported external models:

  • OpenAI models with reasoning capabilities
  • Anthropic Claude models with reasoning capabilities
  • Google Gemini models with reasoning capabilities

Recommended for scenarios that require multi-step logical reasoning, task decomposition, or explainable decision-making:

  • Code generation
  • Content creation and summarization
  • Agent orchestration

Function calling

Databricks Function Calling is OpenAI-compatible and is available only during model serving, as part of Foundation Model APIs and serving endpoints that serve external models. For details, see Function calling on Databricks.
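As a sketch of the OpenAI-compatible tools format that function calling uses, the tool name and parameters below are purely illustrative:

```python
# Hypothetical tool definition in the OpenAI-compatible "tools" schema.
WEATHER_TOOL = {
    "type": "function",
    "function": {
        "name": "get_weather",  # illustrative name, not a real API
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}


def build_tool_request(prompt: str) -> dict:
    """Build a chat request body that offers the tool to the model."""
    return {
        "messages": [{"role": "user", "content": prompt}],
        "tools": [WEATHER_TOOL],
        "tool_choice": "auto",  # let the model decide whether to call the tool
    }
```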

Structured outputs

Structured outputs are OpenAI-compatible and are available only during model serving as part of Foundation Model APIs. For details, see Structured outputs on Databricks.
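As a sketch of the OpenAI-compatible response_format convention that structured outputs follow, the schema name and fields below are purely illustrative:

```python
# Hypothetical JSON schema for an OpenAI-compatible response_format.
TICKET_SCHEMA = {
    "type": "json_schema",
    "json_schema": {
        "name": "support_ticket",  # illustrative schema name
        "schema": {
            "type": "object",
            "properties": {
                "summary": {"type": "string"},
                "priority": {"type": "string"},
            },
            "required": ["summary", "priority"],
        },
    },
}


def build_structured_request(prompt: str) -> dict:
    """Build a chat request body that constrains the reply to the schema."""
    return {
        "messages": [{"role": "user", "content": prompt}],
        "response_format": TICKET_SCHEMA,
    }
```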

Chat with supported LLMs using AI Playground

You can interact with supported large language models using the AI Playground. The AI Playground is a chat-like environment where you can test, prompt, and compare LLMs from your Databricks workspace.

