Use foundation models
In this article, you learn how to write query requests for foundation models and send them to your model serving endpoint. You can query foundation models hosted by Databricks and foundation models hosted outside of Databricks.
To send query requests for traditional ML or Python models, see Query serving endpoints for custom models.
Mosaic AI Model Serving supports Foundation Model APIs and external models for accessing foundation models. Model Serving uses a unified OpenAI-compatible API and SDK for querying them. This makes it possible to experiment with and customize foundation models for production across supported clouds and providers.
Query options
Mosaic AI Model Serving provides the following options for sending query requests to endpoints that serve foundation models:
Method | Details |
---|---|
OpenAI client | Query a model hosted by a Mosaic AI Model Serving endpoint using the OpenAI client. Specify the model serving endpoint name as the `model` input. |
Serving UI | Select Query endpoint from the Serving endpoint page. Insert JSON format model input data and click Send Request. If the model has an input example logged, use Show Example to load it. |
REST API | Call and query the model using the REST API. See POST /serving-endpoints/{name}/invocations for details. For scoring requests to endpoints serving multiple models, see Query individual models behind an endpoint. |
MLflow Deployments SDK | Use the MLflow Deployments SDK's predict() function to query the model. |
Databricks Python SDK | The Databricks Python SDK is a layer on top of the REST API. It handles low-level details, such as authentication, making it easier to interact with the models. |
Requirements
- A model serving endpoint.
- A Databricks workspace in a supported region.
- To send a scoring request through the OpenAI client, REST API, or MLflow Deployments SDK, you must have a Databricks API token.
Install packages
After you select a querying method, install the appropriate package on your cluster.
OpenAI client

To use the OpenAI client, install the databricks-sdk[openai] package on your cluster. The Databricks SDK provides a wrapper for constructing the OpenAI client with authorization automatically configured to query generative AI models. Run the following in your notebook or your local terminal:

```python
!pip install databricks-sdk[openai]>=0.35.0

# The following is only required when installing the package on a Databricks notebook
dbutils.library.restartPython()
```
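Once the package is installed, you can construct the OpenAI client and query a chat model. The following is a minimal sketch; the endpoint name databricks-meta-llama-3-3-70b-instruct is illustrative, so substitute the name of an endpoint in your workspace:

```python
from databricks.sdk import WorkspaceClient

# WorkspaceClient picks up authentication from the notebook context or environment
w = WorkspaceClient()
openai_client = w.serving_endpoints.get_open_ai_client()

# The endpoint name below is illustrative; use the name of your serving endpoint
response = openai_client.chat.completions.create(
    model="databricks-meta-llama-3-3-70b-instruct",
    messages=[{"role": "user", "content": "What is a mixture of experts model?"}],
    max_tokens=256,
)
print(response.choices[0].message.content)
```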
REST API

Access to the Serving REST API is available in Databricks Runtime for Machine Learning.
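You can send a scoring request to the invocations path with any HTTP client. The following sketch uses the Python requests library and assumes the DATABRICKS_HOST and DATABRICKS_TOKEN environment variables are set; the endpoint name is illustrative:

```python
import os
import requests

# Assumes DATABRICKS_HOST (for example, https://<workspace>.cloud.databricks.com)
# and DATABRICKS_TOKEN are set in the environment
host = os.environ["DATABRICKS_HOST"]
token = os.environ["DATABRICKS_TOKEN"]

# The endpoint name is illustrative; substitute your serving endpoint's name
endpoint = "databricks-meta-llama-3-3-70b-instruct"

response = requests.post(
    f"{host}/serving-endpoints/{endpoint}/invocations",
    headers={"Authorization": f"Bearer {token}"},
    json={
        "messages": [{"role": "user", "content": "Summarize model serving in one sentence."}],
        "max_tokens": 128,
    },
)
response.raise_for_status()
print(response.json())
```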
MLflow Deployments SDK

To use the MLflow Deployments SDK, install the mlflow package:

```python
!pip install mlflow

# The following is only required when installing the package on a Databricks notebook
dbutils.library.restartPython()
```
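After installation, you can query an endpoint with the predict() function. A minimal sketch, with an illustrative endpoint name:

```python
import mlflow.deployments

# get_deploy_client("databricks") targets the current Databricks workspace
client = mlflow.deployments.get_deploy_client("databricks")

# The endpoint name is illustrative; use your own serving endpoint
response = client.predict(
    endpoint="databricks-meta-llama-3-3-70b-instruct",
    inputs={
        "messages": [{"role": "user", "content": "Hello!"}],
        "max_tokens": 64,
    },
)
print(response)
```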
Databricks Python SDK

The Databricks SDK for Python is already installed on all Databricks clusters that use Databricks Runtime 13.3 LTS or above. For Databricks clusters that use Databricks Runtime 12.2 LTS and below, you must first install the Databricks SDK for Python. See Databricks SDK for Python.
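As a sketch of querying a chat endpoint with the SDK (the endpoint name below is illustrative; use an endpoint from your workspace):

```python
from databricks.sdk import WorkspaceClient
from databricks.sdk.service.serving import ChatMessage, ChatMessageRole

# Authentication is handled automatically in a notebook or via environment config
w = WorkspaceClient()

# The endpoint name is illustrative; use your own serving endpoint
response = w.serving_endpoints.query(
    name="databricks-meta-llama-3-3-70b-instruct",
    messages=[
        ChatMessage(role=ChatMessageRole.USER, content="What is Unity Catalog?"),
    ],
    max_tokens=128,
)
print(response.choices[0].message.content)
```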
Foundation model types
The following table summarizes the supported foundation models based on task type.
Task type | Description | Supported models | Recommended use cases |
---|---|---|---|
Chat | Models designed to understand and engage in natural, multi-turn conversations. They are fine-tuned on large datasets of human dialogue, which enables them to generate contextually relevant responses, track conversational history, and provide coherent, human-like interactions across various topics. | Databricks-hosted foundation models and external models | Recommended for scenarios where natural, multi-turn dialogue and contextual understanding are needed |
Embedding | Embedding models transform complex data, such as text, images, or audio, into compact numerical vectors called embeddings. These vectors capture the essential features and relationships within the data, allowing for efficient comparison, clustering, and semantic search. | Databricks-hosted foundation models and external models | Recommended for applications where semantic understanding, similarity comparison, and efficient retrieval or clustering of complex data are essential |
Vision | Models designed to process, interpret, and analyze visual data, such as images and videos, so machines can "see" and understand the visual world. | Databricks-hosted foundation models and external models | Recommended wherever automated, accurate, and scalable analysis of visual information is needed |
Reasoning | Advanced AI systems designed to simulate human-like logical thinking. Reasoning models integrate techniques such as symbolic logic, probabilistic reasoning, and neural networks to analyze context, break down tasks, and explain their decision-making. | Databricks-hosted foundation models and external models | Recommended for complex tasks that require multi-step analysis, task decomposition, and explainable decision-making |
Function calling
Databricks Function Calling is OpenAI-compatible and is only available during model serving as part of Foundation Model APIs and serving endpoints that serve external models. For details, see Function calling on Databricks.
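Because function calling is OpenAI-compatible, a request can pass tool definitions through the tools parameter of the OpenAI client. The following is a sketch; the tool definition and endpoint name are illustrative, and the linked article is the authoritative reference:

```python
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()
openai_client = w.serving_endpoints.get_open_ai_client()

# Illustrative tool definition; the model may respond with a tool call
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
]

response = openai_client.chat.completions.create(
    model="databricks-meta-llama-3-3-70b-instruct",
    messages=[{"role": "user", "content": "What is the weather in Paris?"}],
    tools=tools,
)
print(response.choices[0].message.tool_calls)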
Structured outputs
Structured outputs is OpenAI-compatible and is only available during model serving as part of Foundation Model APIs. For details, see Structured outputs on Databricks.
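Because structured outputs are OpenAI-compatible, a request can constrain the response with the response_format parameter. The following is a sketch; the JSON schema and endpoint name are illustrative, and the linked article is the authoritative reference:

```python
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()
openai_client = w.serving_endpoints.get_open_ai_client()

# Illustrative JSON schema; the response is constrained to match it
response = openai_client.chat.completions.create(
    model="databricks-meta-llama-3-3-70b-instruct",
    messages=[{"role": "user", "content": "Extract the event: Team sync on Friday at 10am."}],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "event",
            "schema": {
                "type": "object",
                "properties": {
                    "title": {"type": "string"},
                    "day": {"type": "string"},
                    "time": {"type": "string"},
                },
            },
        },
    },
)
print(response.choices[0].message.content)
```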
Chat with supported LLMs using AI Playground
You can interact with supported large language models using the AI Playground. The AI Playground is a chat-like environment where you can test, prompt, and compare LLMs from your Databricks workspace.