Query foundation models

Preview

Mosaic AI Model Serving is in Public Preview and is supported in us-east1 and us-central1.

In this article, you learn how to format query requests for foundation models hosted outside of Databricks and send them to your model serving endpoint.

For query requests to traditional ML or Python models, see Query serving endpoints for custom models.

Mosaic AI Model Serving supports external models for accessing foundation models that are hosted outside of Databricks. Model Serving uses a unified OpenAI-compatible API and SDK for querying them. This makes it possible to experiment with and customize generative AI models for production across supported clouds and providers.

Mosaic AI Model Serving provides the following options for sending scoring requests to endpoints that serve foundation models or external models:

OpenAI client: Query a model hosted by a Mosaic AI Model Serving endpoint using the OpenAI client. Specify the model serving endpoint name as the model input. Supported for chat, embeddings, and completions models made available by external models.

Serving UI: Select Query endpoint from the Serving endpoint page. Insert the model input data in JSON format and click Send Request. If the model has an input example logged, use Show Example to load it.

REST API: Call and query the model using the REST API. See POST /serving-endpoints/{name}/invocations for details. For scoring requests to endpoints serving multiple models, see Query individual models behind an endpoint.

MLflow Deployments SDK: Use the MLflow Deployments SDK's predict() function to query the model.

Databricks Python SDK: The Databricks Python SDK is a layer on top of the REST API. It handles low-level details, such as authentication, making it easier to interact with the models.
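As a rough illustration of the REST API option, the following sketch builds (but does not send) a POST request to /serving-endpoints/{name}/invocations using only the Python standard library. The helper name build_invocation_request is hypothetical, and the workspace URL, endpoint name, and token are placeholders; it assumes the endpoint accepts a personal access token as a bearer token.

```python
import json
import urllib.request


def build_invocation_request(workspace_url: str, endpoint_name: str,
                             token: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request for a serving endpoint.

    Hypothetical helper for illustration; not part of any Databricks SDK.
    """
    url = f"{workspace_url.rstrip('/')}/serving-endpoints/{endpoint_name}/invocations"
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_invocation_request(
    "https://example.staging.cloud.databricks.com",  # placeholder workspace URL
    "bedrock-chat-completions-endpoint",             # placeholder endpoint name
    "dapi-your-databricks-token",                    # placeholder token
    {"messages": [{"role": "user", "content": "What is a mixture of experts model?"}]},
)
print(request.get_method(), request.full_url)
# To send the request, pass it to urllib.request.urlopen() and decode the JSON body.
```

Keeping request construction separate from sending makes it possible to inspect the URL, headers, and body before any network call is made.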

Requirements

Install packages

After you select a querying method, install the appropriate package on your cluster.

To use the OpenAI client, install the databricks-sdk[openai] package on your cluster. The Databricks SDK provides a wrapper for constructing the OpenAI client with authorization automatically configured to query generative AI models. Run the following in your notebook. In a local terminal, omit the leading !:

!pip install "databricks-sdk[openai]>=0.35.0"

The following is required only when you install the package in a Databricks notebook:

Python
dbutils.library.restartPython()
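To confirm that the installed package meets the 0.35.0 minimum, a check along the following lines may help. This is a minimal sketch using only the standard library; the helper names are hypothetical, and the comparison looks only at the numeric release segment, so pre-release suffixes are ignored.

```python
import re
from importlib.metadata import PackageNotFoundError, version
from typing import Optional

MINIMUM = "0.35.0"  # minimum version taken from the install command


def release_tuple(text: str) -> tuple:
    """Return the leading numeric release segment, e.g. '0.35.0rc1' -> (0, 35, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", text)
    return tuple(int(part) for part in match.group(0).split(".")) if match else ()


def meets_minimum(installed: str, minimum: str = MINIMUM) -> bool:
    """Compare release segments, padding the shorter one with zeros."""
    a, b = release_tuple(installed), release_tuple(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def installed_version(package: str = "databricks-sdk") -> Optional[str]:
    """Return the installed version string, or None if the package is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


found = installed_version()
if found is None:
    print("databricks-sdk is not installed")
else:
    print(found, "OK" if meets_minimum(found) else f"is older than {MINIMUM}")
```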

Query a chat completion model

The following examples query a chat model made available using external models.

To use the OpenAI client, specify the model serving endpoint name as the model input. The following example assumes you have a Databricks API token and openai installed on your compute. You also need your Databricks workspace instance to connect the OpenAI client to Databricks.

Python

from openai import OpenAI

client = OpenAI(
    api_key="dapi-your-databricks-token",  # your Databricks API token
    base_url="https://example.staging.cloud.databricks.com/serving-endpoints",
)

response = client.chat.completions.create(
    model="bedrock-chat-completions-endpoint",  # model serving endpoint name
    messages=[
        {
            "role": "system",
            "content": "You are a helpful assistant.",
        },
        {
            "role": "user",
            "content": "What is a mixture of experts model?",
        },
    ],
    max_tokens=256,
)
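Rather than hard-coding the token and workspace URL, the client settings can be read from the environment. The following is a minimal sketch that assumes the DATABRICKS_HOST and DATABRICKS_TOKEN environment variables are set; the helper name openai_client_settings is hypothetical.

```python
import os
from typing import Mapping


def openai_client_settings(env: Mapping[str, str] = os.environ) -> dict:
    """Return keyword arguments for OpenAI(...) built from environment variables.

    Assumes DATABRICKS_HOST holds the workspace instance and DATABRICKS_TOKEN
    holds the API token. Raises KeyError if either variable is missing.
    """
    host = env["DATABRICKS_HOST"].rstrip("/")
    if not host.startswith("https://"):
        host = "https://" + host
    return {
        "api_key": env["DATABRICKS_TOKEN"],
        "base_url": f"{host}/serving-endpoints",
    }


# Demonstration with an explicit mapping instead of the real environment.
settings = openai_client_settings({
    "DATABRICKS_HOST": "example.staging.cloud.databricks.com",  # placeholder
    "DATABRICKS_TOKEN": "dapi-your-databricks-token",           # placeholder
})
print(settings["base_url"])
# With openai installed: client = OpenAI(**openai_client_settings())
```

Passing the mapping as a parameter keeps the helper easy to exercise without touching real credentials.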

As an example, the following is the expected request format for a chat model when using the REST API. For external models, you can include additional parameters that are valid for a given provider and endpoint configuration. See Additional query parameters.

JSON
{
  "messages": [
    {
      "role": "user",
      "content": "What is a mixture of experts model?"
    }
  ],
  "max_tokens": 100,
  "temperature": 0.1
}

The following is an expected response format for a request made using the REST API:

JSON
{
  "model": "bedrock-chat-completions-endpoint",
  "choices": [
    {
      "message": {},
      "index": 0,
      "finish_reason": null
    }
  ],
  "usage": {
    "prompt_tokens": 7,
    "completion_tokens": 74,
    "total_tokens": 81
  },
  "object": "chat.completion",
  "id": null,
  "created": 1698824353
}
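A response of this shape can be reduced to the fields most callers need. The sketch below is illustrative only: summarize_chat_response is a hypothetical helper, and the message content in the sample is invented, since the response format shown above leaves "message" empty.

```python
def summarize_chat_response(response: dict) -> dict:
    """Extract the first choice's text, finish reason, and token usage."""
    choice = response["choices"][0]
    usage = response.get("usage", {})
    prompt = usage.get("prompt_tokens", 0)
    completion = usage.get("completion_tokens", 0)
    return {
        "text": choice.get("message", {}).get("content"),
        "finish_reason": choice.get("finish_reason"),
        "total_tokens": usage.get("total_tokens"),
        # True when the reported total equals prompt + completion tokens.
        "usage_consistent": prompt + completion == usage.get("total_tokens"),
    }


# Sample with the documented shape; the message content is illustrative.
sample = {
    "model": "bedrock-chat-completions-endpoint",
    "choices": [
        {
            "message": {"role": "assistant", "content": "Illustrative reply."},
            "index": 0,
            "finish_reason": None,
        }
    ],
    "usage": {"prompt_tokens": 7, "completion_tokens": 74, "total_tokens": 81},
    "object": "chat.completion",
    "id": None,
    "created": 1698824353,
}

summary = summarize_chat_response(sample)
print(summary["text"], summary["total_tokens"])
```

Using .get() for the message content means an empty "message" object yields None instead of raising an error.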

Query an embedding model

The following example sends an embeddings request to cohere-embeddings-endpoint, an endpoint that serves an embeddings model made available by external models.

To use the OpenAI client, specify the model serving endpoint name as the model input.

Python

from databricks.sdk import WorkspaceClient

w = WorkspaceClient()
openai_client = w.serving_endpoints.get_open_ai_client()

response = openai_client.embeddings.create(
    model="cohere-embeddings-endpoint",  # model serving endpoint name
    input="what is databricks",
)

To query foundation models outside your workspace, you must use the OpenAI client directly, as demonstrated below. The following example assumes you have a Databricks API token and openai installed on your compute. You also need your Databricks workspace instance to connect the OpenAI client to Databricks.

Python

from openai import OpenAI

client = OpenAI(
    api_key="dapi-your-databricks-token",  # your Databricks API token
    base_url="https://example.staging.cloud.databricks.com/serving-endpoints",
)

response = client.embeddings.create(
    model="cohere-embeddings-endpoint",  # model serving endpoint name
    input="what is databricks",
)

The following is the expected request format for an embeddings model. For external models, you can include additional parameters that are valid for a given provider and endpoint configuration. See Additional query parameters.

JSON

{
  "input": [
    "embedding text"
  ]
}

The following is the expected response format:

JSON
{
  "object": "list",
  "data": [
    {
      "object": "embedding",
      "index": 0,
      "embedding": []
    }
  ],
  "model": "text-embedding-ada-002-v2",
  "usage": {
    "prompt_tokens": 2,
    "total_tokens": 2
  }
}
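When the input contains several texts, the "index" field can be used to line each vector up with its input. The following is a minimal sketch; embeddings_in_order is a hypothetical helper, and the two short vectors in the sample are invented for illustration.

```python
def embeddings_in_order(response: dict) -> list:
    """Return the embedding vectors sorted by their "index" field."""
    items = sorted(response["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]


# Sample with the documented shape; vectors are illustrative and shortened.
sample = {
    "object": "list",
    "data": [
        {"object": "embedding", "index": 1, "embedding": [0.0, 1.0]},
        {"object": "embedding", "index": 0, "embedding": [0.6, 0.8]},
    ],
    "model": "text-embedding-ada-002-v2",
    "usage": {"prompt_tokens": 4, "total_tokens": 4},
}

vectors = embeddings_in_order(sample)
print(len(vectors), vectors[0])
```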

Check if embeddings are normalized

Use the following to check if the embeddings generated by your model are normalized.

Python

import numpy as np


def is_normalized(vector: list[float], tol: float = 1e-3) -> bool:
    """Return True if the vector's L2 norm is within tol of 1."""
    magnitude = np.linalg.norm(vector)
    return abs(magnitude - 1) < tol
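If NumPy is not available, the same check can be sketched with the standard library, along with a helper that rescales a vector to unit length. The function names below are hypothetical. Normalization matters because, for unit-length vectors, the dot product equals the cosine similarity.

```python
import math


def l2_norm(vector: list) -> float:
    """Euclidean (L2) length of the vector."""
    return math.sqrt(sum(x * x for x in vector))


def is_normalized_stdlib(vector: list, tol: float = 1e-3) -> bool:
    """Dependency-free variant of the normalization check."""
    return abs(l2_norm(vector) - 1.0) < tol


def normalize(vector: list) -> list:
    """Scale the vector to unit length; a zero vector cannot be normalized."""
    norm = l2_norm(vector)
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vector]


raw = [3.0, 4.0]            # illustrative vector with length 5
unit = normalize(raw)
print(is_normalized_stdlib(raw), is_normalized_stdlib(unit))
```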

Query a text completion model

The following example applies to querying a text completions model made available using external models.

The following example queries the claude-2 completions model hosted by Anthropic using the OpenAI client. To use the OpenAI client, populate the model field with the name of the model serving endpoint that hosts the model you want to query.

This example uses a previously created endpoint, anthropic-completions-endpoint, configured for accessing external models from the Anthropic model provider. See how to create external model endpoints.

See Supported models for additional models you can query and their providers.

Python

from databricks.sdk import WorkspaceClient

w = WorkspaceClient()
openai_client = w.serving_endpoints.get_open_ai_client()

completion = openai_client.completions.create(
    model="anthropic-completions-endpoint",  # model serving endpoint name
    prompt="what is databricks",
    temperature=1.0,
)
print(completion)

The following is the expected request format for a completions model. For external models, you can include additional parameters that are valid for a given provider and endpoint configuration. See Additional query parameters.

JSON
{
  "prompt": "What is mlflow?",
  "max_tokens": 100,
  "temperature": 0.1,
  "stop": [
    "Human:"
  ],
  "n": 1,
  "stream": false,
  "extra_params": {
    "top_p": 0.9
  }
}

The following is the expected response format:

JSON
{
  "id": "cmpl-8FwDGc22M13XMnRuessZ15dG622BH",
  "object": "text_completion",
  "created": 1698809382,
  "model": "gpt-3.5-turbo-instruct",
  "choices": [
    {
      "text": "MLflow is an open-source platform for managing the end-to-end machine learning lifecycle. It provides tools for tracking experiments, managing and deploying models, and collaborating on projects. MLflow also supports various machine learning frameworks and languages, making it easier to work with different tools and environments. It is designed to help data scientists and machine learning engineers streamline their workflows and improve the reproducibility and scalability of their models.",
      "index": 0,
      "logprobs": null,
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 5,
    "completion_tokens": 83,
    "total_tokens": 88
  }
}

Chat with supported LLMs using AI Playground

You can interact with supported large language models using the AI Playground. The AI Playground is a chat-like environment where you can test, prompt, and compare LLMs from your Databricks workspace.

Additional resources