Query an embedding model

In this article, you learn how to write query requests for foundation models that are optimized for embeddings tasks and send them to your model serving endpoint.

The examples in this article apply to querying foundation models that are made available using either:

Requirements

Query examples

The following example is an embeddings request for the gte-large-en model made available by external models.

To use the OpenAI client, specify the model serving endpoint name as the model input.

Python

from databricks.sdk import WorkspaceClient

w = WorkspaceClient()
openai_client = w.serving_endpoints.get_open_ai_client()

response = openai_client.embeddings.create(
    model="cohere-embeddings-endpoint",
    input="what is databricks"
)

To query foundation models from outside your workspace, you must use the OpenAI client directly, as demonstrated below. The following example assumes you have a Databricks API token and the openai package installed on your compute. You also need the URL of your Databricks workspace instance to connect the OpenAI client to Databricks.

Python

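from openai import OpenAI

# Point the OpenAI client at the serving endpoints of your Databricks workspace.
client = OpenAI(
    api_key="dapi-your-databricks-token",  # placeholder: your Databricks API token
    base_url="https://example.staging.cloud.databricks.com/serving-endpoints"
)

# The model argument is the name of the model serving endpoint.
response = client.embeddings.create(
    model="cohere-embeddings-endpoint",
    input="what is databricks"
)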

The following is the expected request format for an embeddings model. For external models, you can include additional parameters that are valid for a given provider and endpoint configuration. See Additional query parameters.

JSON

{
  "input": [
    "embedding text"
  ]
}
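A request body of this shape can also be assembled in code. The following is a minimal sketch, assuming a hypothetical helper named build_embeddings_payload (it is not part of the Databricks SDK or the OpenAI client) that accepts one string or a list of strings and wraps them in the "input" field shown above.

```python
import json
from typing import Union


def build_embeddings_payload(texts: Union[str, list[str]]) -> dict:
    """Hypothetical helper: wrap input text in the request shape shown above."""
    # Accept a single string for convenience and promote it to a list.
    if isinstance(texts, str):
        texts = [texts]
    if not texts or not all(isinstance(t, str) and t for t in texts):
        raise ValueError("input must be one or more non-empty strings")
    return {"input": list(texts)}


payload = build_embeddings_payload("embedding text")
body = json.dumps(payload)  # JSON string to send as the request body
```

Validating the input before serializing keeps malformed requests from reaching the endpoint; any provider-specific parameters would be added to the returned dictionary separately.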

The following is the expected response format:

JSON

{
  "object": "list",
  "data": [
    {
      "object": "embedding",
      "index": 0,
      "embedding": []
    }
  ],
  "model": "text-embedding-ada-002-v2",
  "usage": {
    "prompt_tokens": 2,
    "total_tokens": 2
  }
}
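To illustrate how a response in this format might be consumed, the sketch below parses a sample payload and collects the vectors in input order. The function name extract_embeddings is hypothetical, and the vector values and token counts in the sample are made up for illustration; only the field names come from the format above.

```python
import json

# Sample response in the documented format. The numbers are invented
# for illustration and the entries are deliberately out of order.
raw_response = """
{
  "object": "list",
  "data": [
    {"object": "embedding", "index": 1, "embedding": [0.3, 0.4]},
    {"object": "embedding", "index": 0, "embedding": [0.1, 0.2]}
  ],
  "model": "text-embedding-ada-002-v2",
  "usage": {"prompt_tokens": 4, "total_tokens": 4}
}
"""


def extract_embeddings(response: dict) -> list[list[float]]:
    """Hypothetical helper: return the vectors sorted by their index field."""
    items = sorted(response["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]


response = json.loads(raw_response)
vectors = extract_embeddings(response)
tokens_used = response["usage"]["total_tokens"]
```

Sorting by index is a defensive choice so that each vector lines up with the position of its input text, regardless of the order in which entries appear in the data list.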

Supported models

See Foundation model types for supported embedding models.

Check whether embeddings are normalized

Use the following to check if the embeddings generated by your model are normalized.

Python

import numpy as np

def is_normalized(vector: list[float], tol=1e-3) -> bool:
    # A normalized vector has an L2 norm (magnitude) of 1, within the tolerance.
    magnitude = np.linalg.norm(vector)
    return abs(magnitude - 1) < tol
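If the check fails, one option is to normalize the vectors yourself before comparing them. The following is a minimal sketch using only the Python standard library; l2_normalize is a hypothetical helper, and the input values are made up rather than real model output. It also shows why normalization matters: for unit-length vectors, cosine similarity reduces to a plain dot product.

```python
import math


def l2_normalize(vector: list[float]) -> list[float]:
    """Hypothetical helper: scale a vector to unit L2 length."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vector]


# Made-up example vectors, not real embeddings.
a = l2_normalize([3.0, 4.0])
b = l2_normalize([4.0, 3.0])

# For unit-length vectors, cosine similarity is just the dot product.
cosine = sum(x * y for x, y in zip(a, b))
```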

Additional resources