Query foundation models
In this article, you learn how to format query requests for foundation models and send them to your model serving endpoint. You can query foundation models hosted by Databricks and foundation models hosted outside of Databricks.
For query requests for traditional ML or Python models, see Query serving endpoints for custom models.
Mosaic AI Model Serving supports Foundation Model APIs and external models for accessing foundation models. Model Serving uses a unified OpenAI-compatible API and SDK to query them. This makes it possible to experiment with and customize foundation models for production across supported clouds and providers.
Mosaic AI Model Serving provides the following options for sending scoring requests to endpoints that serve foundation models or external models:
| Method | Details |
|---|---|
| OpenAI client | Query a model hosted by a Mosaic AI Model Serving endpoint using the OpenAI client. Specify the model serving endpoint name as the model input. |
| SQL function | Invoke model inference directly from SQL using the ai_query SQL function. |
| Serving UI | Select Query endpoint from the Serving endpoint page. Insert model input data in JSON format and click Send Request. If the model has an input example logged, use Show Example to load it. |
| REST API | Call and query the model using the REST API. See POST /serving-endpoints/{name}/invocations for details. For scoring requests to endpoints that serve multiple models, see Query individual models behind an endpoint. |
| MLflow Deployments SDK | Use the MLflow Deployments SDK's predict() function to query the model. |
| Databricks Python SDK | The Databricks Python SDK is a layer on top of the REST API. It handles low-level details, such as authentication, making it easier to interact with the models. |
Requirements
- A Databricks workspace in a supported region.
- To send a scoring request through the OpenAI client, REST API, or MLflow Deployments SDK, you must have a Databricks API token.
As a security best practice for production scenarios, Databricks recommends that you use machine-to-machine OAuth tokens for authentication during production.
For testing and development, Databricks recommends using personal access tokens that belong to service principals instead of workspace users. To create tokens for a service principal, see Manage tokens for a service principal.
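As an illustration only, the following minimal sketch shows one way to authenticate the Databricks SDK with a service principal's OAuth machine-to-machine credentials before querying a serving endpoint; the host, client ID, and client secret values are placeholders you must replace.
from databricks.sdk import WorkspaceClient

# Sketch: OAuth machine-to-machine authentication with a service principal.
# Replace the placeholders with your workspace URL and the service principal's
# OAuth client ID and secret.
w = WorkspaceClient(
    host="https://<workspace_host>.databricks.com",
    client_id="<service-principal-client-id>",
    client_secret="<service-principal-oauth-secret>",
)
openai_client = w.serving_endpoints.get_open_ai_client()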
Install packages
After you have selected a querying method, you must first install the appropriate package on your cluster.
- OpenAI client
- REST API
- MLflow Deployments SDK
- Databricks Python SDK
To use the OpenAI client, the databricks-sdk[openai] package needs to be installed on your cluster. Databricks SDK provides a wrapper for constructing the OpenAI client with authorization automatically configured to query generative AI models. Run the following in your notebook or your local terminal:
!pip install databricks-sdk[openai]>=0.35.0
# The following is only required when installing the package on a Databricks Notebook
dbutils.library.restartPython()
Access to the Serving REST API is available in Databricks Runtime for Machine Learning.
!pip install mlflow
# The following is only required when installing the package on a Databricks Notebook
dbutils.library.restartPython()
The Databricks SDK for Python is already installed on all Databricks clusters that use Databricks Runtime 13.3 LTS or above. For Databricks clusters that use Databricks Runtime 12.2 LTS and below, you must install the Databricks SDK for Python first. See Databricks SDK for Python.
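For example, on a cluster running Databricks Runtime 12.2 LTS or below, a minimal install in a notebook might look like the following sketch:
%pip install databricks-sdk
dbutils.library.restartPython()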
Query a chat completion model
The following are examples of querying a chat model. The example applies to querying a chat model made available using either of the Model Serving capabilities: Foundation Model APIs or external models.
For a batch inference example, see Perform batch LLM inference using AI Functions.
- OpenAI client
- REST API
- MLflow Deployments SDK
- Databricks Python SDK
- LangChain
- SQL
The following is a chat request for the DBRX Instruct model made available by the Foundation Model APIs pay-per-token endpoint, databricks-dbrx-instruct, in your workspace.
To use the OpenAI client, specify the model serving endpoint name as the model input.
from databricks.sdk import WorkspaceClient
w = WorkspaceClient()
openai_client = w.serving_endpoints.get_open_ai_client()
response = openai_client.chat.completions.create(
model="databricks-dbrx-instruct",
messages=[
{
"role": "system",
"content": "You are a helpful assistant."
},
{
"role": "user",
"content": "What is a mixture of experts model?",
}
],
max_tokens=256
)
To query foundation models outside of your workspace, you must use the OpenAI client directly. You also need your Databricks workspace instance to connect the OpenAI client to Databricks. The following example assumes you have a Databricks API token and openai installed on your compute.
import os
import openai
from openai import OpenAI
client = OpenAI(
api_key="dapi-your-databricks-token",
base_url="https://example.staging.cloud.databricks.com/serving-endpoints"
)
response = client.chat.completions.create(
model="databricks-dbrx-instruct",
messages=[
{
"role": "system",
"content": "You are a helpful assistant."
},
{
"role": "user",
"content": "What is a mixture of experts model?",
}
],
max_tokens=256
)
The following example uses REST API parameters for querying serving endpoints that serve foundation models. These parameters are Public Preview and the definition might change. See POST /serving-endpoints/{name}/invocations.
The following is a chat request for the DBRX Instruct model made available by the Foundation Model APIs pay-per-token endpoint, databricks-dbrx-instruct, in your workspace.
curl \
-u token:$DATABRICKS_TOKEN \
-X POST \
-H "Content-Type: application/json" \
-d '{
"messages": [
{
"role": "system",
"content": "You are a helpful assistant."
},
{
"role": "user",
"content": " What is a mixture of experts model?"
}
]
}' \
https://<workspace_host>.databricks.com/serving-endpoints/databricks-dbrx-instruct/invocations
The following example uses the predict() API from the MLflow Deployments SDK.
The following is a chat request for the DBRX Instruct model made available by the Foundation Model APIs pay-per-token endpoint, databricks-dbrx-instruct, in your workspace.
import mlflow.deployments
import os

# Only required when running this example outside of a Databricks Notebook
os.environ['DATABRICKS_HOST'] = "https://<workspace_host>.databricks.com"
os.environ['DATABRICKS_TOKEN'] = "dapi-your-databricks-token"
client = mlflow.deployments.get_deploy_client("databricks")
chat_response = client.predict(
endpoint="databricks-dbrx-instruct",
inputs={
"messages": [
{
"role": "user",
"content": "Hello!"
},
{
"role": "assistant",
"content": "Hello! How can I assist you today?"
},
{
"role": "user",
"content": "What is a mixture of experts model??"
}
],
"temperature": 0.1,
"max_tokens": 20
}
)
The following is a chat request for the DBRX Instruct model made available by the Foundation Model APIs pay-per-token endpoint, databricks-dbrx-instruct, in your workspace.
This code must be run in a notebook in your workspace. See Use the Databricks SDK for Python from a Databricks notebook.
from databricks.sdk import WorkspaceClient
from databricks.sdk.service.serving import ChatMessage, ChatMessageRole
w = WorkspaceClient()
response = w.serving_endpoints.query(
name="databricks-dbrx-instruct",
messages=[
ChatMessage(
role=ChatMessageRole.SYSTEM, content="You are a helpful assistant."
),
ChatMessage(
role=ChatMessageRole.USER, content="What is a mixture of experts model?"
),
],
max_tokens=128,
)
print(f"RESPONSE:\n{response.choices[0].message.content}")
To query a foundation model endpoint using LangChain, you can use the ChatDatabricks ChatModel class and specify the endpoint.
The following example uses the ChatDatabricks ChatModel class in LangChain to query the Foundation Model APIs pay-per-token endpoint, databricks-dbrx-instruct.
%pip install databricks-langchain
from langchain_core.messages import HumanMessage, SystemMessage
from databricks_langchain import ChatDatabricks
messages = [
SystemMessage(content="You're a helpful assistant"),
HumanMessage(content="What is a mixture of experts model?"),
]
llm = ChatDatabricks(endpoint="databricks-dbrx-instruct")
llm.invoke(messages)
The following example uses the built-in SQL function, ai_query. This function is Public Preview and the definition might change.
The following is a chat request for meta-llama-3-1-70b-instruct made available by the Foundation Model APIs pay-per-token endpoint, databricks-meta-llama-3-1-70b-instruct, in your workspace.
The ai_query() function does not support querying endpoints that serve the DBRX or the DBRX Instruct model.
SELECT ai_query(
"databricks-meta-llama-3-1-70b-instruct",
"Can you explain AI in ten words?"
)
As an example, the following is the expected request format for a chat model when using the REST API. For external models, you can include additional parameters that are valid for a given provider and endpoint configuration. See Additional query parameters.
{
"messages": [
{
"role": "user",
"content": "What is a mixture of experts model?"
}
],
"max_tokens": 100,
"temperature": 0.1
}
The following is the expected response format for a request made using the REST API:
{
"model": "databricks-dbrx-instruct",
"choices": [
{
"message": {},
"index": 0,
"finish_reason": null
}
],
"usage": {
"prompt_tokens": 7,
"completion_tokens": 74,
"total_tokens": 81
},
"object": "chat.completion",
"id": null,
"created": 1698824353
}
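As an illustrative sketch (not part of the original examples), the following shows one way to send the request body above with Python's requests library and read the assistant reply from the OpenAI-compatible response shape; the workspace host is a placeholder and DATABRICKS_TOKEN is assumed to be set in your environment.
import os
import requests

# Sketch: post the chat request body shown above to a serving endpoint and
# print the assistant message from the response.
url = "https://<workspace_host>.databricks.com/serving-endpoints/databricks-dbrx-instruct/invocations"
headers = {"Authorization": f"Bearer {os.environ['DATABRICKS_TOKEN']}"}
payload = {
    "messages": [{"role": "user", "content": "What is a mixture of experts model?"}],
    "max_tokens": 100,
    "temperature": 0.1,
}

response = requests.post(url, headers=headers, json=payload)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])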
Query an embedding model
The following is an embeddings request for the gte-large-en model made available by Foundation Model APIs. The example applies to querying an embedding model made available using either of the Model Serving capabilities: Foundation Model APIs or external models.
- OpenAI client
- REST API
- MLflow Deployments SDK
- Databricks Python SDK
- LangChain
- SQL
To use the OpenAI client, specify the model serving endpoint name as the model input.
from databricks.sdk import WorkspaceClient
w = WorkspaceClient()
openai_client = w.serving_endpoints.get_open_ai_client()
response = openai_client.embeddings.create(
model="databricks-gte-large-en",
input="what is databricks"
)
To query foundation models outside your workspace, you must use the OpenAI client directly, as demonstrated below. The following example assumes you have a Databricks API token and openai installed on your compute. You also need your Databricks workspace instance to connect the OpenAI client to Databricks.
import os
import openai
from openai import OpenAI
client = OpenAI(
api_key="dapi-your-databricks-token",
base_url="https://example.staging.cloud.databricks.com/serving-endpoints"
)
response = client.embeddings.create(
model="databricks-gte-large-en",
input="what is databricks"
)
The following example uses REST API parameters for querying serving endpoints that serve foundation models or external models. These parameters are Public Preview and the definition might change. See POST /serving-endpoints/{name}/invocations.
curl \
-u token:$DATABRICKS_TOKEN \
-X POST \
-H "Content-Type: application/json" \
-d '{ "input": "Embed this sentence!"}' \
https://<workspace_host>.databricks.com/serving-endpoints/databricks-gte-large-en/invocations
The following example uses the predict() API from the MLflow Deployments SDK.
import os
import mlflow.deployments

# Only required when running this example outside of a Databricks Notebook
os.environ['DATABRICKS_HOST'] = "https://<workspace_host>.databricks.com"
os.environ['DATABRICKS_TOKEN'] = "dapi-your-databricks-token"
client = mlflow.deployments.get_deploy_client("databricks")
embeddings_response = client.predict(
endpoint="databricks-gte-large-en",
inputs={
"input": "Here is some text to embed"
}
)
from databricks.sdk import WorkspaceClient
w = WorkspaceClient()
response = w.serving_endpoints.query(
name="databricks-gte-large-en",
input="Embed this sentence!"
)
print(response.data[0].embedding)
To use a Databricks Foundation Model APIs model in LangChain as an embedding model, import the DatabricksEmbeddings class and specify the endpoint parameter as follows:
%pip install databricks-langchain
from databricks_langchain import DatabricksEmbeddings
embeddings = DatabricksEmbeddings(endpoint="databricks-gte-large-en")
embeddings.embed_query("Can you explain AI in ten words?")
The following example uses the built-in SQL function, ai_query. This function is Public Preview and the definition might change.
SELECT ai_query(
"databricks-gte-large-en",
"Can you explain AI in ten words?"
)
The following is the expected request format for an embedding model. For external models, you can include additional parameters that are valid for a given provider and endpoint configuration. See Additional query parameters.
{
"input": [
"embedding text"
]
}
The following is the expected response format:
{
"object": "list",
"data": [
{
"object": "embedding",
"index": 0,
"embedding": []
}
],
"model": "text-embedding-ada-002-v2",
"usage": {
"prompt_tokens": 2,
"total_tokens": 2
}
}
Check whether embeddings are normalized
Use the following to check whether the embeddings generated by your model are normalized.
import numpy as np

def is_normalized(vector: list[float], tol=1e-3) -> bool:
    # An embedding is normalized when its L2 norm (magnitude) is approximately 1.
    magnitude = np.linalg.norm(vector)
    return abs(magnitude - 1) < tol
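For example, assuming response is the embeddings response returned by the OpenAI client example earlier in this section, a quick check might look like the following sketch:
# Sketch: check the first embedding returned by the embeddings request above.
embedding = response.data[0].embedding
print(is_normalized(embedding))  # True if the model returns unit-length vectors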
Query a text completion model
- OpenAI client
- REST API
- MLflow Deployments SDK
- Databricks Python SDK
- SQL
Querying text completion models made available using Foundation Model APIs pay-per-token using the OpenAI client is not supported. Only querying external models using the OpenAI client is supported as demonstrated in this section.
To use the OpenAI client, specify the model serving endpoint name as the model input. The following example queries the claude-2 completions model hosted by Anthropic using the OpenAI client.
This example uses a previously created endpoint, anthropic-completions-endpoint, configured for accessing external models from the Anthropic model provider. See how to create external model endpoints.
See Supported models for additional models you can query and their providers.
from databricks.sdk import WorkspaceClient
w = WorkspaceClient()
openai_client = w.serving_endpoints.get_open_ai_client()
completion = openai_client.completions.create(
model="anthropic-completions-endpoint",
prompt="what is databricks",
temperature=1.0
)
print(completion)
The following is a completions request for querying a completions model made available using external models.
The following example uses REST API parameters for querying serving endpoints that serve external models. These parameters are Public Preview and the definition might change. See POST /serving-endpoints/{name}/invocations.
curl \
-u token:$DATABRICKS_TOKEN \
-X POST \
-H "Content-Type: application/json" \
-d '{"prompt": "What is a quoll?", "max_tokens": 64}' \
https://<workspace_host>.databricks.com/serving-endpoints/<completions-model-endpoint>/invocations
The following is a completions request for querying a completions model made available using external models.
The following example uses the predict()
API from the MLflow Deployments SDK.
import os
import mlflow.deployments
# Only required when running this example outside of a Databricks Notebook
os.environ['DATABRICKS_HOST'] = "https://<workspace_host>.databricks.com"
os.environ['DATABRICKS_TOKEN'] = "dapi-your-databricks-token"
client = mlflow.deployments.get_deploy_client("databricks")
completions_response = client.predict(
endpoint="<completions-model-endpoint>",
inputs={
"prompt": "What is the capital of France?",
"temperature": 0.1,
"max_tokens": 10,
"n": 2
}
)
# Print the response
print(completions_response)
The following is a completions request for querying a completions model made available using external models.
from databricks.sdk import WorkspaceClient
w = WorkspaceClient()
response = w.serving_endpoints.query(
name="<completions-model-endpoint>",
prompt="Write 3 reasons why you should train an AI model on domain specific data sets."
)
print(response.choices[0].text)
The following example uses the built-in SQL function, ai_query. This function is Public Preview and the definition might change.
SELECT ai_query(
"<completions-model-endpoint>",
"Can you explain AI in ten words?"
)
The following is the expected request format for a completions model. For external models, you can include additional parameters that are valid for a given provider and endpoint configuration. See Additional query parameters.
{
"prompt": "What is mlflow?",
"max_tokens": 100,
"temperature": 0.1,
"stop": [
"Human:"
],
"n": 1,
"stream": false,
"extra_params":
{
"top_p": 0.9
}
}
The following is the expected response format:
{
"id": "cmpl-8FwDGc22M13XMnRuessZ15dG622BH",
"object": "text_completion",
"created": 1698809382,
"model": "gpt-3.5-turbo-instruct",
"choices": [
{
"text": "MLflow is an open-source platform for managing the end-to-end machine learning lifecycle. It provides tools for tracking experiments, managing and deploying models, and collaborating on projects. MLflow also supports various machine learning frameworks and languages, making it easier to work with different tools and environments. It is designed to help data scientists and machine learning engineers streamline their workflows and improve the reproducibility and scalability of their models.",
"index": 0,
"logprobs": null,
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 5,
"completion_tokens": 83,
"total_tokens": 88
}
}
Chat with supported LLMs using AI Playground
You can interact with supported large language models using the AI Playground. The AI Playground is a chat-like environment where you can test, prompt, and compare LLMs from your Databricks workspace.
Additional resources
- Monitor served models using AI Gateway-enabled inference tables
- Perform batch LLM inference using AI Functions
- Databricks Foundation Model APIs
- External models in Mosaic AI Model Serving
- Tutorial: Create external model endpoints to query OpenAI models
- Pay-per-token supported models
- Foundation Model REST API reference