How to query Foundation Model APIs with the Python SDK
Important
The Foundation Model APIs Python SDK is an Experimental feature and the API definition may change.
This article provides guidance on how to query Databricks Foundation Model APIs with the Python SDK. It includes installation instructions and query and response formats.
The Python SDK is a layer on top of the REST API. It handles low-level details, such as authentication and mapping model IDs to endpoint URLs, making it easier to interact with the models. The SDK is designed to be used from inside Databricks notebooks.
Requirements
See the Requirements section.
Install the Foundation Model APIs Python SDK
You can install the SDK on a cluster attached to a Databricks notebook or in your local environment. After the SDK is installed, you can use it to query models, as shown in the following examples.
Install the SDK on a Databricks Notebook
You can install the SDK on the cluster attached to your Databricks notebook by running the following command in a notebook cell:
!pip install databricks-genai-inference
dbutils.library.restartPython()
Install the SDK on your local environment
If you are working outside of a Databricks notebook, you can install the SDK in your local environment by running the following command in your terminal:
pip install databricks-genai-inference
Databricks native authentication is required to use this SDK. To authenticate, first generate a personal access token for your application, then set the two environment variables DATABRICKS_HOST and DATABRICKS_TOKEN. In the following commands, DATABRICKS_HOST represents the Databricks host URL for your workspace. This URL typically starts with "https://" and includes the workspace instance name. DATABRICKS_TOKEN represents the Databricks personal access token value that you generated.
export DATABRICKS_HOST=<YOUR HOST NAME>
export DATABRICKS_TOKEN=<YOUR TOKEN>
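To confirm that both variables are visible to your Python process, you can run a quick check using only the standard library (a minimal sketch):

import os

# The SDK reads these two variables to authenticate; both must be set.
print("host:", os.environ.get("DATABRICKS_HOST"))
print("token set:", "DATABRICKS_TOKEN" in os.environ)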
Query a chat completion model
To query the llama-2-70b-chat chat completion model, use ChatCompletion.create() to execute a model query. The ChatCompletion.create() function accepts the same arguments as the Chat task request API.
from databricks_genai_inference import ChatCompletion

response = ChatCompletion.create(
    model="llama-2-70b-chat",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Knock knock."},
    ],
    max_tokens=128)
print(f"response.message: {response.message}")
By default, create() returns a single response object (ChatCompletionObject) after the complete response has been generated. This can easily take more than 5 seconds for a large model like llama-2-70b-chat.

For a more responsive experience, you can stream text fragments as they are generated. Pass in stream=True to enable streaming; create() then returns a generator that yields a sequence of response fragments (ChatCompletionChunkObject).
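For example, the following sketch prints each fragment as it arrives. It assumes that the message property on each chunk (see the table below) holds the newly generated text:

from databricks_genai_inference import ChatCompletion

# With stream=True, create() returns a generator of ChatCompletionChunkObject fragments.
for chunk in ChatCompletion.create(
        model="llama-2-70b-chat",
        messages=[{"role": "user", "content": "Knock knock."}],
        max_tokens=128,
        stream=True):
    # Assumes each chunk's message property carries the new text fragment.
    print(chunk.message, end="")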
Both kinds of response objects have the same top-level properties:

| Property | Type | Description |
|---|---|---|
| response | dict | Raw json response (see Chat task API for details) |
| id | string | Unique request ID |
| model | string | Model name |
| message | string | Chat completion |
| usage | dict | Token usage metadata (cumulative for streaming) |
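For example, after a non-streaming call you can inspect the request metadata alongside the completion (a minimal sketch using the properties listed above):

print(f"id: {response.id}")        # unique request ID
print(f"model: {response.model}")  # model name
print(f"usage: {response.usage}")  # token usage metadata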
Chat session
ChatSession is a high-level class for managing multi-round chat conversations. It provides the following functions:

| Function | Return | Description |
|---|---|---|
| reply (string) | | Takes a new user message |
| last | string | Last message from assistant |
| history | list of dict | Messages in chat history, including roles |
| count | int | Number of chat rounds conducted so far |
To initialize ChatSession, you use the same set of arguments as ChatCompletion, and those arguments are used throughout the chat session.
from databricks_genai_inference import ChatSession

chat = ChatSession(model="llama-2-70b-chat", system_message="You are a helpful assistant.", max_tokens=128)
chat.reply("Knock, knock!")
chat.last  # returns "Hello! Who's there?"
chat.reply("Guess who!")
chat.last  # returns "Okay, I'll play along! Is it a person, a place, or a thing?"
chat.history
# returns: [
#     {'role': 'system', 'content': 'You are a helpful assistant.'},
#     {'role': 'user', 'content': 'Knock, knock!'},
#     {'role': 'assistant', 'content': "Hello! Who's there?"},
#     {'role': 'user', 'content': 'Guess who!'},
#     {'role': 'assistant', 'content': "Okay, I'll play along! Is it a person, a place, or a thing?"}
# ]
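Continuing the same session, count tracks the number of rounds so far (a minimal sketch, assuming each reply() call completes one round):

chat.count  # returns 2: two user-assistant rounds so far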
Query an embedding model
To query the bge-large-en embedding model, use Embedding.create() to execute a model query. The Embedding.create() function accepts the same arguments as the Embedding task request API.
The following example generates embeddings optimized for indexing.
from databricks_genai_inference import Embedding

response = Embedding.create(
    model="bge-large-en",
    input="3D ActionSLAM: wearable person tracking in multi-floor environments")
print(f'embeddings: {response.embeddings}')
You can pass multiple inputs into create() by setting input to a list of strings.

To optimize an embedding for query retrieval in a RAG application, add the parameter instruction = "Represent this sentence for searching relevant passages:".
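For example, the following sketch embeds two queries in one call with the retrieval instruction (the query strings are illustrative):

from databricks_genai_inference import Embedding

# Pass a list of strings to embed multiple inputs in a single request, and add
# the instruction parameter to optimize the embeddings for query retrieval.
response = Embedding.create(
    model="bge-large-en",
    instruction="Represent this sentence for searching relevant passages:",
    input=[
        "How does ActionSLAM track people across multiple floors?",
        "What is wearable person tracking?",
    ])
print(f"number of embeddings: {len(response.embeddings)}")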
The Embedding.create() function returns an EmbeddingObject with the following properties:

| Property | Type | Description |
|---|---|---|
| response | dict | Raw json response (see Embedding task API for details) |
| id | string | Unique request ID |
| model | string | Model name |
| embeddings | list of array | List of embeddings |
| usage | dict | Token usage metadata |
Query a text completion model
To query the mpt-7b-instruct text completion model, use Completion.create() to execute a query. The Completion.create() function accepts the same arguments as the Completion task request API.
from databricks_genai_inference import Completion

response = Completion.create(
    model="mpt-7b-instruct",
    prompt="Write 3 reasons why you should train an AI model on domain specific data sets.",
    max_tokens=128)
print(f"response.text: {response.text}")
Completion is similar to ChatCompletion: by default it waits and returns the complete response, but you can pass stream=True to retrieve response fragments as they are generated.
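For example, a streaming query might look like the following sketch. It assumes each fragment carries its new text in the text property (a list of strings, per the table below):

from databricks_genai_inference import Completion

# With stream=True, create() returns a generator of response fragments.
for chunk in Completion.create(
        model="mpt-7b-instruct",
        prompt="Write 3 reasons why you should train an AI model on domain specific data sets.",
        max_tokens=128,
        stream=True):
    # Assumes the new text for a single prompt is the first element of the list.
    print(chunk.text[0], end="")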
In both cases, the response objects have the following top-level properties:

| Property | Type | Description |
|---|---|---|
| response | dict | Raw json response (see Completion task API for details) |
| id | string | Unique request ID |
| model | string | Model name |
| text | list of string | List of text completions |
| usage | dict | Token usage metadata (cumulative for streaming) |