Databricks Foundation Model APIs supported models
Preview
This feature is in Private Preview. To try it, reach out to your Databricks contact.
Warning
The private preview for Foundation Model APIs is only supported in certain AWS US regions. Databricks might process your data outside of the region and cloud provider where your data originated.
This article describes the state-of-the-art open source models that are supported by the Databricks Foundation Model APIs. It also provides example query requests and responses for each model.
To enroll in the Private Preview, please submit the enrollment form.
llama-2-70b-chat
Important
Llama 2 is licensed under the LLAMA 2 Community License, Copyright © Meta Platforms, Inc. All Rights Reserved. Customers are responsible for ensuring compliance with applicable model licenses.
Llama-2-70B-Chat is a state-of-the-art 70 billion parameter language model with a context length of 4,096 tokens, trained by Meta. It excels at interactive applications that require strong reasoning capabilities, including summarization, question-answering, and chat applications.
Compared to other models with smaller parameter counts, Llama-2 demonstrates the strongest performance out-of-the-box across traditional natural language understanding benchmarks. Similar to other large language models, Llama-2-70B’s output may omit some facts and occasionally produce false information. Databricks recommends using retrieval augmented generation (RAG) in scenarios where accuracy is especially important.
Query example and response
The serving endpoint for this model is databricks-llama-2-70b-chat. For parameters and syntax, see Chat task.
The following is a query example.
curl \
 -u token:$DATABRICKS_TOKEN \
 -X POST \
 -H "Content-Type: application/json" \
 -d '{
   "messages": [
     {
       "role": "user",
       "content": "Hello! What is a fun fact about llamas?"
     }
   ],
   "max_tokens": 128
 }' \
 https://<workspace_host>.databricks.com/serving-endpoints/databricks-llama-2-70b-chat/invocations
The following is an example response:
{
  "id": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
  "object": "chat.completion",
  "created": 1698916461,
  "model": "llama-2-70b-chat-hf",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": " Llamas have a unique way of communicating with each other through a series of ear and body postures, as well as a distinctive \"llama language\" that sounds like a mixture of humming and grunting."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 47,
    "completion_tokens": 49,
    "total_tokens": 96
  }
}
The model field in the response gives the exact model version being used. The -hf suffix above refers to the full-precision snapshot on Hugging Face.
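Because the Chat task accepts a list of messages, you can send a multi-turn conversation by including earlier assistant replies, and optionally a system message, in the request. The following sketch assumes the standard system/user/assistant role convention described in Chat task; the prompts themselves are illustrative.
curl \
 -u token:$DATABRICKS_TOKEN \
 -X POST \
 -H "Content-Type: application/json" \
 -d '{
   "messages": [
     {"role": "system", "content": "You are a concise assistant."},
     {"role": "user", "content": "What is a fun fact about llamas?"},
     {"role": "assistant", "content": "Llamas hum to communicate with each other."},
     {"role": "user", "content": "Why do they hum?"}
   ],
   "max_tokens": 128
 }' \
 https://<workspace_host>.databricks.com/serving-endpoints/databricks-llama-2-70b-chat/invocations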
bge-large-en-v1.5
BAAI General Embedding (BGE) is a text embedding model that can map any text to a 1024-dimension embedding vector. These vectors can be used in vector databases for LLMs, as well as tasks like retrieval, classification, question-answering, clustering, or semantic search. This endpoint serves the English version of the model.
Embedding models are especially effective when used in tandem with LLMs for retrieval augmented generation (RAG) use cases. BGE can be used to find relevant text snippets in large chunks of documents that can be used in the context of an LLM.
In RAG applications, you may be able to improve the performance of your retrieval system by including an instruction parameter. The BGE authors recommend trying the instruction "Represent this sentence for searching relevant passages:" for query embeddings, though its performance impact is domain dependent. An example query that includes this instruction appears at the end of this section.
Query example and response
The serving endpoint for this model is databricks-bge-large-en. For parameters and syntax, see Embedding task.
The following is a query example.
curl \
 -u token:$DATABRICKS_TOKEN \
 -X POST \
 -H "Content-Type: application/json" \
 -d '{"input": "Embed this sentence!"}' \
 https://<workspace_host>.databricks.com/serving-endpoints/databricks-bge-large-en/invocations
The following is an example response:
{"id":"xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
"object":"list",
"model":"bge-large-en-v1.5",
"data":[{"index":0,
"object":"embedding",
"embedding":[0.0167999267578125,
0.01250457763671875,
// ...
0.000004649162292480469]
}],
"usage":{"prompt_tokens":7,
"total_tokens":7}
}
The model field in the response gives the exact model version.
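The following sketch shows how the instruction recommended earlier might be passed alongside a query embedding request. The instruction field name is an assumption here; see Embedding task for the exact parameter name and syntax.
curl \
 -u token:$DATABRICKS_TOKEN \
 -X POST \
 -H "Content-Type: application/json" \
 -d '{
   "input": "What is a llama?",
   "instruction": "Represent this sentence for searching relevant passages:"
 }' \
 https://<workspace_host>.databricks.com/serving-endpoints/databricks-bge-large-en/invocations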
mpt-7b-8k-instruct
MPT-7B-8K-Instruct is a 6.7 billion parameter model trained by MosaicML for long-form instruction following, especially question-answering on and summarization of longer documents. The model is pretrained on 1.5 trillion tokens drawn from a mixture of datasets, and fine-tuned on a dataset derived from the Databricks Dolly-15k and the Anthropic Helpful and Harmless (HH-RLHF) datasets. The model name you see in the product is mpt-7b-instruct, but the model specifically being used is the newer MPT-7B-8K-Instruct version.
MPT-7B-8K-Instruct can be used for a variety of tasks such as question-answering, summarization, and extraction. It is very fast relative to Llama-2-70B but might generate lower-quality responses. This model supports a context length of 8k tokens. Learn more about the MPT-7B-8k-Instruct model.
Similar to other language models of this size, MPT-7B-8K-Instruct should not be relied on to produce factually accurate information. This model was trained on various public datasets. While great efforts have been taken to clean the pretraining data, it is possible that this model could generate lewd, biased, or otherwise offensive outputs.
Query example and response
The serving endpoint for this model is databricks-mpt-7b-instruct. For parameters and syntax, see Completion task.
The following is a query example.
curl \
 -u token:$DATABRICKS_TOKEN \
 -X POST \
 -H "Content-Type: application/json" \
 -d '{"prompt": "What is a quoll?", "max_tokens": 64}' \
 https://<workspace_host>.databricks.com/serving-endpoints/databricks-mpt-7b-instruct/invocations
The following is an example response:
{"id":"xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
"object": "text_completion",
"model": "mpt-7b-8k-instruct",
"choices": [{"text": " A quoll is a carnivorous mammal native to Australia. The species is sometimes known as a “marsupial predator.”",
"index": 0,
"logprobs": null,
"finish_reason": "stop"}],
"usage": {"prompt_tokens": 35,
"completion_tokens": 29,
"total_tokens": 64}
}
The model field in the response gives the exact model version.
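The Completion task also accepts sampling parameters in addition to max_tokens. The following sketch assumes temperature and stop are supported as described in Completion task; the values shown are illustrative, not recommendations.
curl \
 -u token:$DATABRICKS_TOKEN \
 -X POST \
 -H "Content-Type: application/json" \
 -d '{
   "prompt": "What is a quoll?",
   "max_tokens": 64,
   "temperature": 0.5,
   "stop": ["\n\n"]
 }' \
 https://<workspace_host>.databricks.com/serving-endpoints/databricks-mpt-7b-instruct/invocations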