Foundation model REST API reference
This article provides general API information for Databricks Foundation Model APIs and the models they support. The Foundation Model APIs are designed to be similar to OpenAI's REST API to make migrating existing projects easier. Both the pay-per-token and provisioned throughput endpoints accept the same REST API request format.
Endpoints
Foundation Model APIs support pay-per-token endpoints and provisioned throughput endpoints.
A preconfigured endpoint is available in your workspace for each pay-per-token supported model, and users can interact with these endpoints using HTTP POST requests. See Supported foundation models on Mosaic AI Model Serving for supported models.
Provisioned throughput endpoints can be created using the API or the Serving UI. These endpoints support multiple models per endpoint for A/B testing, as long as all served models expose the same API format (for example, all are chat models). See POST /api/2.0/serving-endpoints for endpoint configuration parameters.
Requests and responses use JSON; the exact JSON structure depends on an endpoint's task type. Chat and completion endpoints support streaming responses.
Usage
Responses include a `usage` sub-message that reports the number of tokens in the request and response. The format of this sub-message is the same across all task types.
| Field | Type | Description |
|---|---|---|
| `completion_tokens` | Integer | Number of generated tokens. Not included in embedding responses. |
| `prompt_tokens` | Integer | Number of tokens from the input prompt(s). |
| `total_tokens` | Integer | Total number of tokens. |
| `reasoning_tokens` | Integer | Number of reasoning (thinking) tokens. Only applicable to reasoning models. |
For models like databricks-meta-llama-3-3-70b-instruct, a user prompt is transformed using a prompt template before being passed into the model. For pay-per-token endpoints, a system prompt might also be added. `prompt_tokens` includes all text added by the Databricks server.
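A minimal sketch of reading this sub-message in Python, using the OpenAI-compatible token field names (the values are invented for the example):

```python
# Hypothetical usage sub-message returned alongside a chat response.
usage = {
    "prompt_tokens": 42,       # tokens from the (templated) input prompt
    "completion_tokens": 128,  # tokens generated by the model
    "total_tokens": 170,       # sum of the two counts above
}

# The total is always the sum of prompt and completion tokens.
assert usage["total_tokens"] == usage["prompt_tokens"] + usage["completion_tokens"]
```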
Responses API
The Responses API is only compatible with OpenAI models.
The Responses API enables multi-turn conversations with a model. Unlike Chat Completions, the Responses API uses `input` instead of `messages`.
Responses API request
| Field | Default | Type | Description |
|---|---|---|---|
| `model` | | String | Required. Model ID used to generate the response. |
| `input` | | String or List[ResponsesInput] | Required. Text, image, or file inputs to the model, used to generate a response. Unlike `messages` in the Chat Completions API, `input` also accepts image and file content blocks. |
| `instructions` | | String | A system (or developer) message inserted into the model's context. |
| `max_output_tokens` | | Integer | An upper bound for the number of tokens that can be generated for a response, including visible output tokens and reasoning tokens. |
| `temperature` | 1.0 | Float in [0,2] | The sampling temperature. 0 is deterministic and higher values introduce more randomness. |
| `top_p` | 1.0 | Float in (0,1] | The probability threshold used for nucleus sampling. |
| `stream` | false | Boolean | If set to true, the model response data is streamed to the client as it is generated, using server-sent events. |
| `stream_options` | | StreamOptions | Options for streaming responses. Only set this when `stream` is true. |
| `text` | | TextConfig | Configuration options for a text response from the model. Can be plain text or structured JSON data. |
| `reasoning` | | ReasoningConfig | Reasoning configuration for gpt-5 and o-series models. |
| `tool_choice` | | String or ToolChoiceObject | How the model should select which tool (or tools) to use when generating a response. See ToolChoiceObject. |
| `tools` | | List[ToolObject] | An array of tools the model may call while generating a response. Note: Code interpreter and web search tools are not supported by Databricks. |
| `parallel_tool_calls` | true | Boolean | Whether to allow the model to run tool calls in parallel. |
| `max_tool_calls` | | Integer greater than zero | The maximum number of total calls to built-in tools that can be processed in a response. |
| `metadata` | | Object | Set of up to 16 key-value pairs that can be attached to an object. |
| `prompt_cache_key` | | String | Used to cache responses for similar requests to optimize cache hit rates. Replaces the `user` field for caching purposes. |
| `prompt_cache_retention` | | String | The retention policy for the prompt cache. Set to `24h` for extended retention. |
| `safety_identifier` | | String | A stable identifier used to help detect users of your application that may be violating usage policies. |
| `user` | | String | Deprecated. Use `safety_identifier` instead. |
| `truncation` | | String | The truncation strategy to use for the model response. |
| `top_logprobs` | | Integer | An integer between 0 and 20 specifying the number of most likely tokens to return at each token position, each with an associated log probability. |
| `include` | | List[String] | Specify additional output data to include in the model response. |
| `prompt` | | Object | Reference to a prompt template and its variables. |
Unsupported parameters: The following parameters are not supported by Databricks and will return a 400 error if specified:
- `background`: Background processing is not supported.
- `store`: Stored responses are not supported.
- `conversation`: The Conversation API is not supported.
- `service_tier`: Service tier selection is managed by Databricks.
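To make the request shape concrete, here is a minimal sketch of a Responses API request body built as a Python dictionary; the endpoint name and parameter values are illustrative, not prescriptive:

```python
import json

# Hypothetical request body for a Responses API endpoint serving an OpenAI model.
request = {
    "model": "my-gpt-endpoint",  # assumed serving endpoint name
    "input": [
        {"role": "user", "content": "Summarize the plot of Hamlet in one sentence."}
    ],
    "max_output_tokens": 256,
    "temperature": 0.2,
}

# The body is sent as JSON in an HTTP POST to the serving endpoint.
body = json.dumps(request)
```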
ResponsesInput
The input field accepts either a string or a list of input message objects with role and content.
| Field | Type | Description |
|---|---|---|
| `role` | String | Required. The role of the message author. Can be `user`, `assistant`, `system`, or `developer`. |
| `content` | String or List[ResponsesContentBlock] | Required. The content of the message, either as a string or an array of content blocks. |
ResponsesContentBlock
Content blocks define the type of content in input and output messages. The content block type is determined by the type field.
InputText
| Field | Type | Description |
|---|---|---|
| `type` | String | Required. Must be `input_text`. |
| `text` | String | Required. The text content. |
OutputText
| Field | Type | Description |
|---|---|---|
| `type` | String | Required. Must be `output_text`. |
| `text` | String | Required. The text content. |
| `annotations` | List[Object] | Optional annotations for the text content. |
InputImage
| Field | Type | Description |
|---|---|---|
| `type` | String | Required. Must be `input_image`. |
| `image_url` | String | Required. URL or base64-encoded data URI of the image. |
InputFile
| Field | Type | Description |
|---|---|---|
| `type` | String | Required. Must be `input_file`. |
| `file_id` | String | File identifier if using uploaded files. |
| `filename` | String | The name of the file. |
| `file_data` | String | Base64-encoded data URI with format prefix. For example, PDF files use the format `data:application/pdf;base64,...`. |
FunctionCall
| Field | Type | Description |
|---|---|---|
| `type` | String | Required. Must be `function_call`. |
| `id` | String | Required. Unique identifier for the function call. |
| `call_id` | String | Required. The call identifier. |
| `name` | String | Required. The name of the function being called. |
| `arguments` | Object/String | Required. The function arguments as a JSON object or string. |
FunctionCallOutput
| Field | Type | Description |
|---|---|---|
| `type` | String | Required. Must be `function_call_output`. |
| `call_id` | String | Required. The call identifier this output corresponds to. |
| `output` | String/Object | Required. The function output as a string or JSON object. |
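The relationship between a model-emitted function call and the client's reply can be sketched as follows; the identifiers, function name, and arguments are invented for illustration:

```python
# A function_call block as the model might emit it (values are hypothetical).
function_call = {
    "type": "function_call",
    "id": "fc_123",
    "call_id": "call_abc",
    "name": "get_weather",
    "arguments": '{"city": "Paris"}',
}

# The client executes the function and returns the result in a
# function_call_output block whose call_id matches the original call.
function_call_output = {
    "type": "function_call_output",
    "call_id": function_call["call_id"],
    "output": '{"temperature_c": 18}',
}
```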
StreamOptions
Configuration for streaming responses. Only used when stream: true.
| Field | Type | Description |
|---|---|---|
| `include_usage` | Boolean | If true, include token usage information in the stream. Default is `false`. |
TextConfig
Configuration for text output, including structured outputs.
| Field | Type | Description |
|---|---|---|
| `format` | ResponsesFormatObject | The format specification for the text output. |
ResponsesFormatObject
Specifies the output format for text responses.
| Field | Type | Description |
|---|---|---|
| `type` | String | Required. The type of format: `text`, `json_schema`, or `json_object`. |
| `json_schema` | Object | Required when `type` is `json_schema`. The JSON schema for the output. |
The json_schema object has the same structure as JsonSchemaObject documented in the Chat Completions API.
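As a sketch, a text configuration requesting structured JSON output might look like the following; the schema name and fields are invented for the example:

```python
# Hypothetical text configuration asking for JSON that matches a schema.
text_config = {
    "format": {
        "type": "json_schema",
        "json_schema": {
            "name": "city_info",  # illustrative schema name
            "schema": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
            "strict": True,  # enforce exact schema adherence
        },
    }
}
```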
ReasoningConfig
Configuration for reasoning behavior in reasoning models (o-series and gpt-5 models).
| Field | Type | Description |
|---|---|---|
| `effort` | String | The reasoning effort level: `low`, `medium`, or `high`. |
| `encrypted_content` | String | Encrypted reasoning content for stateless mode. Provided by the model in previous responses. |
ToolObject
See Function calling on Databricks.
| Field | Type | Description |
|---|---|---|
| `type` | String | Required. The type of the tool. Currently, only `function` is supported. |
| `function` | FunctionObject | Required. The function definition associated with the tool. |
FunctionObject
| Field | Type | Description |
|---|---|---|
| `name` | String | Required. The name of the function to be called. |
| `description` | String | Required. The detailed description of the function. The model uses this description to understand the relevance of the function to the prompt and generate the tool calls with higher accuracy. |
| `parameters` | Object | The parameters the function accepts, described as a valid JSON schema object. If the tool is called, then the tool call is fit to the JSON schema provided. Omitting `parameters` defines a function without any parameters. |
| `strict` | Boolean | Whether to enable strict schema adherence when generating the function call. If set to `true`, the model follows the exact schema defined in `parameters`. |
ToolChoiceObject
See Function calling on Databricks.
| Field | Type | Description |
|---|---|---|
| `type` | String | Required. The type of the tool. Currently, only `function` is supported. |
| `function` | Object | Required. An object defining which tool to call, of the form `{"name": "my_function"}`. |
Responses API response
For non-streaming requests, the response is a single response object. For streaming requests, the response is a text/event-stream where each event is a response chunk.
| Field | Type | Description |
|---|---|---|
| `id` | String | Unique identifier for the response. Note: Databricks encrypts this ID for security. |
| `object` | String | The object type. Equal to `response`. |
| `created_at` | Integer | The Unix timestamp (in seconds) when the response was created. |
| `status` | String | The status of the response. One of: `completed`, `failed`, `in_progress`, or `incomplete`. |
| `model` | String | The model version used to generate the response. |
| `output` | List[ResponsesMessage] | The output generated by the model, typically containing message objects. |
| `usage` | Usage | Token usage metadata. |
| `error` | Error | Error information if the response failed. |
| `incomplete_details` | IncompleteDetails | Details about why the response is incomplete, if applicable. |
| `instructions` | String | The instructions provided in the request. |
| `max_output_tokens` | Integer | The maximum output tokens specified in the request. |
| `temperature` | Float | The temperature used for generation. |
| `top_p` | Float | The top_p value used for generation. |
| `tools` | List[ToolObject] | The tools specified in the request. |
| `tool_choice` | String or ToolChoiceObject | The tool_choice setting from the request. |
| `parallel_tool_calls` | Boolean | Whether parallel tool calls were enabled. |
| `store` | Boolean | Whether the response was stored. |
| `metadata` | Object | The metadata attached to the response. |
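A common client-side task is extracting the generated text from the `output` list. A minimal sketch, assuming an abbreviated response with one assistant message (the values are invented):

```python
# Hypothetical (abbreviated) Responses API response body.
response = {
    "id": "resp_123",
    "object": "response",
    "status": "completed",
    "output": [
        {
            "type": "message",
            "role": "assistant",
            "content": [{"type": "output_text", "text": "Hello!"}],
        }
    ],
}

# Concatenate every output_text block found in message items.
text = "".join(
    block["text"]
    for item in response["output"]
    if item["type"] == "message"
    for block in item["content"]
    if block["type"] == "output_text"
)
```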
ResponsesMessage
Message objects in the output field containing the model's response content.
| Field | Type | Description |
|---|---|---|
| `id` | String | Required. Unique identifier for the message. |
| `role` | String | Required. The role of the message, for example `assistant`. |
| `content` | List[ResponsesContentBlock] | Required. The content blocks in the message. |
| `status` | String | The status of the message processing. |
| `type` | String | Required. The object type. Equal to `message`. |
Error
Error information when a response fails.
| Field | Type | Description |
|---|---|---|
| `code` | String | Required. The error code. |
| `message` | String | Required. A human-readable error message. |
| `param` | String | The parameter that caused the error, if applicable. |
| `type` | String | Required. The error type. |
IncompleteDetails
Details about why a response is incomplete.
| Field | Type | Description |
|---|---|---|
| `reason` | String | Required. The reason the response is incomplete. |
Chat Completions API
The Chat Completions API enables multi-turn conversations with a model. The model response provides the next assistant message in the conversation. See POST /serving-endpoints/{name}/invocations for querying endpoint parameters.
Chat request
| Field | Default | Type | Description |
|---|---|---|---|
| `messages` | | ChatMessage list | Required. A list of messages representing the current conversation. |
| `max_tokens` | null | Integer greater than zero, or null | The maximum number of tokens to generate. |
| `stream` | | Boolean | Stream responses back to a client in order to allow partial results for requests. If this parameter is included in the request, responses are sent using the Server-sent events standard. |
| `temperature` | 1.0 | Float in [0,2] | The sampling temperature. 0 is deterministic and higher values introduce more randomness. |
| `top_p` | 1.0 | Float in (0,1] | The probability threshold used for nucleus sampling. |
| `top_k` | null | Integer greater than zero, or null | Defines the number of k most likely tokens to use for top-k filtering. Set this value to 1 to make outputs deterministic. |
| `stop` | [] | String or List[String] | The model stops generating further tokens when any one of the sequences in `stop` is encountered. |
| `n` | 1 | Integer greater than zero | The API returns `n` independent chat completions when `n` is specified. |
| `tool_choice` | | String or ToolChoiceObject | Used only in conjunction with the `tools` field. Controls which (if any) tool is called by the model. |
| `tools` | | List[ToolObject] | A list of tools the model may call. Currently, `function` is the only supported tool type. |
| `response_format` | | ResponseFormatObject | An object specifying the format that the model must output. Accepted types are `text`, `json_schema`, or `json_object`. Setting to `json_schema` enables structured outputs that adhere to a supplied schema. Setting to `json_object` ensures the model generates valid JSON, but does not guarantee a particular schema. |
| `logprobs` | false | Boolean | Whether to provide the log probability of each sampled token. |
| `top_logprobs` | | Integer | The number of most likely token candidates to return log probabilities for at each sampling step. Can be 0-20. |
| `reasoning_effort` | | String | Controls the level of reasoning effort the model applies when generating responses. Accepted values are `low`, `medium`, and `high`. |
ChatMessage
| Field | Type | Description |
|---|---|---|
| `role` | String | Required. The role of the author of the message. Can be `system`, `user`, `assistant`, or `tool`. |
| `content` | String or List[ContentItem] | Required for chat tasks that do not involve tool calls. The content can be either a string or an array that contains a series of multimodal elements in a single chat interaction. These elements follow the sequence in which they are processed as inputs or outputs by the models. This array input is specifically designed for use with proprietary models accessible only through external model providers. Currently, only Claude models are supported. Use string-typed content for other external model providers, open source models (such as Llama), or models hosted by customers on Databricks. |
| `tool_calls` | ToolCall list | The list of ToolCall objects generated by the model. |
| `tool_call_id` | String | When `role` is `tool`, the ID of the ToolCall that this message is responding to. |
The system role can only be used once, as the first message in a conversation. It overrides the model's default system prompt.
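Putting the pieces together, a chat request with a single system message followed by a user turn can be sketched like this (the message text and parameter values are illustrative):

```python
# Hypothetical Chat Completions request body.
request = {
    "messages": [
        # The system message appears at most once, as the first message.
        {"role": "system", "content": "You are a terse assistant."},
        {"role": "user", "content": "What is Delta Lake?"},
    ],
    "max_tokens": 128,
    "temperature": 0.1,
}
```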
ContentItem
ContentItem is one of the following content types: TextContent, ReasoningContent, DocumentContent, or ImageContent
TextContent
| Field | Type | Description |
|---|---|---|
| `type` | String | Required. Must be `text`. |
| `text` | String | Required. The text content. |
| `citations` | List[Citation] | Optional citation information. See the table below. |
| `cache_control` | String | Enables caching for your request. This parameter is only accepted by Databricks-hosted Claude models. See Prompt caching for an example. |
The citations fields are as follows:
| Field | Type | Description |
|---|---|---|
| `type` | String | Required. Must be `char_location`. |
| `cited_text` | String | The text cited from the document. |
| `document_index` | Integer | The index of the cited document. |
| `document_title` | String | The title of the cited document. |
| `start_char_index` | Integer | The starting index of the cited text in the document. |
| `end_char_index` | Integer | The ending index of the cited text in the document. |
ImageContent
| Field | Type | Description |
|---|---|---|
| `type` | String | Required. Must be `image_url`. |
| `image_url` | ImageURL | Equivalent to the OpenAI image_url object. |
| `cache_control` | String | Enables caching for your request. This parameter is only accepted by Databricks-hosted Claude models. Image message content must use base64-encoded data as its source; URLs are not currently supported. See Prompt caching for an example. |
ImageURL fields are below:
| Field | Type | Description |
|---|---|---|
| `url` | String | Base64-encoded image data. Must be a valid base64 string generated from a supported image file format (JPEG, PNG, GIF, WebP, etc.). |
| `detail` | String | Specifies the detail level of the image. |
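Because Databricks-hosted Claude models require base64-encoded image data rather than URLs, an image content item can be sketched as follows (the image bytes are a stand-in for a real file):

```python
import base64

image_bytes = b"\x89PNG\r\n..."  # stand-in for real image file bytes
encoded = base64.b64encode(image_bytes).decode()

# Multimodal message content: a text item followed by an image item.
content = [
    {"type": "text", "text": "Describe this image."},
    {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{encoded}"}},
]
```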
ReasoningContent
| Field | Type | Description |
|---|---|---|
| `type` | String | Required. Must be `reasoning`. |
| `summary` | List[Summary] | Reasoning text contents. A summary can be either a TextSummary or an EncryptedTextSummary. |
| `cache_control` | String | Enables caching for your request. This parameter is only accepted by Databricks-hosted Claude models. See Prompt caching for an example. |
TextSummary
| Field | Type | Description |
|---|---|---|
| `type` | String | Required. Must be `summary_text`. |
| `text` | String | A short summary of the reasoning used by the model when generating the response. |
| `signature` | String | Optional cryptographic token used to verify the authenticity of the data. |
EncryptedTextSummary
| Field | Type | Description |
|---|---|---|
| `type` | String | Required. Must be `encrypted_text`. |
| `data` | String | Encrypted text content that is not human-readable, for safety reasons. |
DocumentContent
DocumentContent is only for requests.
| Field | Type | Description |
|---|---|---|
| `type` | String | Required. Must be `document`. |
| `title` | String | Title for the document. |
| `context` | String | Description of the document. |
| `source` | Source | Required. Specifies more information about the document, including format and contents. |
| `citations` | Map[string, bool] | Map with a single field, "enabled", that maps to a bool indicating whether to enable citations for the document. |
Source
| Field | Type | Description |
|---|---|---|
| `type` | String | Required. Must be one of `text`, `base64`, `content`, or `url`. |
| `media_type` | String | Required for PDF and text types. The media type of the data, for example `application/pdf` or `text/plain`. |
| `data` | String | Required for PDF and text types. The data containing the document source. |
| `content` | String or List[TextContent] or List[ImageContent] | Required for `content` type. The content blocks of the document. |
| `url` | String | Required for URLPDFSource type. The URL of the PDF document. |
FileContent
| Field | Type | Description |
|---|---|---|
| `type` | String | Required. Must be `file`. |
| `file` | File | Required. The file content. |
File fields are below:
| Field | Type | Description |
|---|---|---|
| `filename` | String | The name of the file. |
| `file_data` | String | Required. OpenAI-compatible base64-encoded file data. It starts with the file format followed by the base64-encoded data. For example, a PDF file has the format `data:application/pdf;base64,...`. |
ToolCall
A tool call action suggestion by the model. See Function calling on Databricks.
| Field | Type | Description |
|---|---|---|
| `id` | String | Required. A unique identifier for this tool call suggestion. |
| `type` | String | Required. Only `function` is supported. |
| `function` | FunctionCallCompletion | Required. A function call suggested by the model. |
| `cache_control` | String | Enables caching for your request. This parameter is only accepted by Databricks-hosted Claude models. See Prompt caching for an example. |
FunctionCallCompletion
| Field | Type | Description |
|---|---|---|
| `name` | String | Required. The name of the function the model recommended. |
| `arguments` | Object | Required. Arguments to the function as a serialized JSON dictionary. |
Note: ToolChoiceObject, ToolObject, and FunctionObject are defined in the Responses API section and are shared between both APIs.
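A sketch of the tool-calling round trip using these shared objects; the function definition, identifiers, and arguments are invented for the example:

```python
import json

# A tool definition the client sends in the request's tools field.
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",  # hypothetical function
            "description": "Get the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
]

# A ToolCall as it might appear in the assistant message of the response.
tool_call = {
    "id": "call_abc",
    "type": "function",
    "function": {"name": "get_weather", "arguments": '{"city": "Paris"}'},
}

# The arguments arrive as serialized JSON and must be parsed before use.
args = json.loads(tool_call["function"]["arguments"])
```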
ResponseFormatObject
See Structured outputs on Databricks.
| Field | Type | Description |
|---|---|---|
| `type` | String | Required. The type of response format being defined. Either `text`, `json_object`, or `json_schema`. |
| `json_schema` | JsonSchemaObject | Required. The JSON schema to adhere to if `type` is `json_schema`. |
JsonSchemaObject
See Structured outputs on Databricks.
| Field | Type | Description |
|---|---|---|
| `name` | String | Required. The name of the response format. |
| `description` | String | A description of what the response format is for, used by the model to determine how to respond in the format. |
| `schema` | Object | Required. The schema for the response format, described as a JSON schema object. |
| `strict` | Boolean | Whether to enable strict schema adherence when generating the output. If set to `true`, the model follows the exact schema defined in `schema`. |
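For example, a response_format asking the model to emit an object with a name and an age could be sketched as follows (the schema name and fields are illustrative):

```python
# Hypothetical structured-output configuration for a chat request.
response_format = {
    "type": "json_schema",
    "json_schema": {
        "name": "person",
        "description": "A person's name and age.",
        "schema": {
            "type": "object",
            "properties": {
                "name": {"type": "string"},
                "age": {"type": "integer"},
            },
            "required": ["name", "age"],
        },
        "strict": True,
    },
}
```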
Chat response
For non-streaming requests, the response is a single chat completion object. For streaming requests, the response is a text/event-stream where each event is a completion chunk object. The top-level structure of completion and chunk objects is almost identical: only `choices` has a different type.
| Field | Type | Description |
|---|---|---|
| `id` | String | Unique identifier for the chat completion. |
| `choices` | List[ChatCompletionChoice] or List[ChatCompletionChunk] (streaming) | List of chat completion texts. |
| `object` | String | The object type. Equal to either `chat.completion` or `chat.completion.chunk` (streaming). |
| `created` | Integer | The time the chat completion was generated, in seconds. |
| `model` | String | The model version used to generate the response. |
| `usage` | Usage | Token usage metadata. Might not be present on streaming responses. |
ChatCompletionChoice
| Field | Type | Description |
|---|---|---|
| `index` | Integer | The index of the choice in the list of generated choices. |
| `message` | ChatMessage | A chat completion message returned by the model. The role will be `assistant`. |
| `finish_reason` | String | The reason the model stopped generating tokens. |

Note: When using proprietary models from external model providers, the provider's APIs might include additional metadata in responses. Databricks filters these responses and returns only a subset of the provider's original fields.
ChatCompletionChunk
| Field | Type | Description |
|---|---|---|
| `index` | Integer | The index of the choice in the list of generated choices. |
| `delta` | ChatMessage | A chat completion message part of generated streamed responses from the model. Only the first chunk is guaranteed to have `role` populated. |
| `finish_reason` | String | The reason the model stopped generating tokens. Only the last chunk has this populated. |
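Streaming clients rebuild the full assistant message by concatenating the delta content fragments across chunks. A minimal sketch over two invented chunks:

```python
# Hypothetical sequence of streamed chunks (abbreviated to the relevant fields).
chunks = [
    {"choices": [{"index": 0, "delta": {"role": "assistant", "content": "Hel"}}]},
    {"choices": [{"index": 0, "delta": {"content": "lo!"}, "finish_reason": "stop"}]},
]

# Join delta content across chunks; role is only guaranteed on the first chunk.
full_text = "".join(
    chunk["choices"][0]["delta"].get("content", "") for chunk in chunks
)
```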
Embeddings API
Embedding tasks map input strings into embedding vectors. Many inputs can be batched together in each request. See POST /serving-endpoints/{name}/invocations for querying endpoint parameters.
Embedding request
| Field | Type | Description |
|---|---|---|
| `input` | String or List[String] | Required. The input text to embed. Can be a string or a list of strings. |
| `instruction` | String | An optional instruction to pass to the embedding model. |
Instructions are optional and highly model-specific. For instance, the BGE authors recommend no instruction when indexing chunks, and recommend using the instruction "Represent this sentence for searching relevant passages:" for retrieval queries. Other models like Instructor-XL support a wide range of instruction strings.
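A sketch of a batched embedding request using the BGE-style retrieval instruction quoted above (the inputs are illustrative):

```python
# Hypothetical embedding request: batched inputs plus an optional instruction.
request = {
    "input": ["What is MLflow?", "What is Delta Lake?"],  # batched inputs
    "instruction": "Represent this sentence for searching relevant passages:",
}
```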
Embeddings response
| Field | Type | Description |
|---|---|---|
| `id` | String | Unique identifier for the embedding. |
| `object` | String | The object type. Equal to `list`. |
| `model` | String | The name of the embedding model used to create the embedding. |
| `data` | EmbeddingObject | The embedding object. |
| `usage` | Usage | Token usage metadata. |
EmbeddingObject
| Field | Type | Description |
|---|---|---|
| `object` | String | The object type. Equal to `embedding`. |
| `index` | Integer | The index of the embedding in the list of embeddings generated by the model. |
| `embedding` | List[Float] | The embedding vector. Each model returns a fixed-size vector (1024 for BGE-Large). |
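Embedding vectors are typically compared with cosine similarity. A self-contained sketch using toy 4-dimensional vectors in place of real 1024-dimensional BGE-Large embeddings:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Identical vectors have similarity 1; orthogonal vectors have similarity 0.
assert abs(cosine_similarity([1.0, 0.0, 0.0, 0.0], [1.0, 0.0, 0.0, 0.0]) - 1.0) < 1e-9
```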
Completions API
Text completion tasks are for generating responses to a single prompt. Unlike Chat, this task supports batched inputs: multiple independent prompts can be sent in one request. See POST /serving-endpoints/{name}/invocations for querying endpoint parameters.
Completion request
| Field | Default | Type | Description |
|---|---|---|---|
| `prompt` | | String or List[String] | Required. The prompts for the model. |
| `max_tokens` | null | Integer greater than zero, or null | The maximum number of tokens to generate. |
| `stream` | | Boolean | Stream responses back to a client in order to allow partial results for requests. If this parameter is included in the request, responses are sent using the Server-sent events standard. |
| `temperature` | 1.0 | Float in [0,2] | The sampling temperature. 0 is deterministic and higher values introduce more randomness. |
| `top_p` | 1.0 | Float in (0,1] | The probability threshold used for nucleus sampling. |
| `top_k` | null | Integer greater than zero, or null | Defines the number of k most likely tokens to use for top-k filtering. Set this value to 1 to make outputs deterministic. |
| `error_behavior` | | String | For timeouts and context-length-exceeded errors. One of: `truncate` and `error`. |
| `n` | 1 | Integer greater than zero | The API returns `n` independent completions for each prompt. |
| `stop` | [] | String or List[String] | The model stops generating further tokens when any one of the sequences in `stop` is encountered. |
| `suffix` | | String | A string that is appended to the end of every completion. |
| `echo` | false | Boolean | Returns the prompt along with the completion. |
| `use_raw_prompt` | false | Boolean | If `true`, pass the `prompt` directly into the model without any transformation. |
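A sketch of a batched completion request; the prompts and parameter values are illustrative:

```python
# Hypothetical completion request with two independent prompts.
request = {
    "prompt": ["The capital of France is", "The capital of Japan is"],
    "max_tokens": 8,
    "temperature": 0.0,  # deterministic sampling
    "stop": ["\n"],
}
```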
Completion response
| Field | Type | Description |
|---|---|---|
| `id` | String | Unique identifier for the text completion. |
| `choices` | CompletionChoice | A list of text completions. For every prompt passed in, `n` choices are generated if `n` is specified. |
| `object` | String | The object type. Equal to `text_completion`. |
| `created` | Integer | The time the completion was generated, in seconds. |
| `usage` | Usage | Token usage metadata. |
CompletionChoice
| Field | Type | Description |
|---|---|---|
| `index` | Integer | The index of the prompt in the request. |
| `text` | String | The generated completion. |
| `finish_reason` | String | The reason the model stopped generating tokens. |