Foundation model REST API reference

This article provides general API information for Databricks Foundation Model APIs and the models they support. The Foundation Model APIs are designed to be similar to OpenAI's REST API to make migrating existing projects easier. Both the pay-per-token and provisioned throughput endpoints accept the same REST API request format.

Endpoints

Foundation Model APIs supports pay-per-token endpoints and provisioned throughput endpoints.

A preconfigured endpoint is available in your workspace for each pay-per-token supported model, and users can interact with these endpoints using HTTP POST requests. See Supported foundation models on Mosaic AI Model Serving for supported models.

Provisioned throughput endpoints can be created using the API or the Serving UI. These endpoints support multiple models per endpoint for A/B testing, as long as both served models expose the same API format. For example, both models are chat models. See POST /api/2.0/serving-endpoints for endpoint configuration parameters.

Requests and responses use JSON, the exact JSON structure depends on an endpoint's task type. Chat and completion endpoints support streaming responses.

Usage

Responses include a usage sub-message which reports the number of tokens in the request and response. The format of this sub-message is the same across all task types.

Field	Type	Description
`completion_tokens`	Integer	Number of generated tokens. Not included in embedding responses.
`prompt_tokens`	Integer	Number of tokens from the input prompt(s).
`total_tokens`	Integer	Number of total tokens.
`reasoning_tokens`	Integer	Number of the thinking tokens. It is only applicable to reasoning models.

For models like databricks-meta-llama-3-3-70b-instruct a user prompt is transformed using a prompt template before being passed into the model. For pay-per-token endpoints, a system prompt might also be added. prompt_tokens includes all text added by our server.

Responses API

important

The Responses API is only compatible with OpenAI models.

The Responses API enables multi-turn conversations with a model. Unlike Chat Completions, the Responses API uses input instead of messages.

Responses API request

Field	Default	Type	Description
`model`		String	Required. Model ID used to generate the response.
`input`		String or List[ResponsesInput]	Required. Text, image, or file inputs to the model, used to generate a response. Unlike `messages`, this field uses `input` to specify conversation content.
`instructions`	`null`	String	A system (or developer) message inserted into the model's context.
`max_output_tokens`	`null`	`null`, which means no limit, or an integer greater than zero	An upper bound for the number of tokens that can be generated for a response, including visible output tokens and reasoning tokens.
`temperature`	`1.0`	Float in [0,2]	The sampling temperature. 0 is deterministic and higher values introduce more randomness.
`top_p`	`1.0`	Float in (0,1]	The probability threshold used for nucleus sampling.
`stream`	`false`	Boolean	If set to true, the model response data will be streamed to the client as it is generated using server-sent events.
`stream_options`	`null`	StreamOptions	Options for streaming responses. Only set this when you set `stream: true`.
`text`	`null`	TextConfig	Configuration options for a text response from the model. Can be plain text or structured JSON data.
`reasoning`	`null`	ReasoningConfig	Reasoning configuration for gpt-5 and o-series models.
`tool_choice`	`"auto"`	String or ToolChoiceObject	How the model should select which tool (or tools) to use when generating a response. See the `tools` parameter to see how to specify which tools the model can call.
`tools`	`null`	List[ToolObject]	An array of tools the model may call while generating a response. Note: Code interpreter and web search tools are not supported by Databricks.
`parallel_tool_calls`	`true`	Boolean	Whether to allow the model to run tool calls in parallel.
`max_tool_calls`	`null`	Integer greater than zero	The maximum number of total calls to built-in tools that can be processed in a response.
`metadata`	`null`	Object	Set of 16 key-value pairs that can be attached to an object.
`prompt_cache_key`	`null`	String	Used to cache responses for similar requests to optimize cache hit rates. Replaces the `user` field.
`prompt_cache_retention`	`null`	String	The retention policy for the prompt cache. Set to `"24h"` to enable extended prompt caching, which keeps cached prefixes active for longer, up to a maximum of 24 hours.
`safety_identifier`	`null`	String	A stable identifier used to help detect users of your application that may be violating usage policies.
`user`	`null`	String	Deprecated. Use `safety_identifier` and `prompt_cache_key` instead.
`truncation`	`null`	String	The truncation strategy to use for the model response.
`top_logprobs`	`null`	Integer	An integer between 0 and 20 specifying the number of most likely tokens to return at each token position, each with an associated log probability.
`include`	`null`	List[String]	Specify additional output data to include in the model response.
`prompt`	`null`	Object	Reference to a prompt template and its variables.

Unsupported parameters: The following parameters are not supported by Databricks and will return a 400 error if specified:

background - Background processing is not supported
store - Stored responses is not supported
conversation - Conversation API is not supported
service_tier - Service tier selection is managed by Databricks

`ResponsesInput`

The input field accepts either a string or a list of input message objects with role and content.

Field	Type	Description
`role`	String	Required. The role of the message author. Can be `"user"` or `"assistant"`.
`content`	String or List[ResponsesContentBlock]	Required. The content of the message, either as a string or array of content blocks.

`ResponsesContentBlock`

Content blocks define the type of content in input and output messages. The content block type is determined by the type field.

`InputText`

Field	Type	Description
`type`	String	Required. Must be `"input_text"`.
`text`	String	Required. The text content.

`OutputText`

Field	Type	Description
`type`	String	Required. Must be `"output_text"`.
`text`	String	Required. The text content.
`annotations`	List[Object]	Optional annotations for the text content.

`InputImage`

Field	Type	Description
`type`	String	Required. Must be `"input_image"`.
`image_url`	String	Required. URL or base64-encoded data URI of the image.

`InputFile`

Field	Type	Description
`type`	String	Required. Must be `"input_file"`.
`file_id`	String	File identifier if using uploaded files.
`filename`	String	The name of the file.
`file_data`	String	Base64-encoded data URI with format prefix. For example, PDF files use format `data:application/pdf;base64,<base64 data>`.

`FunctionCall`

Field	Type	Description
`type`	String	Required. Must be `"function_call"`.
`id`	String	Required. Unique identifier for the function call.
`call_id`	String	Required. The call identifier.
`name`	String	Required. The name of the function being called.
`arguments`	Object/String	Required. The function arguments as JSON object or string.

`FunctionCallOutput`

Field	Type	Description
`type`	String	Required. Must be `"function_call_output"`.
`call_id`	String	Required. The call identifier this output corresponds to.
`output`	String/Object	Required. The function output as string or JSON object.

`CustomToolCall`

Returned in the response output array when a custom tool is called. Unlike function calls, custom tool calls return plain text input instead of JSON arguments.

Field	Type	Description
`type`	String	Required. Must be `"custom_tool_call"`.
`id`	String	Required. Unique identifier for this custom tool call.
`call_id`	String	Required. The call identifier.
`name`	String	Required. The name of the custom tool being called.
`input`	String	Required. The tool input as plain text (not JSON).
`status`	String	The status of the tool call. One of: `completed`, `in_progress`.

`CustomToolCallOutput`

Use this input type to provide the result of a custom tool call back to the model in a multi-turn conversation.

Field	Type	Description
`type`	String	Required. Must be `"custom_tool_call_output"`.
`call_id`	String	Required. The call identifier this output corresponds to.
`output`	String	Required. The custom tool output as a string.

`StreamOptions`

Configuration for streaming responses. Only used when stream: true.

Field	Type	Description
`include_usage`	Boolean	If true, include token usage information in the stream. Default is `false`.

`TextConfig`

Configuration for text output, including structured outputs.

Field	Type	Description
`format`	ResponsesFormatObject	The format specification for the text output.

`ResponsesFormatObject`

Specifies the output format for text responses.

Field	Type	Description
`type`	String	Required. The type of format: `"text"` for plain text, `"json_object"` for JSON, or `"json_schema"` for structured JSON.
`json_schema`	Object	Required when `type` is `"json_schema"`. The JSON schema object that defines the structure of the output.

The json_schema object has the same structure as JsonSchemaObject documented in the Chat Completions API.

`ReasoningConfig`

Configuration for reasoning behavior in reasoning models (o-series and gpt-5 models).

Field	Type	Description
`effort`	String	The reasoning effort level: `"low"`, `"medium"`, or `"high"`. Default is `"medium"`.
`encrypted_content`	String	Encrypted reasoning content for stateless mode. Provided by the model in previous responses.

`ToolObject`

See Function calling on Databricks.

note

The Responses API supports the following tool types: function, custom, mcp, image_generation, shell. Custom tools and grammar-based output formats are only available with GPT-5 series models (gpt-5, gpt-5.1, gpt-5.2).

Field	Type	Description
`type`	String	Required. The type of the tool. See note above for supported values.
`function`	FunctionObject	Required when `type` is `function`. The function definition associated with the tool.
`name`	String	Required when `type` is `custom`. The name of the custom tool.
`description`	String	Required when `type` is `custom`. A description of what the custom tool does.
`format`	CustomFormat	Optional. When `type` is `custom`, specifies the output format. Defaults to `{"type": "text"}`. Can also use `{"type": "grammar", "definition": "<grammar>", "syntax": "lark"}` for structured output. Only supported with GPT-5 series models.

`CustomToolObject`

Custom tools allow the model to return arbitrary string output instead of JSON-formatted function arguments. This is useful for code generation, applying patches, or other use cases where structured JSON is not required.

note

Custom tools are only supported with GPT-5 series models (gpt-5, gpt-5.1, gpt-5.2) through the Responses API.

Example custom tool:

JSON
{
  "type": "custom",
  "name": "code_exec",
  "description": "Executes arbitrary Python code. Return only valid Python code."
}

Example custom tool with grammar:

JSON
{
  "type": "custom",
  "name": "apply_patch",
  "description": "Apply a patch to create or modify files.",
  "format": {
    "type": "grammar",
    "definition": "start: begin_patch hunk end_patch\nbegin_patch: \"*** Begin Patch\" LF\n...",
    "syntax": "lark"
  }
}

When a custom tool is called, the response contains a custom_tool_call output item with plain text input instead of JSON arguments.

`CustomFormat`

Grammar-based output formats are only supported with GPT-5 series models.

Field	Type	Description
`type`	String	Required. Either `"text"` for plain text output or `"grammar"` for grammar-constrained output.
`definition`	String	Required when `type` is `"grammar"`. The grammar definition string using Lark syntax.
`syntax`	String	Required when `type` is `"grammar"`. The grammar syntax. Currently only `"lark"` is supported.

`FunctionObject`

Field	Type	Description
`name`	String	Required. The name of the function to be called.
`description`	Object	Required. The detailed description of the function. The model uses this description to understand the relevance of the function to the prompt and generate the tool calls with higher accuracy.
`parameters`	Object	The parameters the function accepts, described as a valid JSON schema object. If the tool is called, then the tool call is fit to the JSON schema provided. Omitting parameters defines a function without any parameters. The number of `properties` is limited to 15 keys.
`strict`	Boolean	Whether to enable strict schema adherence when generating the function call. If set to `true`, the model follows the exact schema defined in the schema field. Only a subset of JSON schema is supported when strict is `true`

`ToolChoiceObject`

See Function calling on Databricks.

Field	Type	Description
`type`	String	Required. The type of the tool to force. Supported values match the tool types in ToolObject: `"function"`, `"custom"`, etc.
`function`	Object	Required when `type` is `"function"`. An object of the form `{"name": "my_function"}` where `"my_function"` is the name of a FunctionObject in the `tools` field.
`name`	String	Required when `type` is `"custom"`. The name of the custom tool to force. Only supported with GPT-5 series models.

Responses API response

For non-streaming requests, the response is a single response object. For streaming requests, the response is a text/event-stream where each event is a response chunk.

Field	Type	Description
`id`	String	Unique identifier for the response. Note: Databricks encrypts this ID for security.
`object`	String	The object type. Equal to `"response"`.
`created_at`	Integer	The Unix timestamp (in seconds) when the response was created.
`status`	String	The status of the response. One of: `completed`, `failed`, `in_progress`, `cancelled`, `queued`, or `incomplete`.
`model`	String	The model version used to generate the response.
`output`	List[ResponsesMessage]	The output generated by the model, typically containing message objects.
`usage`	Usage	Token usage metadata.
`error`	Error	Error information if the response failed.
`incomplete_details`	IncompleteDetails	Details about why the response is incomplete, if applicable.
`instructions`	String	The instructions provided in the request.
`max_output_tokens`	Integer	The maximum output tokens specified in the request.
`temperature`	Float	The temperature used for generation.
`top_p`	Float	The top_p value used for generation.
`tools`	List[ToolObject]	The tools specified in the request.
`tool_choice`	String or ToolChoiceObject	The tool_choice setting from the request.
`parallel_tool_calls`	Boolean	Whether parallel tool calls were enabled.
`store`	Boolean	Whether the response was stored.
`metadata`	Object	The metadata attached to the response.

`ResponsesMessage`

Message objects in the output field containing the model's response content.

Field	Type	Description
`id`	String	Required. Unique identifier for the message.
`role`	String	Required. The role of the message. Either `"user"` or `"assistant"`.
`content`	List[ResponsesContentBlock]	Required. The content blocks in the message.
`status`	String	The status of the message processing.
`type`	String	Required. The object type. Equal to `"message"`.

`Error`

Error information when a response fails.

Field	Type	Description
`code`	String	Required. The error code.
`message`	String	Required. A human-readable error message.
`param`	String	The parameter that caused the error, if applicable.
`type`	String	Required. The error type.

`IncompleteDetails`

Details about why a response is incomplete.

Field	Type	Description
`reason`	String	Required. The reason the response is incomplete.

Chat Completions API

The Chat Completions API enables multi-turn conversations with a model. The model response provides the next assistant message in the conversation. See POST /serving-endpoints/{name}/invocations for querying endpoint parameters.

Chat request

Field	Default	Type	Description
`messages`		ChatMessage list	Required. A list of messages representing the current conversation.
`max_tokens`	`null`	`null`, which means no limit, or an integer greater than zero	The maximum number of tokens to generate.
`stream`	`true`	Boolean	Stream responses back to a client in order to allow partial results for requests. If this parameter is included in the request, responses are sent using the Server-sent events standard.
`temperature`	`1.0`	Float in [0,2]	The sampling temperature. 0 is deterministic and higher values introduce more randomness.
`top_p`	`1.0`	Float in (0,1]	The probability threshold used for nucleus sampling.
`top_k`	`null`	`null`, which means no limit, or an integer greater than zero	Defines the number of k most likely tokens to use for top-k-filtering. Set this value to 1 to make outputs deterministic.
`stop`	[]	String or List[String]	Model stops generating further tokens when any one of the sequences in `stop` is encountered.
`n`	1	Integer greater than zero	The API returns `n` independent chat completions when `n` is specified. Recommended for workloads that generate multiple completions on the same input for additional inference efficiency and cost savings. Only available for provisioned throughput endpoints.
`tool_choice`	`none`	String or ToolChoiceObject	Used only in conjunction with the `tools` field. `tool_choice` supports a variety of keyword strings such as `auto`, `required`, and `none`. `auto` means that you are letting the model decide which (if any) tool is relevant to use. With `auto` if the model doesn't believe any of the tools in `tools` are relevant, the model generates a standard assistant message instead of a tool call. `required` means that the model picks the most relevant tool in `tools` and must generate a tool call. `none` means that the model does not generate any tool calls and instead must generate a standard assistant message. To force a tool call with a specific tool defined in `tools`, use a `ToolChoiceObject`. By default, if the `tools` field is populated `tool_choice = "auto"`. Else, the `tools` field defaults to `tool_choice = "none"`
`tools`	`null`	ToolObject	A list of `tools` that the model can call. Currently, `function` is the only supported `tool` type and a max of 32 functions are supported.
`response_format`	`null`	ResponseFormatObject	An object specifying the format that the model must output. Accepted types are `text`, `json_schema` or `json_object` Setting to `{ "type": "json_schema", "json_schema": {...} }` enables structured outputs which ensures the model follows your supplied JSON schema. Setting to `{ "type": "json_object" }` ensures the responses the model generates is valid JSON, but does not ensure that responses follow a specific schema.
`logprobs`	`false`	Boolean	This parameter indicates whether to provide the log probability of a token being sampled.
`top_logprobs`	`null`	Integer	This parameter controls the number of most likely token candidates to return log probabilities for at each sampling step. Can be 0-20. `logprobs` must be `true` if using this field.
`reasoning_effort`	`"medium"`	String	Controls the level of reasoning effort the model should apply when generating responses. Accepted values are `"low"`, `"medium"`, or `"high"`. Higher reasoning effort may result in more thoughtful and accurate responses but may increase latency and token usage. This parameter is only accepted by a limited set of models, including `databricks-gpt-oss-120b` and `databricks-gpt-oss-20b`.

`ChatMessage`

Field	Type	Description
`role`	String	Required. The role of the author of the message. Can be `"system"`, `"user"`, `"assistant"` or `"tool"`.
`content`	String or List[ContentItem]	Required for chat tasks that do not involve tool calls. The content can be either a string or an array that contains a series of multimodal elements in a single chat interaction. These elements follow the sequence in which they are processed as inputs or outputs by the models. This array input is specifically designed for use with proprietary models accessible only through external model providers. Currently, only Claude models are supported. Use string-typed content for other external model providers, open source models (Llama), or models hosted by customers on Databricks. `list[ContentItem]` is not compatible with OpenAI's specifications.
`tool_calls`	ToolCall list	The list of `tool_calls` that the model generated. Must have `role` as `"assistant"` and no specification for the `content` field.
`tool_call_id`	String	When `role` is `"tool"`, the ID associated with the `ToolCall` that the message is responding to. Must be empty for other `role` options.

The system role can only be used once, as the first message in a conversation. It overrides the model's default system prompt.

`ContentItem`

ContentItem is one of the following content types: TextContent, ReasoningContent, DocumentContent, or ImageContent

`TextContent`

Field	Type	Description
`type`	String	Required. Must be text.
`text`	String	Required text content.
`citations`	List[Citation]	Optional citation information. See table below.
`cache_control`	String	Enables caching for your request. This parameter is only accepted by Databricks-hosted Claude models. See Prompt caching for an example.

The citations fields are as follows:

Field	Type	Description
`type`	String	Required. Must be `char_location`.
`cited_text`	String	The text cited from the document.
`document_index`	Integer	The index of the cited document.
`document_title`	String	The title of the cited document.
`start_char_index`	Integer	The starting index of the cited text in the document.
`end_char_index`	Integer	The ending index of the cited text in the document.

`ImageContent`

Field	Type	Description
`type`	String	Required. Must be an `image_url`.
`image_url`	ImageURL	Equivalent to the OpenAI image_url object.
`cache_control`	String	Enables caching for your request. This parameter is only accepted by Databricks-hosted Claude model. Image message content must use the encoded data as its source. URLs are not currently supported. See Prompt caching for an example.

ImageURL fields are below:

Field	Type	Description
`url`	String	Base64-encoded image data. Must be a valid base64 string generated from a supported image file format (JPEG, PNG, GIF, WebP, etc.).
`detail`	String	Specifies the detail level of the image.

`ReasoningContent`

Field	Type	Description
`type`	String	Required. Must be an `reasoning`.
`summary`	List[Summary]	Reasoning text contents. Summary can be either `TextSummary` or `EncryptedTextSummary`
`cache_control`	String	Enables caching for your request. This parameter is only accepted by Databricks hosted Claude models. See Prompt caching for an example.

`TextSummary`

Field	Type	Description
`type`	String	Required. Must be an `summary_text`.
`text`	String	A short summary of the reasoning used by the model when generating the response.
`signature`	String	Optional cryptographic tokens to verify the authenticity of the data.

`EncryptedTextSummary`

Field	Type	Description
`type`	String	Required. Must be a `summary_encrypted_text`.
`data`	String	Encrypted text content which is not human-readable due to safety reasons.

`DocumentContent`

DocumentContent is only for requests.

Field	Type	Description
`type`	String	Required. Must be `document`.
`title`	String	Title for the document.
`context`	String	Description of the document.
`source`	Source	Required. Specifies more information about the document, including format and contents.
`citations`	Map[string, bool]	Map with single field “enabled” that maps to a bool indicating whether to enable citations for the document.

`Source`

Field	Type	Description
`type`	String	Required. Must be one of `base64` (PDF), `text`, `content`, or `url` (URLPDFSource).
`media_type`	String	Required for PDF and text type. Must be `application` or `pdf` for PDF. Must be `text` or `plain` for text.
`data`	String	Required for PDF and text. The data containing the document source.
`content`	String or List[TextContent] or List[ImageContent]	Required for `content` type. The content of the document.
`url`	String	Required for URLPDFSource type. The URL of the PDF document.

`FileContent`

Field	Type	Description
`type`	String	Required. Must be file.
`file`	File	Required File content.

File fields are below:

Field	Type	Description
`filename`	String	The name of the file.
`file_data`	String	Required. OpenAI compatible base64 encoded file data. It starts with the file format follwed by the base64 encoded data. For example, a PDF file has format in `data:application/pdf;base64,<base64 data>`.
`url`	String	The publicly-accessible file URL. Supported only for Gemini models.

`ToolCall`

A tool call action suggestion by the model. See Function calling on Databricks.

Field	Type	Description
`id`	String	Required. A unique identifier for this tool call suggestion.
`type`	String	Required. Only `"function"` is supported.
`function`	FunctionCallCompletion	Required. A function call suggested by the model.
`cache_control`	String	Enables caching for your request. This parameter is only accepted by Databricks-hosted Claude models. See Prompt caching for an example.

`FunctionCallCompletion`

Field	Type	Description
`name`	String	Required. The name of the function the model recommended.
`arguments`	Object	Required. Arguments to the function as a serialized JSON dictionary.

Note: ToolChoiceObject, ToolObject, and FunctionObject are defined in the Responses API section and are shared between both APIs.

`ResponseFormatObject`

See Structured outputs on Databricks.

Field	Type	Description
`type`	String	Required. The type of response format being defined. Either `text` for unstructured text, `json_object` for unstructured JSON objects, or `json_schema` for JSON objects adhering to a specific schema.
`json_schema`	JsonSchemaObject	Required. The JSON schema to adhere to if `type` is set to `json_schema`

`JsonSchemaObject`

See Structured outputs on Databricks.

Field	Type	Description
`name`	String	Required. The name of the response format.
`description`	String	A description of what the response format is for, used by the model to determine how to respond in the format.
`schema`	Object	Required. The schema for the response format, described as a JSON schema object.
`strict`	Boolean	Whether to enable strict schema adherence when generating the output. If set to `true`, the model follows the exact schema defined in the schema field. Only a subset of JSON schema is supported when strict is `true`

Chat response

For non-streaming requests, the response is a single chat completion object. For streaming requests, the response is a text/event-stream where each event is a completion chunk object. The top-level structure of completion and chunk objects is almost identical: only choices has a different type.

Field	Type	Description
`id`	String	Unique identifier for the chat completion.
`choices`	List[ChatCompletionChoice] or List[ChatCompletionChunk] (streaming)	List of chat completion texts. `n` choices are returned if the `n` parameter is specified.
`object`	String	The object type. Equal to either `"chat.completions"` for non-streaming or `"chat.completion.chunk"` for streaming.
`created`	Integer	The time the chat completion was generated in seconds.
`model`	String	The model version used to generate the response.
`usage`	Usage	Token usage metadata. Might not be present on streaming responses.

`ChatCompletionChoice`

Field	Type	Description
`index`	Integer	The index of the choice in the list of generated choices.
`message`	ChatMessage	A chat completion message returned by the model. The role will be `assistant`.
`finish_reason`	String	The reason the model stopped generating tokens.
`extra_fields`	String	When using proprietary models from external model providers, the provider's APIs might include additional metadata in responses. Databricks filters these responses and returns only a subset of the provider's original fields. The `safetyRating` is the only extra field supported at this time, see the Gemini documentation for more details.

`ChatCompletionChunk`

Field	Type	Description
`index`	Integer	The index of the choice in the list of generated choices.
`delta`	ChatMessage	A chat completion message part of generated streamed responses from the model. Only the first chunk is guaranteed to have `role` populated.
`finish_reason`	String	The reason the model stopped generating tokens. Only the last chunk will have this populated.

Embeddings API

Embedding tasks map input strings into embedding vectors. Many inputs can be batched together in each request. See POST /serving-endpoints/{name}/invocations for querying endpoint parameters.

Embedding request

Field	Type	Description
`input`	String or List[String]	Required. The input text to embed. Can be a string or a list of strings.
`instruction`	String	An optional instruction to pass to the embedding model.

Instructions are optional and highly model specific. For instance the BGE authors recommend no instruction when indexing chunks and recommend using the instruction "Represent this sentence for searching relevant passages:" for retrieval queries. Other models like Instructor-XL support a wide range of instruction strings.

Embeddings response

Field	Type	Description
`id`	String	Unique identifier for the embedding.
`object`	String	The object type. Equal to `"list"`.
`model`	String	The name of the embedding model used to create the embedding.
`data`	EmbeddingObject	The embedding object.
`usage`	Usage	Token usage metadata.

`EmbeddingObject`

Field	Type	Description
`object`	String	The object type. Equal to `"embedding"`.
`index`	Integer	The index of the embedding in the list of embeddings generated by the model.
`embedding`	List[Float]	The embedding vector. Each model will return a fixed size vector (1024 for BGE-Large)

Completions API

Text completion tasks are for generating responses to a single prompt. Unlike Chat, this task supports batched inputs: multiple independent prompts can be sent in one request. See POST /serving-endpoints/{name}/invocations for querying endpoint parameters.

Completion request

Field	Default	Type	Description
`prompt`		String or List[String]	Required. The prompts for the model.
`max_tokens`	`null`	`null`, which means no limit, or an integer greater than zero	The maximum number of tokens to generate.
`stream`	`true`	Boolean	Stream responses back to a client in order to allow partial results for requests. If this parameter is included in the request, responses are sent using the Server-sent events standard.
`temperature`	`1.0`	Float in [0,2]	The sampling temperature. 0 is deterministic and higher values introduce more randomness.
`top_p`	`1.0`	Float in (0,1]	The probability threshold used for nucleus sampling.
`top_k`	`null`	`null`, which means no limit, or an integer greater than zero	Defines the number of k most likely tokens to use for top-k-filtering. Set this value to 1 to make outputs deterministic.
`error_behavior`	`"error"`	`"truncate"` or `"error"`	For timeouts and context-length-exceeded errors. One of: `"truncate"` (return as many tokens as possible) and `"error"` (return an error). This parameter is only accepted by pay per token endpoints.
`n`	1	Integer greater than zero	The API returns `n` independent chat completions when `n` is specified. Recommended for workloads that generate multiple completions on the same input for additional inference efficiency and cost savings. Only available for provisioned throughput endpoints.
`stop`	[]	String or List[String]	Model stops generating further tokens when any one of the sequences in `stop` is encountered.
`suffix`	`""`	String	A string that is appended to the end of every completion.
`echo`	`false`	Boolean	Returns the prompt along with the completion.
`use_raw_prompt`	`false`	Boolean	If `true`, pass the `prompt` directly into the model without any transformation.

Completion response

Field	Type	Description
`id`	String	Unique identifier for the text completion.
`choices`	CompletionChoice	A list of text completions. For every prompt passed in, `n` choices are generated if `n` is specified. Default `n` is 1.
`object`	String	The object type. Equal to `"text_completion"`
`created`	Integer	The time the completion was generated in seconds.
`usage`	Usage	Token usage metadata.

`CompletionChoice`

Field	Type	Description
`index`	Integer	The index of the prompt in request.
`text`	String	The generated completion.
`finish_reason`	String	The reason the model stopped generating tokens.

Additional resources

Databricks-hosted foundation models available in Foundation Model APIs

Endpoints​

Usage​

Responses API​

Responses API request​

ResponsesInput​

ResponsesContentBlock​

InputText​

OutputText​

InputImage​

InputFile​

FunctionCall​

FunctionCallOutput​

CustomToolCall​

CustomToolCallOutput​

StreamOptions​

TextConfig​

ResponsesFormatObject​

ReasoningConfig​

ToolObject​

CustomToolObject​

CustomFormat​

FunctionObject​

ToolChoiceObject​

Responses API response​

ResponsesMessage​

Error​

IncompleteDetails​

Chat Completions API​

Chat request​

ChatMessage​

ContentItem​

TextContent​

ImageContent​

ReasoningContent​

TextSummary​

EncryptedTextSummary​

DocumentContent​

Source​

FileContent​

ToolCall​

FunctionCallCompletion​

ResponseFormatObject​

JsonSchemaObject​

Chat response​

ChatCompletionChoice​

ChatCompletionChunk​

Embeddings API​

Embedding request​

Embeddings response​

EmbeddingObject​

Completions API​

Completion request​

Completion response​

CompletionChoice​

Additional resources​