Function calling on Databricks

Preview

This feature is in Public Preview and is supported on both Foundation Model APIs pay-per-token and provisioned throughput endpoints.

This article describes function calling and how to use it as part of your generative AI application workflows. Databricks Function Calling is OpenAI-compatible and is only available during model serving as part of Foundation Model APIs.

What is function calling

Function calling provides a way for you to control the output of LLMs so that they generate structured responses more reliably. When you use function calling, you describe functions in the API call by defining their arguments with a JSON schema. The LLM itself does not call these functions; instead, it creates a JSON object that you can use to call the functions in your code.

For function calling on Databricks, the basic sequence of steps is as follows:

  1. Call the model using the submitted query and a set of functions defined in the tools parameter.

  2. The model decides whether or not to call the defined functions. When the model calls a function, the response content is a JSON string of arguments that adheres to your custom schema.

  3. Parse the strings into JSON in your code, and call your function with the provided arguments if they exist.

  4. Call the model again by appending the structured response as a new message. The structure of the response is defined by the functions you previously provided in tools. From here, the model summarizes the results and sends that summary to the user.
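Steps 2 through 4 can be sketched in Python. The tool-call payload and the get_current_weather implementation below are illustrative assumptions, not output from a real model:

```python
import json

# Hypothetical assistant message, shaped like what the model returns when
# it decides to call a function (step 2). The arguments are a JSON string.
assistant_message = {
    "role": "assistant",
    "content": None,
    "tool_calls": [{
        "id": "call_123",
        "type": "function",
        "function": {
            "name": "get_current_weather",
            "arguments": '{"location": "Chicago, IL", "unit": "fahrenheit"}',
        },
    }],
}

# Your own implementation of the function the model asked for (placeholder).
def get_current_weather(location, unit="fahrenheit"):
    return {"location": location, "temperature": 57, "unit": unit}

# Step 3: parse the argument string into JSON and call the function.
call = assistant_message["tool_calls"][0]
args = json.loads(call["function"]["arguments"])
result = get_current_weather(**args)

# Step 4: append the result as a "tool" role message, referencing the
# tool_call_id, then call the model again so it can summarize the result.
tool_message = {
    "role": "tool",
    "tool_call_id": call["id"],
    "content": json.dumps(result),
}
print(tool_message["content"])
```

The second model call would then include the original messages, the assistant's tool-call message, and this tool message.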

When to use function calling

The following are example use cases for function calling:

  • Create assistants that can answer questions by calling other APIs. For example, you can define functions like send_email(to: string, body: string) or current_weather(location: string, unit: 'celsius' | 'fahrenheit').

  • Define and use API calls based on natural language. For example, take the statement “Who are my top customers?”, turn it into an API call named get_customers(min_revenue: int, created_before: string, limit: int), and call that API.

For batch inference or data processing tasks, such as converting unstructured data into structured data, Databricks recommends using structured outputs.
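As an illustration, the hypothetical get_customers call above could be described to the model as a tool definition like this (field descriptions are assumptions for the sketch):

```python
# Illustrative tool definition for the hypothetical get_customers API.
get_customers_tool = {
    "type": "function",
    "function": {
        "name": "get_customers",
        "description": "Return top customers filtered by revenue and creation date",
        "parameters": {
            "type": "object",
            "properties": {
                "min_revenue": {
                    "type": "integer",
                    "description": "Minimum annual revenue in dollars",
                },
                "created_before": {
                    "type": "string",
                    "description": "Only include customers created before this ISO 8601 date",
                },
                "limit": {
                    "type": "integer",
                    "description": "Maximum number of customers to return",
                },
            },
            "required": ["limit"],
        },
    },
}
print(get_customers_tool["function"]["name"])
```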

Supported models

The following table lists the supported models and which model serving feature makes each model available.

Important

Meta Llama 3.1 is licensed under the LLAMA 3.1 Community License, Copyright © Meta Platforms, Inc. All Rights Reserved. Customers are responsible for ensuring compliance with applicable model licenses.

| Model | Made available using model serving feature | Notes |
| --- | --- | --- |
| Meta-Llama-3.1-405B-Instruct | Foundation Model APIs | Supported on pay-per-token and provisioned throughput workloads. |
| Meta-Llama-3.1-70B-Instruct | Foundation Model APIs | Supported on pay-per-token and provisioned throughput workloads. |
| Meta-Llama-3.1-8B-Instruct | Foundation Model APIs | Supported on provisioned throughput workloads only. |
| gpt-4o | External models | |
| gpt-4o-2024-08-06 | External models | |
| gpt-4o-2024-05-13 | External models | |
| gpt-4o-mini | External models | |

Use function calling

To use function calling with your generative AI application, you must provide function parameters and a description.

The default behavior for tool_choice is "auto". This lets the model decide which functions to call and whether to call them.

You can customize the default behavior depending on your use case. The following are your options:

  • Set tool_choice: "required". In this scenario, the model always calls one or more functions. The model selects which function or functions to call.

  • Set tool_choice: {"type": "function", "function": {"name": "my_function"}}. In this scenario, the model calls only a specific function.

  • Set tool_choice: "none" to disable function calling and have the model only generate a user-facing message.
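As a sketch, the options above map onto the tool_choice argument of a chat completions request like this (get_current_weather is a placeholder function name):

```python
# Values you can pass as tool_choice in a chat completions request.
auto_choice = "auto"          # default: model decides whether and what to call
required_choice = "required"  # model must call one or more functions
forced_choice = {             # model must call this specific function
    "type": "function",
    "function": {"name": "get_current_weather"},
}
none_choice = "none"          # disable function calling for this request
print(forced_choice["function"]["name"])
```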

The following is a single-turn example that uses the OpenAI SDK and its tools parameter. See Chat task for additional syntax details.

Important

During Public Preview, function calling on Databricks is optimized for single-turn function calling.

import os
import json
from openai import OpenAI

DATABRICKS_TOKEN = os.environ.get("DATABRICKS_TOKEN")
DATABRICKS_BASE_URL = os.environ.get("DATABRICKS_BASE_URL")

client = OpenAI(
  api_key=DATABRICKS_TOKEN,
  base_url=DATABRICKS_BASE_URL
  )

tools = [
  {
    "type": "function",
    "function": {
      "name": "get_current_weather",
      "description": "Get the current weather in a given location",
      "parameters": {
        "type": "object",
        "properties": {
          "location": {
            "type": "string",
            "description": "The city and state, e.g. San Francisco, CA"
          },
          "unit": {
            "type": "string",
            "enum": [
              "celsius",
              "fahrenheit"
            ]
          }
        }
      }
    }
  }
]

messages = [{"role": "user", "content": "What is the current temperature of Chicago?"}]

response = client.chat.completions.create(
    model="databricks-meta-llama-3-1-70b-instruct",
    messages=messages,
    tools=tools,
    tool_choice="auto",
)

print(json.dumps(response.choices[0].message.model_dump()['tool_calls'], indent=2))

JSON schema

Foundation Model APIs broadly support function definitions accepted by OpenAI. However, using a simpler JSON schema for function call definitions results in higher quality function call JSON generation. To promote higher quality generation, Foundation Model APIs only support a subset of JSON schema specifications.

The following function call definition keys are not supported:

  • Regular expressions using pattern.

  • Complex nested or schema composition and validation using: anyOf, oneOf, allOf, prefixItems, or $ref.

  • Lists of types, except for the special case of [type, "null"], where one type in the list is a valid JSON type and the other is "null".

Additionally, the following limitations apply:

  • The maximum number of keys specified in the JSON schema is 16.

  • Foundation Model APIs do not enforce length or size constraints for objects and arrays.

    • This includes keywords like maxProperties, minProperties, and maxLength.

  • Heavily nested JSON schemas will result in lower quality generation. If possible, try flattening the JSON schema for better results.
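As an illustration of the flattening advice, the hypothetical nested schema below can be rewritten with top-level keys, using the supported [type, "null"] list form for the optional field instead of anyOf:

```python
# Hypothetical nested parameter schema: more likely to degrade generation quality.
nested = {
    "type": "object",
    "properties": {
        "customer": {
            "type": "object",
            "properties": {
                "name": {"type": "string"},
                "address": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                },
            },
        },
    },
}

# Flattened equivalent: a single level of keys; the optional field uses the
# supported [type, "null"] special case rather than anyOf composition.
flat = {
    "type": "object",
    "properties": {
        "customer_name": {"type": "string"},
        "customer_city": {"type": ["string", "null"]},
    },
    "required": ["customer_name"],
}
print(sorted(flat["properties"]))
```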

Token usage

Prompt injection and other techniques are used to enhance the quality of tool calls. Using them affects the number of input and output tokens the model consumes, which in turn has billing implications. The more tools you use, the more input tokens you consume.

Limitations

The following are limitations for function calling during Public Preview:

  • The current function calling solution is optimized for single-turn function calls. Multi-turn function calling is supported during the preview, but is under development.

  • Parallel function calling is not supported.

  • The maximum number of functions that can be defined in tools is 32.

  • For provisioned throughput support, function calling is only supported on new endpoints. You cannot add function calling to previously created endpoints.

Notebook example

See the following notebook for detailed function calling examples.

Function calling example notebook

Open notebook in new tab