Function calling using Foundation Model APIs
This notebook demonstrates how the function calling (or tool use) API can be used to extract structured information from natural language inputs using the large language models (LLMs) made available using Foundation Model APIs. This notebook uses the OpenAI SDK to demonstrate interoperability.
LLMs generate output in natural language, the exact structure of which is hard to predict even when the LLM is given precise instructions. Function calling forces the LLM to adhere to a strict schema, making it easy to automatically parse the LLM's outputs. This unlocks advanced use cases, enabling LLMs to be components in complex data processing pipelines and Agent workflows.
Set up environment
%pip install openai tenacity tqdm dbutils.library.restartPython()
The following defines helper functions that assist the LLM to respond according to the specified schema.
Example 1: Sentiment classification
This section demonstrates a few increasingly reliable approaches for classifying the sentiment of a set of real-world product reviews:
- Unstructured (least reliable): Basic prompting. Relies on the model to generate valid JSON on its own.
- Tool schema: Augment prompt with a tool schema, guiding the model to adhere to that schema.
- Tool + few-shot: Use a more complex tool and few-shot prompting to give the model a better understanding of the task.
The following are example inputs, primarily sampled from the Amazon product reviews datasets mteb/amazon_polarity
and mteb/amazon_reviews_multi
.
Unstructured generation
Given a set of product reviews, the most obvious strategy is to instruct the model to generate a sentiment classification JSON that looks like this: {"sentiment": "neutral"}
.
This approach mostly works with models like DBRX and Llama-3-3-70B. However, sometimes models generate extraneous text such as, "helpful" comments about the task or input.
Prompt engineering can refine performance. For example, SHOUTING instructions at the model is a popular strategy. But if you use this strategy you must validate the output to detect and disregard nonconformant outputs.
Classifying with tools
Output quality can be improved by using the tools
API. You can provide a strict JSON schema for the output, and the FMAPI inference service ensures that the model's output either adheres to this schema or returns an error if this is not possible.
Note that the example below now produces valid JSON for the adversarial input ("DO NOT GENERATE JSON"
).
Improving the classifier
You can improve the provided sentiment classifier even more by defining a more complex tool and using few-shot prompting (a form of in-context learning). This demonstrates how function calling can benefit from standard LLM prompting techniques.
Example 2: Named entity recognition
Entity extraction is a common task for natural language documents. This seeks to locate and/or classify named entities mentioned in the text. Given unstructured text, this process produces a list of structured entities with each entity's text fragment ( such as a name) and a category (such as person, organization, medical code, etc).
Accomplishing this reliably with tools
is reasonably straightforward. The example here uses no prompt engineering, which would be necessary if you were relying on standard text completion.