Create an AI agent and its tools
Preview
This feature is in Public Preview.
This article shows you how to create AI agents and tools using the Mosaic AI Agent Framework.
Learn how to use the AI Playground to quickly prototype tool-calling agents and export them to Mosaic AI Agent Framework.
Requirements
Understand the concepts of AI agents and tools as described in What are compound AI systems and AI agents?
Databricks recommends installing the latest version of the MLflow Python client when developing agents. See Authentication for dependent resources for information on mlflow version requirements.
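For example, you can install or upgrade the client in a notebook cell before developing an agent (a minimal sketch; restarting the Python process picks up the new version):

%pip install -U mlflow
dbutils.library.restartPython()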
Create AI agent tools
AI agents use tools to perform actions besides generating language, for example retrieving structured or unstructured data, executing code, or communicating with remote services (such as sending an email or Slack message).
To provide tools to an agent with Mosaic AI Agent Framework, you can use any combination of the following methods:
Create or use existing Unity Catalog functions as tools. Unity Catalog functions provide easy discovery, governance, and sharing, and work well for applying transformations and aggregations on large datasets.
Define tools locally as Python functions in agent code. This approach is useful in situations where you need to make calls to REST APIs, use arbitrary code or libraries, or execute tools with very low latency. This approach lacks the built-in discoverability and governance provided by Unity Catalog functions. Databricks recommends weighing this tradeoff when building your agent to determine which approach is best.
Both approaches work for agents written in custom Python code or using an agent-authoring library like LangGraph.
When defining tools, ensure that the tool, its parameters, and its return value are documented, so that the agent LLM can understand when and how to use the tool.
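For example, a tool defined locally as a Python function might look like the following sketch, which uses LangChain's tool decorator. The function name, parameters, and logic are hypothetical; the point is that the docstring and type hints tell the agent LLM when and how to call the tool:

from langchain_core.tools import tool

@tool
def convert_temperature(value_celsius: float, to_unit: str) -> float:
    """Convert a temperature from degrees Celsius to another unit.

    Args:
        value_celsius: The temperature to convert, in degrees Celsius.
        to_unit: The target unit, either "fahrenheit" or "kelvin".
    """
    if to_unit == "fahrenheit":
        return value_celsius * 9 / 5 + 32
    if to_unit == "kelvin":
        return value_celsius + 273.15
    raise ValueError(f"Unsupported unit: {to_unit}")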
Create agent tools with Unity Catalog functions
These examples create AI agent tools using Unity Catalog functions written in a notebook environment or a SQL editor.
Run the following code in a notebook cell. It uses the %sql notebook magic to create a Unity Catalog function called python_exec. An LLM can use this tool to execute Python code provided by a user.
%sql
CREATE OR REPLACE FUNCTION
main.default.python_exec (
  code STRING COMMENT 'Python code to execute. Remember to print the final result to stdout.'
)
RETURNS STRING
LANGUAGE PYTHON
DETERMINISTIC
COMMENT 'Executes Python code in the sandboxed environment and returns its stdout. The runtime is stateless and you can not read output of the previous tool executions. i.e. No such variables "rows", "observation" defined. Calling another tool inside a Python code is NOT allowed. Use standard python libraries only.'
AS $$
  import sys
  from io import StringIO
  sys_stdout = sys.stdout
  redirected_output = StringIO()
  sys.stdout = redirected_output
  exec(code)
  sys.stdout = sys_stdout
  return redirected_output.getvalue()
$$
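You can quickly verify the function from a notebook before giving it to an agent, for example:

%sql
SELECT main.default.python_exec('print(1 + 1)')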
Run the following code in a SQL editor. It creates a Unity Catalog function called lookup_customer_info that an LLM could use to retrieve structured data from a hypothetical customer_data table:
CREATE OR REPLACE FUNCTION main.default.lookup_customer_info(
  customer_name STRING COMMENT 'Name of the customer whose info to look up'
)
RETURNS STRING
COMMENT 'Returns metadata about a particular customer given the customer name, including the customer email and ID. The
customer ID can be used for other queries.'
RETURN SELECT CONCAT(
    'Customer ID: ', customer_id, ', ',
    'Customer Email: ', customer_email
  )
  FROM main.default.customer_data
  -- Qualify the parameter with the function name so it is not shadowed by the table column of the same name
  WHERE customer_data.customer_name = lookup_customer_info.customer_name
  LIMIT 1;
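You can also test this function directly in the SQL editor before using it as a tool. The customer name below is a hypothetical value that assumes the customer_data table is populated:

SELECT main.default.lookup_customer_info('Alice Smith');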
Prototype tool-calling agents in AI Playground
After creating the Unity Catalog functions, you can use the AI Playground to give them to an LLM and test the agent. The AI Playground provides a sandbox to prototype tool-calling agents.
Once you’re happy with the AI agent, you can export it to develop it further in Python or deploy it as a Model Serving endpoint as is.
Note
Unity Catalog, serverless compute, Mosaic AI Agent Framework, and either pay-per-token foundation models or external models must be available in the current workspace to prototype agents in AI Playground.
To prototype a tool-calling agent:
From Playground, select a model with the Function calling label.
Select Tools and specify your Unity Catalog function names in the dropdown.
Chat to test out the current combination of LLM, tools, and system prompt, and try variations.
Export and deploy AI Playground agents
After adding tools and testing the agent, export the Playground agent to Python notebooks:
Click Export agent code to generate Python notebooks that help you develop and deploy the AI agent.
After exporting the agent code, you see three files saved to your workspace:
agent notebook: Contains Python code defining your agent using LangChain.
driver notebook: Contains Python code to log, trace, register, and deploy the AI agent using Mosaic AI Agent Framework.
config.yml: Contains configuration information about your agent, including tool definitions.
Open the agent notebook to see the LangChain code defining your agent. Use this notebook to test and iterate on the agent programmatically, such as defining more tools or adjusting the agent's parameters.
Note
The exported code might behave differently from your AI Playground session. Databricks recommends that you run the exported notebooks to iterate and debug further, evaluate agent quality, and then deploy the agent to share with others.
Once you’re happy with the agent’s outputs, you can run the driver notebook to log and deploy your agent to a Model Serving endpoint.
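The generated driver notebook already contains this logic, but at a high level, registering and deploying an agent looks roughly like the following sketch. The Unity Catalog model name and run URI are hypothetical placeholders:

import mlflow
from databricks import agents

mlflow.set_registry_uri("databricks-uc")

# Register the logged agent model to Unity Catalog (placeholder run ID and model name)
uc_model_info = mlflow.register_model(
    model_uri="runs:/<run_id>/agent",
    name="main.default.my_agent",
)

# Deploy the registered model to a Model Serving endpoint
agents.deploy("main.default.my_agent", uc_model_info.version)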
Define an agent in code
In addition to generating agent code from AI Playground, you can define an agent in code yourself, using a framework like LangChain or plain Python. To deploy an agent using Agent Framework, its inputs and outputs must conform to one of the supported formats.
Use parameters to configure the agent
In the Agent Framework, you can use parameters to control how agents are executed. This allows you to quickly iterate by varying characteristics of your agent without changing the code. Parameters are key-value pairs that you define in a Python dictionary or a .yaml file.
To configure the code, create a ModelConfig, a set of key-value parameters. ModelConfig is either a Python dictionary or a .yaml file. For example, you can use a dictionary during development and then convert it to a .yaml file for production deployment and CI/CD. For details about ModelConfig, see the MLflow documentation.
An example ModelConfig is shown below.
llm_parameters:
  max_tokens: 500
  temperature: 0.01
model_serving_endpoint: databricks-dbrx-instruct
vector_search_index: ml.docs.databricks_docs_index
prompt_template: 'You are a hello world bot. Respond with a reply to the user''s
  question that indicates your prompt template came from a YAML file. Your response
  must use the word "YAML" somewhere. User''s question: {question}'
prompt_template_input_vars:
  - question
To call the configuration from your code, use one of the following:
import mlflow

# Example for loading from a .yml file
config_file = "configs/hello_world_config.yml"
model_config = mlflow.models.ModelConfig(development_config=config_file)
# Example of using a dictionary
config_dict = {
"prompt_template": "You are a hello world bot. Respond with a reply to the user's question that is fun and interesting to the user. User's question: {question}",
"prompt_template_input_vars": ["question"],
"model_serving_endpoint": "databricks-dbrx-instruct",
"llm_parameters": {"temperature": 0.01, "max_tokens": 500},
}
model_config = mlflow.models.ModelConfig(development_config=config_dict)
# Use model_config.get() to retrieve a parameter value
value = model_config.get('sample_param')
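As a brief illustration, the values from the example configuration above might then be consumed in agent code like this (the question text is only illustrative):

llm_params = model_config.get("llm_parameters")          # {"temperature": 0.01, "max_tokens": 500}
endpoint_name = model_config.get("model_serving_endpoint")
prompt = model_config.get("prompt_template").format(question="What is MLflow?")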
Supported input formats
The following are supported input formats for your agent.
(Recommended) Queries using the OpenAI chat completion schema. The request should have an array of objects as a messages parameter. This format is best for RAG applications.

question = {
    "messages": [
        {
            "role": "user",
            "content": "What is Retrieval-Augmented Generation?",
        },
        {
            "role": "assistant",
            "content": "RAG, or Retrieval Augmented Generation, is a generative AI design pattern that combines a large language model (LLM) with external knowledge retrieval. This approach allows for real-time data connection to generative AI applications, improving their accuracy and quality by providing context from your data to the LLM during inference. Databricks offers integrated tools that support various RAG scenarios, such as unstructured data, structured data, tools & function calling, and agents.",
        },
        {
            "role": "user",
            "content": "How to build RAG for unstructured data",
        },
    ]
}
SplitChatMessagesRequest. Recommended for multi-turn chat applications, especially when you want to manage the current query and history separately.

question = {
    "query": "What is MLflow",
    "history": [
        {
            "role": "user",
            "content": "What is Retrieval-augmented Generation?"
        },
        {
            "role": "assistant",
            "content": "RAG is"
        }
    ]
}
For LangChain, Databricks recommends writing your chain in LangChain Expression Language. In your chain definition code, you can use an itemgetter to retrieve the messages, query, or history objects, depending on which input format you are using.
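For example, with the chat completion (messages) input format, the extractor helpers referenced in the next section could be sketched as follows. The function names are illustrative, not part of the MLflow API:

# Illustrative helpers for the "messages" (chat completion) input format
def extract_user_query_string(chat_messages_array):
    # The most recent message contains the current user query
    return chat_messages_array[-1]["content"]

def extract_chat_history(chat_messages_array):
    # Everything before the most recent message is prior conversation history
    return chat_messages_array[:-1]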
Supported output formats
Your agent must have one of the following supported output formats:
(Recommended) ChatCompletionResponse. This format is recommended for customers who want interoperability with the OpenAI response format.
StringObjectResponse. This format is the simplest to interpret.
For LangChain, use StringResponseOutputParser() or ChatCompletionsOutputParser() from MLflow as your final chain step. Doing so formats the LangChain AI message into an agent-compatible format.
from operator import itemgetter

from langchain_core.runnables import RunnableLambda
from mlflow.langchain.output_parsers import StringResponseOutputParser, ChatCompletionsOutputParser

# extract_user_query_string and extract_chat_history are sketched in the input
# formats section above; fake_model is a placeholder for your LLM call.
chain = (
    {
        "user_query": itemgetter("messages")
        | RunnableLambda(extract_user_query_string),
        "chat_history": itemgetter("messages") | RunnableLambda(extract_chat_history),
    }
    | RunnableLambda(fake_model)
    | StringResponseOutputParser()  # use this for StringObjectResponse
    # ChatCompletionsOutputParser() # or use this for ChatCompletionResponse
)
If you are using PyFunc, Databricks recommends using type hints to annotate the predict() function with input and output data classes that are subclasses of classes defined in mlflow.models.rag_signatures.
You can construct an output object from the data class inside predict() to ensure the format is followed. The returned object must be transformed into a dictionary representation to ensure it can be serialized.
from dataclasses import asdict

from mlflow.pyfunc import PythonModel
from mlflow.models.rag_signatures import ChatCompletionRequest, ChatCompletionResponse, ChainCompletionChoice, Message

class RAGModel(PythonModel):
    ...
    def predict(self, context, model_input: ChatCompletionRequest) -> ChatCompletionResponse:
        ...
        return asdict(ChatCompletionResponse(
            choices=[ChainCompletionChoice(message=Message(content=text))]
        ))
Example notebooks
These notebooks create a simple “Hello, world” chain to illustrate how to create a chain application in Databricks. The first example creates a simple chain. The second example notebook illustrates how to use parameters to minimize code changes during development.