Create an AI agent and its tools

Preview

This feature is in Public Preview.

This article shows you how to create AI agents and tools using the Mosaic AI Agent Framework.

Learn how to use the AI Playground to quickly prototype tool-calling agents and export them to Mosaic AI Agent Framework.

Requirements

Create AI agent tools

AI agents use tools to perform actions beyond language generation, such as retrieving structured or unstructured data, executing code, or communicating with remote services (for example, sending an email or Slack message).

To provide tools to an agent with Mosaic AI Agent Framework, you can use any combination of the following methods:

  1. Create or use existing Unity Catalog functions as tools, enabling easy discovery, governance, and sharing of tools.

  2. Define tools locally as Python functions in agent code.

Both approaches work whether you write your agent in custom Python code or use an agent-authoring library like LangGraph.

When defining tools, ensure that the tool, its parameters, and its return value are well-documented, so that the agent LLM can understand when and how to use the tool.
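
For example, a tool defined locally as a Python function might look like the following sketch. The function name, parameter, and return value are hypothetical; the point is the descriptive docstring and type hints that help the LLM decide when and how to call the tool.

def lookup_weather(city: str) -> str:
    """
    Look up the current weather for a city.

    Args:
        city: Name of the city to look up, for example "San Francisco".

    Returns:
        A short, human-readable description of the current weather.
    """
    # A real tool would call a weather service here; this stub returns a fixed value.
    return f"The current temperature in {city} is 18°C."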

Create agent tools with Unity Catalog functions

The following examples create AI agent tools using Unity Catalog functions, which you can write in a notebook environment or a SQL editor.

Run the following code in a notebook cell. It uses the %sql notebook magic to create a Unity Catalog function called python_exec.

An LLM can use this tool to execute Python code provided to it by a user.

%sql
CREATE OR REPLACE FUNCTION
main.default.python_exec (
 code STRING COMMENT 'Python code to execute. Remember to print the final result to stdout.'
)
RETURNS STRING
LANGUAGE PYTHON
DETERMINISTIC
COMMENT 'Executes Python code in the sandboxed environment and returns its stdout. The runtime is stateless and you can not read output of the previous tool executions. i.e. No such variables "rows", "observation" defined. Calling another tool inside a Python code is NOT allowed. Use standard python libraries only.'
AS $$
 import sys
 from io import StringIO
 sys_stdout = sys.stdout
 redirected_output = StringIO()
 sys.stdout = redirected_output
 exec(code)
 sys.stdout = sys_stdout
 return redirected_output.getvalue()
$$
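
To sanity-check the function after creating it, you can call it from another notebook cell. The following is just an illustrative test and assumes an active SparkSession named spark in a Databricks notebook:

# Quick test of the new function; the output column should contain "4\n".
spark.sql(
    "SELECT main.default.python_exec('print(2 + 2)') AS output"
).show(truncate=False)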

Run the following code in a SQL editor.

It creates a Unity Catalog function called lookup_customer_info that an LLM could use to retrieve structured data from a hypothetical customer_data table:

CREATE OR REPLACE FUNCTION main.default.lookup_customer_info(
  customer_name STRING COMMENT 'Name of the customer whose info to look up'
)
RETURNS STRING
COMMENT 'Returns metadata about a particular customer given the customer name, including the customer email and ID. The
customer ID can be used for other queries.'
RETURN SELECT CONCAT(
    'Customer ID: ', customer_id, ', ',
    'Customer Email: ', customer_email
  )
  FROM main.default.customer_data
  WHERE customer_data.customer_name = lookup_customer_info.customer_name
  LIMIT 1;
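
As with python_exec, you can test the function before handing it to an agent. The customer name below is a placeholder, and the query assumes the main.default.customer_data table exists and is populated:

# Illustrative test of the lookup function with a hypothetical customer name.
spark.sql(
    "SELECT main.default.lookup_customer_info('Acme Corp') AS customer_info"
).show(truncate=False)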

Prototype tool-calling agents in AI Playground

After creating the Unity Catalog functions, you can use the AI Playground to give them to an LLM and test the agent. The AI Playground provides a sandbox to prototype tool-calling agents.

Once you’re happy with the AI agent, you can export it to develop it further in Python or deploy it as a Model Serving endpoint as is.

Note

Unity Catalog, serverless compute, Mosaic AI Agent Framework, and either pay-per-token foundation models or external models must be available in the current workspace to prototype agents in AI Playground.

To prototype a tool-calling agent:

  1. From Playground, select a model with the Function calling label.

  2. Select Tools and specify your Unity Catalog function names in the dropdown.

  3. Chat to test out the current combination of LLM, tools, and system prompt, and try variations.


Export and deploy AI Playground agents

After adding tools and testing the agent, export the Playground agent to Python notebooks:

  1. Click Export agent code to generate Python notebooks that help you develop and deploy the AI agent.

    After exporting the agent code, you see three files saved to your workspace:

    • agent notebook: Contains Python code defining your agent using LangChain.

    • driver notebook: Contains Python code to log, trace, register, and deploy the AI agent using Mosaic AI Agent Framework.

    • config.yml: Contains configuration information about your agent including tool definitions.

  2. Open the agent notebook to see the LangChain code defining your agent. Use this notebook to test and iterate on the agent programmatically, for example by defining more tools or adjusting the agent’s parameters.

    Note

    The exported code might behave differently from your AI Playground session. Databricks recommends that you run the exported notebooks to iterate and debug further, evaluate agent quality, and then deploy the agent to share with others.

  3. Once you’re happy with the agent’s outputs, you can run the driver notebook to log and deploy your agent to a Model Serving endpoint.
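
The driver notebook already contains the logging, registration, and deployment code, but for orientation, its overall flow is roughly the following sketch. The model name, file paths, and exact API calls in your generated notebook may differ:

import mlflow
from databricks import agents

# Register the agent model to Unity Catalog (the model name below is a placeholder).
mlflow.set_registry_uri("databricks-uc")
UC_MODEL_NAME = "main.default.my_agent"

# Log the exported agent code as an MLflow model, together with its configuration.
with mlflow.start_run():
    logged_agent = mlflow.langchain.log_model(
        lc_model="./agent",         # path to the exported agent code
        artifact_path="agent",
        model_config="config.yml",  # the exported configuration file
    )

# Register the model in Unity Catalog and deploy it to a Model Serving endpoint.
registered_model = mlflow.register_model(logged_agent.model_uri, UC_MODEL_NAME)
agents.deploy(UC_MODEL_NAME, registered_model.version)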

Define an agent in code

In addition to generating agent code from AI Playground, you can also define an agent in code yourself, using either a framework like LangChain or plain Python code. To deploy an agent with Agent Framework, its inputs and outputs must conform to one of the supported formats described below.

Use parameters to configure the agent

In the Agent Framework, you can use parameters to control how agents are executed. This allows you to quickly iterate by varying characteristics of your agent without changing the code. Parameters are key-value pairs that you define in a Python dictionary or a .yaml file.

To configure the code, create a ModelConfig, a set of key-value parameters. ModelConfig is either a Python dictionary or a .yaml file. For example, you can use a dictionary during development and then convert it to a .yaml file for production deployment and CI/CD. For details about ModelConfig, see the MLflow documentation.

An example ModelConfig is shown below.

llm_parameters:
  max_tokens: 500
  temperature: 0.01
model_serving_endpoint: databricks-dbrx-instruct
vector_search_index: ml.docs.databricks_docs_index
prompt_template: 'You are a hello world bot. Respond with a reply to the user''s
  question that indicates your prompt template came from a YAML file. Your response
  must use the word "YAML" somewhere. User''s question: {question}'
prompt_template_input_vars:
- question

To call the configuration from your code, use one of the following:

import mlflow

# Example of loading the configuration from a .yml file
config_file = "configs/hello_world_config.yml"
model_config = mlflow.models.ModelConfig(development_config=config_file)

# Example of using a dictionary
config_dict = {
    "prompt_template": "You are a hello world bot. Respond with a reply to the user's question that is fun and interesting to the user. User's question: {question}",
    "prompt_template_input_vars": ["question"],
    "model_serving_endpoint": "databricks-dbrx-instruct",
    "llm_parameters": {"temperature": 0.01, "max_tokens": 500},
}

model_config = mlflow.models.ModelConfig(development_config=config_dict)

# Use model_config.get() to retrieve a parameter value, for example:
value = model_config.get("model_serving_endpoint")

Supported input formats

The following are supported input formats for your agent.

  • (Recommended) Queries using the OpenAI chat completion schema. The request should contain an array of message objects in the messages parameter. This format is best for RAG applications.

    question = {
        "messages": [
            {
                "role": "user",
                "content": "What is Retrieval-Augmented Generation?",
            },
            {
                "role": "assistant",
                "content": "RAG, or Retrieval Augmented Generation, is a generative AI design pattern that combines a large language model (LLM) with external knowledge retrieval. This approach allows for real-time data connection to generative AI applications, improving their accuracy and quality by providing context from your data to the LLM during inference. Databricks offers integrated tools that support various RAG scenarios, such as unstructured data, structured data, tools & function calling, and agents.",
            },
            {
                "role": "user",
                "content": "How to build RAG for unstructured data",
            },
        ]
    }
    
  • SplitChatMessagesRequest. Recommended for multi-turn chat applications, especially when you want to manage the current query and the conversation history separately.

    question = {
        "query": "What is MLflow",
        "history": [
            {
                "role": "user",
                "content": "What is Retrieval-augmented Generation?"
            },
            {
                "role": "assistant",
                "content": "RAG is"
            }
        ]
    }
    

For LangChain, Databricks recommends writing your chain in LangChain Expression Language. In your chain definition code, you can use itemgetter to retrieve the messages, query, or history objects, depending on which input format you are using.
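
For illustration, minimal helpers for the chat completion input format might look like the following sketch. The function names match the placeholders used in the output parser example in the next section, but the implementations shown here are hypothetical:

from operator import itemgetter

from langchain_core.runnables import RunnableLambda

# Treat the last message as the current user query and everything before it as chat history.
def extract_user_query_string(chat_messages_array):
    return chat_messages_array[-1]["content"]

def extract_chat_history(chat_messages_array):
    return chat_messages_array[:-1]

# Wire the helpers into the chain inputs with itemgetter("messages").
inputs = {
    "user_query": itemgetter("messages") | RunnableLambda(extract_user_query_string),
    "chat_history": itemgetter("messages") | RunnableLambda(extract_chat_history),
}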

Supported output formats

Your agent must have one of the following supported output formats:

  • (Recommended) ChatCompletionResponse. This format is recommended for customers who want interoperability with the OpenAI response format.

  • StringObjectResponse. This format is the easiest and simplest to interpret.

For LangChain, use StringResponseOutputParser() or ChatCompletionsOutputParser() from MLflow as your final chain step. Doing so formats the LangChain AI message into an Agent-compatible format.


  from operator import itemgetter

  from langchain_core.runnables import RunnableLambda
  from mlflow.langchain.output_parsers import StringResponseOutputParser, ChatCompletionsOutputParser

  # `extract_user_query_string`, `extract_chat_history`, and `fake_model` are placeholders
  # for your own functions (see the helper sketch in the input formats section).
  chain = (
      {
          "user_query": itemgetter("messages")
          | RunnableLambda(extract_user_query_string),
          "chat_history": itemgetter("messages") | RunnableLambda(extract_chat_history),
      }
      | RunnableLambda(fake_model)
      | StringResponseOutputParser()  # use this for StringObjectResponse
      # | ChatCompletionsOutputParser()  # or use this for ChatCompletionResponse
  )

If you are using PyFunc, Databricks recommends using type hints to annotate the predict() function with input and output data classes that are subclasses of classes defined in mlflow.models.rag_signatures.

You can construct an output object from the data class inside predict() to ensure the format is followed. The returned object must be transformed into a dictionary representation to ensure it can be serialized.


  from dataclasses import asdict

  from mlflow.pyfunc import PythonModel
  from mlflow.models.rag_signatures import ChatCompletionRequest, ChatCompletionResponse, ChainCompletionChoice, Message

  class RAGModel(PythonModel):
      ...
      def predict(self, context, model_input: ChatCompletionRequest) -> ChatCompletionResponse:
          ...
          # `text` holds the generated response string produced by your model logic.
          return asdict(ChatCompletionResponse(
              choices=[ChainCompletionChoice(message=Message(content=text))]
          ))

Example notebooks

These notebooks create a simple “Hello, world” chain to illustrate how to create a chain application in Databricks. The first example creates a simple chain. The second example notebook illustrates how to use parameters to minimize code changes during development.

Simple chain notebook

Simple chain driver notebook

Parameterized chain notebook

Parameterized chain driver notebook
