Agent system design patterns
Building an agent system involves orchestrating how LLM calls, data retrieval, and external actions flow together. You can think of design patterns for agent systems on a continuum of complexity and autonomy: from deterministic chains, through single-agent systems that can make dynamic decisions (and may involve multiple LLM calls under the hood), up to multi-agent architectures that coordinate multiple specialized agents.
Tool calling
Before diving into design patterns, it’s important to understand tool calling as a fundamental capability that can be used in any agent system, from simple to complex. Tool calling is a mechanism that allows an agent system to interact with external functions, data sources, or services. This can enable:
- Live data lookups such as SQL queries, CRM fetches, or vector index retrieval.
- Actions such as sending an email or updating a record.
- Arbitrary logic or transformations via Python functions or APIs.
Tool calling thus provides a powerful mechanism for making LLMs “aware” of external data or APIs no matter which design pattern you choose.
To learn more about agent tools, see AI agent tools.
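For example, with an OpenAI-compatible chat completions client, exposing a tool and handling the model's tool-call request might look like the following minimal sketch. The model name and the `get_order_status` helper are illustrative, not part of any specific product API.

```python
# Minimal sketch of tool calling with an OpenAI-compatible chat API.
# The model name and get_order_status helper are illustrative stand-ins.
import json
from openai import OpenAI

client = OpenAI()

def get_order_status(order_id: str) -> dict:
    # Placeholder: a real implementation would query an order database or CRM.
    return {"order_id": order_id, "status": "shipped"}

tools = [{
    "type": "function",
    "function": {
        "name": "get_order_status",
        "description": "Look up the shipping status of an order.",
        "parameters": {
            "type": "object",
            "properties": {"order_id": {"type": "string"}},
            "required": ["order_id"],
        },
    },
}]

messages = [{"role": "user", "content": "Where is order 1234?"}]
response = client.chat.completions.create(
    model="gpt-4o", messages=messages, tools=tools
)

msg = response.choices[0].message
if msg.tool_calls:
    messages.append(msg)  # keep the assistant's tool-call request in the history
    for call in msg.tool_calls:
        if call.function.name == "get_order_status":
            args = json.loads(call.function.arguments)
            result = get_order_status(**args)
            messages.append(
                {"role": "tool", "tool_call_id": call.id, "content": json.dumps(result)}
            )
    # A second create() call with the tool results would produce the final answer.
```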
The sections below discuss three agent system design patterns, each of which can leverage tool calling to different degrees.
Compare gen AI app design patterns
Gen AI app (agent) design patterns are presented in order of complexity.
| Design Pattern | When to Use | Pros | Cons |
|---|---|---|---|
| Deterministic chain (hard-coded steps) | Well-defined tasks with predictable workflows; consistency and auditing are top priorities; minimal orchestration latency. | Highest predictability and auditability; typically lower latency; easier to test and validate. | Limited flexibility for diverse or unexpected requests; can become hard to maintain as logic branches grow; new capabilities may require significant refactoring. |
| Single-agent system | Varied queries within a cohesive domain; some requests warrant tool usage; more flexibility than a chain without separate specialized agents. | Adapts to new or unexpected queries by choosing which tools to call; can iterate to refine results; simpler to debug than multi-agent setups. | Must guard against repeated or invalid tool calls; can become unwieldy across radically different sub-domains; requires careful prompt design and constraints. |
| Multi-agent system | Large or cross-functional domains; multiple “expert” agents; distinct logic or conversation contexts; advanced reflection patterns. | Modular agents that separate teams can own; handles large, complex workflows; enables multi-step, multi-perspective reasoning. | Routing, logging, and debugging overhead across agents; harder to manage which agent can access which data or APIs; agents can bounce tasks indefinitely if not constrained. |
Single-agent system
A single-agent system features one coordinated flow of logic - often orchestrating multiple LLM calls - to handle incoming requests. The agent can:
- Accept requests such as user queries and any relevant context such as conversation history.
- Reason about how best to respond, optionally deciding whether to call tools for external data or actions.
- Iterate if needed, calling an LLM (and/or the same tools) repeatedly until an objective is achieved or a certain condition is met (such as receiving valid data or resolving an error).
- Integrate tool outputs into the conversation.
- Return a cohesive response as output.
In many use cases, a single round of LLM reasoning (often with tool calling) is enough. However, more advanced agents can loop through multiple steps until they arrive at a desired outcome.
Even though there is just “one” agent, you can still have multiple LLM and tool calls under the hood (for planning, generation, verification, and so on), all managed by this single, unified flow.
Example: Help desk assistant
- If the user asks a simple question (“What is our returns policy?”), the agent may respond directly from the LLM’s knowledge.
- If the user wants their order status, the agent calls a function `lookup_order(customer_id, order_id)`. If that tool responds with “invalid order number,” the agent may retry or prompt the user for the correct ID, continuing until it can provide a final answer.
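A minimal, framework-agnostic sketch of such a loop is shown below. `call_llm` and `lookup_order` are illustrative stubs standing in for a real LLM endpoint and order system; the iteration limit guards against infinite tool-call loops.

```python
# Minimal sketch of a single-agent loop for the help desk example.
# call_llm and lookup_order are stubs; a real agent would call an LLM
# endpoint and a live order system.
MAX_ITERATIONS = 5  # guard against infinite tool-call loops

def lookup_order(customer_id: str, order_id: str) -> dict:
    # Stub: a real implementation would query an order database.
    if not order_id.isdigit():
        return {"error": "invalid order number"}
    return {"order_id": order_id, "status": "out for delivery"}

def call_llm(messages: list[dict]) -> dict:
    # Stub: a real implementation would call an LLM and return either a
    # final answer ({"answer": "..."}) or a requested tool call
    # ({"tool": "lookup_order", "args": {...}}).
    return {"answer": "Your order is out for delivery."}

def run_agent(user_message: str, customer_id: str) -> str:
    messages = [{"role": "user", "content": user_message}]
    for _ in range(MAX_ITERATIONS):
        decision = call_llm(messages)
        if "answer" in decision:
            return decision["answer"]  # the agent decided it is done
        result = lookup_order(customer_id, **decision["args"])
        # Feed tool output (including errors) back so the agent can retry
        # or ask the user for a corrected order ID.
        messages.append({"role": "tool", "content": str(result)})
    return "Sorry, I couldn't complete that request."
```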
When to use:
- You expect varied user queries but still within a cohesive domain or product area.
- Certain queries or conditions may warrant tool usage such as deciding when to fetch customer data.
- You want more flexibility than a deterministic chain but don’t require separate specialized agents for different tasks.
Advantages:
- The agent can adapt to new or unexpected queries by choosing which (if any) tools to call.
- The agent can loop through repeated LLM calls or tool invocations to refine results - without needing a fully multi-agent setup.
- This design pattern is often the sweet spot for enterprise use cases - simpler to debug than multi-agent setups while still allowing dynamic logic and limited autonomy.
Considerations:
- Compared to a hard-coded chain, you must guard against repeated or invalid tool calls. (Infinite loops can occur in any tool-calling scenario, so set iteration limits or timeouts.)
- If your application spans radically different sub-domains (finance, devops, marketing, etc.), a single agent may become unwieldy or overloaded with functionality requirements.
- You still need carefully designed prompts and constraints to keep the agent focused and relevant.
Deterministic chain (hard-coded steps)
In this pattern, the developer defines which components are called, in what order, and with which parameters. There is no dynamic decision making about which tools to call or in what order. The system follows a predefined workflow for all requests, making it highly predictable.
Commonly called a “chain,” the flow is a fixed sequence of steps, such as:
- Always take the user’s request and retrieve relevant context from a vector index.
- Combine that context with the user’s request into a final LLM prompt.
- Call the LLM and return the response.
Example: Basic RAG chain
A deterministic RAG chain might always:
- Retrieve the top-k results from a vector index using the incoming user request (retrieve).
- Format retrieved chunks into a prompt template (augment).
- Pass that augmented prompt to the LLM (generate).
- Return the LLM's response.
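A minimal sketch of this fixed flow is shown below. `retrieve_top_k` and `call_llm` are illustrative stubs for a real vector index query and LLM endpoint; the prompt wording and value of k are assumptions.

```python
# Minimal sketch of a deterministic retrieve-augment-generate (RAG) chain.
# retrieve_top_k and call_llm are stubs for a real vector index and LLM endpoint.
PROMPT_TEMPLATE = """Answer the question using only the context below.

Context:
{context}

Question: {question}"""

def retrieve_top_k(query: str, k: int = 5) -> list[str]:
    # Stub: a real implementation would query a vector index.
    return ["Our returns policy allows returns within 30 days."]

def call_llm(prompt: str) -> str:
    # Stub: a real implementation would call an LLM endpoint.
    return "You can return items within 30 days of purchase."

def rag_chain(question: str) -> str:
    chunks = retrieve_top_k(question)                       # 1. retrieve
    prompt = PROMPT_TEMPLATE.format(
        context="\n\n".join(chunks), question=question      # 2. augment
    )
    return call_llm(prompt)                                 # 3. generate
```

Every request follows the same three steps, which is what makes the chain predictable and easy to audit.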
When to use:
- For well-defined tasks with predictable workflows.
- When consistency and auditing are top priorities.
- When you want to minimize latency by avoiding multiple LLM calls for orchestration decisions.
Advantages:
- Highest predictability and auditability.
- Typically lower latency (fewer LLM calls for orchestration).
- Easier to test and validate.
Considerations:
- Limited flexibility for handling diverse or unexpected requests.
- Can become complex and difficult to maintain as logic branches grow.
- May require significant refactoring to accommodate new capabilities.
Multi-agent system
A multi-agent system involves two or more specialized agents that exchange messages and/or collaborate on tasks. Each agent has its own domain or task expertise, context, and potentially distinct tool sets. A separate “coordinator” - which might be another LLM or a rule-based router - directs requests to the appropriate agent, or decides when to hand off from one agent to another.
Example: Enterprise assistant with specialized agents
- Customer support agent: Handles CRM lookups, returns, and shipping.
- Analytics agent: Focuses on SQL queries and data summarization.
- Supervisor/router: Chooses which agent is best for a given user query, or when to switch.
Each sub-agent can perform tool calling within its own domain (such as `lookup_customer_account` or `run_sql_query`), often requiring unique prompts or conversation histories.
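A minimal sketch of a rule-based router in front of two specialized agents is shown below. The agent functions and routing keywords are illustrative; an LLM-based supervisor could replace the keyword logic with a classification call.

```python
# Minimal sketch of a rule-based supervisor routing to specialized agents.
# The agent functions and routing keywords are illustrative stand-ins.
def customer_support_agent(query: str) -> str:
    # Would use its own prompt and tools (e.g., lookup_customer_account).
    return "Support agent response"

def analytics_agent(query: str) -> str:
    # Would use its own prompt and tools (e.g., run_sql_query).
    return "Analytics agent response"

def route(query: str) -> str:
    # Simple keyword router; an LLM-based router could classify the query
    # and return the chosen agent's name instead.
    lowered = query.lower()
    if any(word in lowered for word in ("order", "return", "refund", "shipping")):
        return customer_support_agent(query)
    if any(word in lowered for word in ("report", "sql", "revenue", "metric")):
        return analytics_agent(query)
    return customer_support_agent(query)  # default fallback
```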
When to use:
- You have distinct problem areas or skill sets such as a coding agent or a finance agent.
- Each agent needs access to conversation history or domain-specific prompts.
- You have so many tools that fitting them all into one agent’s schema is impractical; each agent can own a subset.
- You want to implement reflection, critique, or back-and-forth collaboration among specialized agents.
Advantages:
- This modular approach means each agent can be developed or maintained by separate teams, specializing in a narrow domain.
- Can handle large, complex enterprise workflows that a single agent might struggle to manage cohesively.
- Facilitates advanced multi-step or multi-perspective reasoning - for instance, one agent generating an answer, another verifying it.
Considerations:
- Requires a strategy for routing between agents, plus overhead for logging, tracing, and debugging across multiple endpoints.
- If you have many sub-agents and tools, it can get complicated to decide which agent has access to which data or APIs.
- Agents can bounce tasks indefinitely among themselves without resolution if not carefully constrained.
- Infinite loop risks also exist in single-agent tool-calling, but multi-agent setups add another layer of debugging complexity.
Practical advice
Regardless of which design pattern you select, consider the following best practices for developing stable, maintainable agent systems.
- Start simple: If your workflow is straightforward and predictable, a deterministic chain is fast to build.
- Gradually add complexity: As you need more dynamic queries or flexible data sources, move to a single-agent system with tool calling.
- Go multi-agent: Only if you have clearly distinct domains or tasks, multiple conversation contexts, or a large tool set that’s too big for a single agent’s prompt.
If your use case starts small - like a straightforward RAG chain - begin with a hard-coded chain. As requirements evolve, you can add tool-calling logic for dynamic decision-making or even segment tasks into multiple specialized agents. In practice, many real-world agent systems combine patterns. For instance, use a mostly deterministic chain but allow the LLM to dynamically call certain APIs in a single step if needed.
Mosaic AI Agent Framework is agnostic to the pattern you choose, making it easy to evolve your design as your application grows.
Development guidance
- Prompts
- Keep prompts clear and minimal to avoid contradictory instructions and distracting information, and to reduce hallucinations.
- Provide only the tools and context your agent requires, rather than an unbounded set of APIs or large irrelevant context.
- Logging & observability
- Implement detailed logging for each user request, agent plan, and tool call. Tools like MLflow Tracing can help capture structured logs for debugging (see the sketch after this list).
- Store logs securely and be mindful of personally identifiable information (PII) in conversation data.
- Model updates & version pinning
- LLM behaviors can shift when providers update models behind the scenes. Use version pinning and frequent regression tests to ensure your agent logic remains robust and stable.
- Combining MLflow with Mosaic AI Agent Evaluation provides a streamlined way of versioning agents and regularly evaluating quality and performance.
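For example, MLflow Tracing can capture nested, structured traces of each agent step with a decorator. The sketch below assumes a recent MLflow version with tracing support; the traced functions are illustrative stand-ins for real agent steps.

```python
# Minimal sketch of structured tracing with MLflow Tracing.
# Assumes a recent MLflow version with tracing support; the traced
# functions are illustrative stand-ins for real agent steps.
import mlflow

@mlflow.trace
def retrieve_context(query: str) -> list[str]:
    return ["relevant chunk"]

@mlflow.trace
def generate_answer(query: str, context: list[str]) -> str:
    return "answer"

@mlflow.trace
def answer_question(query: str) -> str:
    context = retrieve_context(query)
    return generate_answer(query, context)

answer_question("What is our returns policy?")  # produces a nested trace
```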
Testing and iteration guidance
- Error handling & fallback logic
- Plan for tool or LLM failures. Timeouts, malformed responses, or empty results can break a workflow. Include retry strategies, fallback logic, or a simpler fallback chain when advanced features fail (see the sketch after this list).
- Iterative prompt engineering
- Expect to refine prompts and chain logic over time. Version each change (using Git and MLflow) so you can roll back or compare performance across versions.
- Consider frameworks like DSPy to programmatically optimize prompts and other components within your agent system.
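A minimal sketch of retry-with-fallback around an agent call is shown below. `run_advanced_agent` and `run_simple_chain` are illustrative stand-ins; the backoff and retry counts are assumptions.

```python
# Minimal sketch of retry-with-fallback around an agent call.
# run_advanced_agent and run_simple_chain are illustrative stand-ins.
import time

def run_advanced_agent(query: str) -> str:
    raise TimeoutError("LLM endpoint timed out")  # simulate a failure

def run_simple_chain(query: str) -> str:
    return "Fallback answer from a simpler deterministic chain."

def answer_with_fallback(query: str, max_retries: int = 2) -> str:
    for attempt in range(max_retries):
        try:
            return run_advanced_agent(query)
        except (TimeoutError, ValueError):
            time.sleep(2 ** attempt)  # simple exponential backoff
    # All retries failed; fall back to a simpler, more predictable chain.
    return run_simple_chain(query)
```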
Production guidance
- Latency vs. cost optimization
- Each additional LLM or tool call increases token usage and response time. Where possible, combine steps or cache repeated queries to keep performance and cost manageable (see the sketch after this list).
- Security and sandboxing
- If your agent can update records or run code, sandbox those actions or enforce human approval where necessary. This is critical in enterprise or regulated environments to avoid unintended harm.
- Databricks recommends Unity Catalog tools for sandboxed execution. See Unity Catalog function tools vs. agent code tools. Unity Catalog isolates arbitrary code execution and prevents malicious actors from tricking the agent into generating and running code that interferes with or eavesdrops on other requests.
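A minimal sketch of caching repeated retrieval queries with `functools.lru_cache` is shown below. The retrieval function is an illustrative stand-in; a production system might use an external cache with expiry instead of an in-process one.

```python
# Minimal sketch of caching repeated retrieval queries to reduce
# latency and cost. The retrieval function is an illustrative stand-in.
from functools import lru_cache

@lru_cache(maxsize=1024)
def retrieve_context(query: str) -> tuple[str, ...]:
    # Stub: a real implementation would query a vector index.
    return ("relevant chunk 1", "relevant chunk 2")

retrieve_context("What is our returns policy?")  # hits the index
retrieve_context("What is our returns policy?")  # served from the cache
```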
By following these guidelines, you can mitigate many of the most common failure modes - such as tool mis-calls, drifting LLM performance, and unexpected cost spikes - and build more reliable, scalable agent systems.