
Multi-agent Systems

Introduction

At our company, we define an agent as a system that uses an LLM to decide the control flow of an application. When working with clients, we often encounter complex requirements that a single agent cannot effectively handle due to challenges like:

  • Poor tool selection when an agent has too many tools
  • Complex context management across multiple domains
  • Need for specialized domain expertise in different areas

Our approach to multi-agent systems addresses these challenges by breaking applications into smaller, independent agents. Depending on client needs, these agents can range from simple prompt-LLM combinations to sophisticated ReAct agents with specialized capabilities.
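The core idea that the LLM, not hard-coded logic, decides control flow can be sketched in a few lines. This is a minimal illustration only; `classify_intent` is a hypothetical stand-in for a real LLM call:

```python
# Minimal sketch of LLM-decided control flow.
# `classify_intent` is a hypothetical stand-in for a real LLM call.

def classify_intent(query: str) -> str:
    # A real system would ask an LLM; here we fake the decision
    return "billing" if "invoice" in query.lower() else "support"

def billing_agent(query: str) -> str:
    return f"Billing agent handling: {query}"

def support_agent(query: str) -> str:
    return f"Support agent handling: {query}"

ROUTES = {"billing": billing_agent, "support": support_agent}

def run(query: str) -> str:
    # The model's output, not application code, picks the next step
    next_step = classify_intent(query)
    return ROUTES[next_step](query)
```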

Key Benefits for Our Clients

  • Modularity: Easier development, testing, and maintenance of complex systems
  • Specialization: Domain-expert agents that excel in specific business functions
  • Control: Explicit management of agent communication with clear boundaries
  • Scalability: Ability to grow systems incrementally as business needs evolve

The Big Tool Approach

While our multi-agent systems effectively address most complex business needs, we’ve identified scenarios where clients require a single agent to access a vast number of specialized tools. For these cases, we implement langgraph-bigtool, a solution that allows scaling to hundreds or thousands of tools within a single agent.

What is langgraph-bigtool?

langgraph-bigtool is a Python library we use to create LangGraph agents capable of accessing large tool libraries. Rather than overwhelming the context window with all available tools, it leverages LangGraph’s long-term memory store to search for and retrieve only the most relevant tools for a given task.

import uuid

from langchain.chat_models import init_chat_model
from langchain.embeddings import init_embeddings
from langgraph.store.memory import InMemoryStore
from langgraph_bigtool import create_agent

# Register tools with unique identifiers
tool_registry = {
    str(uuid.uuid4()): tool
    for tool in all_tools
}

# Index tool descriptions for retrieval
embeddings = init_embeddings("openai:text-embedding-3-small")
store = InMemoryStore(
    index={
        "embed": embeddings,
        "dims": 1536,
        "fields": ["description"],
    }
)

# Store tool metadata for search
for tool_id, tool in tool_registry.items():
    store.put(
        ("tools",),
        tool_id,
        {
            "description": f"{tool.name}: {tool.description}",
        },
    )

# Create and compile agent
llm = init_chat_model("openai:gpt-4o-mini")
builder = create_agent(llm, tool_registry)
agent = builder.compile(store=store)

Key Capabilities

  • Scalable Tool Access: Agents can work with hundreds or thousands of tools without context overload
  • Intelligent Tool Selection: Tools are retrieved based on relevance to the current task
  • Customizable Retrieval: Custom retrieval functions can be implemented to match specific business logic
  • Enterprise-ready Storage: Supports both in-memory and production-ready Postgres backends

Custom Tool Retrieval Logic

For enterprise clients with complex tool categorization needs, we implement custom retrieval logic:

from typing import Literal

def retrieve_tools(
    category: Literal["billing", "service"],
) -> list[str]:
    """Get tool IDs for a specific business domain."""
    if category == "billing":
        return ["payment_processor", "invoice_generator"]
    else:
        return ["ticket_creator", "service_lookup"]

builder = create_agent(
    llm, tool_registry, retrieve_tools_function=retrieve_tools
)

Our Multi-agent Architecture Approaches

Through our client implementations, we’ve developed three primary architectural patterns for multi-agent systems:

1. Network Architecture

We implement this model when business processes require flexible communication paths between different specialized functions. Each agent decides which other agent to call next based on the current context and requirements.

from typing import Literal

from langchain_openai import ChatOpenAI
from langgraph.graph import StateGraph, MessagesState, START, END
from langgraph.types import Command

model = ChatOpenAI()

def agent_1(state: MessagesState) -> Command[Literal["agent_2", "agent_3", END]]:
    response = model.invoke(...)
    return Command(
        goto=response["next_agent"],
        update={"messages": [response["content"]]},
    )

# Add more agents...

builder = StateGraph(MessagesState)
builder.add_node(agent_1)
# Add more nodes...
network = builder.compile()

2. Supervisor Architecture

This is our most commonly implemented architecture, where a central supervisor agent coordinates all other specialized agents. This provides clear control flow and is particularly effective for business workflows with well-defined processes.

def supervisor(state: MessagesState) -> Command[Literal["agent_1", "agent_2", END]]:
    response = model.invoke(...)
    return Command(goto=response["next_agent"])

def agent_1(state: MessagesState) -> Command[Literal["supervisor"]]:
    response = model.invoke(...)
    return Command(
        goto="supervisor",
        update={"messages": [response]},
    )

builder = StateGraph(MessagesState)
builder.add_node(supervisor)
builder.add_node(agent_1)
# Add more nodes...
supervisor_graph = builder.compile()

3. Hierarchical Architecture

For our enterprise clients with complex organizational structures, we implement hierarchical multi-agent systems. This allows for teams of specialized agents managed by mid-level supervisors, all coordinated by an executive-level supervisor.

# Team 1 - e.g., Customer Data Analysis
def team_1_supervisor(state: MessagesState):
    response = model.invoke(...)
    return Command(goto=response["next_agent"])

# Team 2 - e.g., Market Research
def team_2_supervisor(state: MessagesState):
    response = model.invoke(...)
    return Command(goto=response["next_agent"])

# Top-level supervisor
def top_level_supervisor(state: MessagesState):
    response = model.invoke(...)
    return Command(goto=response["next_team"])

builder = StateGraph(MessagesState)
builder.add_node(top_level_supervisor)
# team_1_graph and team_2_graph are the compiled subgraphs for each team
builder.add_node("team_1_graph", team_1_graph)
builder.add_node("team_2_graph", team_2_graph)
hierarchy = builder.compile()

Our Communication Patterns

Through our implementations, we’ve developed two primary communication patterns that work effectively in production environments:

State-based Communication

In our LangGraph implementations, agents communicate through a shared graph state, which we typically configure as:

  • A common state schema for cross-functional agents
  • Different state schemas for specialized domain agents
  • A combination of shared and private states for sensitive information

def agent_with_shared_state(state: MessagesState):
    # Read the shared conversation history
    current_context = state["messages"]
    response = model.invoke(current_context)
    # MessagesState appends returned messages to the shared state,
    # so return only the new message rather than the full history
    return {"messages": [response]}
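The shared-versus-private split described above can be sketched with plain `TypedDict` schemas. The node names and fields here are illustrative, not part of any client system:

```python
from typing import TypedDict

# Shared schema: fields visible to all agents in the graph
class SharedState(TypedDict):
    messages: list[str]

# Private schema: extra fields only one agent's subgraph reads/writes
class PrivateState(SharedState):
    customer_ssn: str  # sensitive field kept out of the shared channel

def redact_for_sharing(state: PrivateState) -> SharedState:
    # Only the shared fields cross the agent boundary
    return {"messages": state["messages"]}
```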

Tool-based Communication

For systems requiring strict boundaries, we implement tool-based communication, where agents provide services to each other through well-defined interfaces:

from typing import Annotated

from langgraph.prebuilt import InjectedState, create_react_agent

def agent_as_tool(state: Annotated[dict, InjectedState]):
    """Delegate the current query to a specialized agent."""
    response = model.invoke(state["current_query"])
    return response.content

tools = [agent_as_tool]
supervisor = create_react_agent(model, tools)

Implementation Best Practices

From our real-world client implementations, we’ve developed these best practices:

1. State Management

  • Implement proper memory management for enterprise-scale conversations
  • Choose appropriate state schemas based on business domain requirements
  • Design state cleanup mechanisms to prevent context bloat in long-running systems
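One simple cleanup mechanism for the last point is a sliding window over the message history. This is a minimal sketch; a real deployment might use token-aware trimming instead (e.g., LangChain's `trim_messages`):

```python
def trim_history(messages: list[str], max_messages: int = 20) -> list[str]:
    """Keep the first (system) message plus the most recent turns."""
    if len(messages) <= max_messages:
        return messages
    # Preserve the system prompt, drop the oldest middle turns
    return [messages[0]] + messages[-(max_messages - 1):]
```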

2. Communication Protocol

  • Define clear interfaces between agents based on business function boundaries
  • Determine information sharing policies (full thought processes vs. conclusions)
  • Implement robust error handling with fallback mechanisms
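The fallback idea in the last bullet can be sketched as a wrapper that tries a sequence of agents in order; the handler callables here are hypothetical:

```python
def call_with_fallback(handlers, query):
    """Try each handler in order; return the first successful answer."""
    errors = []
    for handler in handlers:
        try:
            return handler(query)
        except Exception as exc:  # in production, catch narrower error types
            errors.append(exc)
    raise RuntimeError(f"All {len(handlers)} handlers failed: {errors}")
```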

3. System Design

  • Begin with the simplest effective architecture (usually supervisor-based)
  • Monitor agent performance with detailed logs and performance metrics
  • Scale by adding specialized agents rather than increasing individual agent complexity

Systematic Implementation Process

For each client implementation, we follow a structured approach: from business analysis and architecture design to testing and continuous monitoring.

Human-in-the-Loop Integration

Our multi-agent systems are designed to complement human capabilities, with clear handoff protocols between automated and human processes.
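One common handoff protocol is confidence-based escalation: low-confidence answers are routed to a human queue instead of being returned automatically. This is a minimal sketch with an illustrative threshold; LangGraph also provides a built-in `interrupt` mechanism for pausing a graph mid-run:

```python
def route_response(answer: str, confidence: float, threshold: float = 0.8):
    """Return the answer directly, or flag it for human review."""
    if confidence >= threshold:
        return {"status": "auto", "answer": answer}
    # Below the threshold, hand the draft to a human reviewer
    return {"status": "needs_human_review", "draft": answer}
```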

Case Studies

Our multi-agent systems have transformed business operations across various domains:

Customer Service Enhancement

We’ve implemented supervisor-based multi-agent systems with specialized agents for:

  • Information retrieval from knowledge bases
  • Customer query classification and prioritization
  • Problem-specific solution generation

Business Intelligence

Our hierarchical multi-agent systems enable:

  • Real-time data analysis across disparate data sources
  • Coordinated report generation with specialized analysis agents
  • Executive summaries that combine multiple analytical perspectives

Conclusion

Our approach to multi-agent systems focuses on creating practical, business-oriented solutions that are modular, specialized, and maintainable. By carefully selecting the right architecture and communication patterns based on your business needs, we create AI systems that deliver measurable business value.

Contact us to discuss how our multi-agent implementation experience can help transform your business processes.