
Multi-agent Systems

Introduction

At our company, we define an agent as a system that uses an LLM to decide the control flow of an application. When working with clients, we often encounter complex requirements that a single agent cannot effectively handle due to challenges like:

  • Poor tool selection when an agent has too many tools
  • Complex context management across multiple domains
  • Need for specialized domain expertise in different areas

Our approach to multi-agent systems addresses these challenges by breaking applications into smaller, independent agents. Depending on client needs, these agents can range from simple prompt-LLM combinations to sophisticated ReAct agents with specialized capabilities.
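The core idea that the LLM, not hard-coded logic, decides control flow can be sketched in a few lines. This is a minimal illustration only; `classify_intent` is a hypothetical stand-in for a real LLM call:

```python
# Minimal sketch of LLM-decided control flow.
# `classify_intent` is a hypothetical stand-in for a real LLM call.

def classify_intent(query: str) -> str:
    # A real system would ask an LLM; here we fake the decision
    return "billing" if "invoice" in query.lower() else "support"

def billing_agent(query: str) -> str:
    return f"Billing agent handling: {query}"

def support_agent(query: str) -> str:
    return f"Support agent handling: {query}"

ROUTES = {"billing": billing_agent, "support": support_agent}

def run(query: str) -> str:
    # The model's output, not application code, picks the next step
    next_step = classify_intent(query)
    return ROUTES[next_step](query)
```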

Key Benefits for Our Clients

  • Modularity: Easier development, testing, and maintenance of complex systems
  • Specialization: Domain-expert agents that excel in specific business functions
  • Control: Explicit management of agent communication with clear boundaries
  • Scalability: Ability to grow systems incrementally as business needs evolve

The Big Tool Approach

While our multi-agent systems effectively address most complex business needs, we’ve identified scenarios where clients require a single agent to access a vast number of specialized tools. For these cases, we implement langgraph-bigtool, a solution that allows scaling to hundreds or thousands of tools within a single agent.

What is langgraph-bigtool?

langgraph-bigtool is a Python library we use to create LangGraph agents capable of accessing large tool libraries. Rather than overwhelming the context window with all available tools, it leverages LangGraph’s long-term memory store to search for and retrieve only the most relevant tools for a given task.

import uuid

from langchain.chat_models import init_chat_model
from langchain.embeddings import init_embeddings
from langgraph.store.memory import InMemoryStore
from langgraph_bigtool import create_agent

# Register tools with unique identifiers
tool_registry = {
    str(uuid.uuid4()): tool
    for tool in all_tools
}

# Index tool descriptions for retrieval
embeddings = init_embeddings("openai:text-embedding-3-small")
store = InMemoryStore(
    index={
        "embed": embeddings,
        "dims": 1536,
        "fields": ["description"],
    }
)

# Store tool metadata for search
for tool_id, tool in tool_registry.items():
    store.put(
        ("tools",),
        tool_id,
        {
            "description": f"{tool.name}: {tool.description}",
        },
    )

# Create and compile agent
llm = init_chat_model("openai:gpt-4o-mini")
builder = create_agent(llm, tool_registry)
agent = builder.compile(store=store)

Key Capabilities

  • Scalable Tool Access: Agents can work with hundreds or thousands of tools without context overload
  • Intelligent Tool Selection: Tools are retrieved based on relevance to the current task
  • Customizable Retrieval: Custom retrieval functions can be implemented to match specific business logic
  • Enterprise-ready Storage: Supports both in-memory and production-ready Postgres backends

Custom Tool Retrieval Logic

For enterprise clients with complex tool categorization needs, we implement custom retrieval logic:

from typing import Literal

def retrieve_tools(
    category: Literal["billing", "service"],
) -> list[str]:
    """Get tool IDs for a specific business domain."""
    if category == "billing":
        return ["payment_processor", "invoice_generator"]
    else:
        return ["ticket_creator", "service_lookup"]

builder = create_agent(
    llm, tool_registry, retrieve_tools_function=retrieve_tools
)

Our Multi-agent Architecture Approaches

Through our client implementations, we’ve developed three primary architectural patterns for multi-agent systems:

1. Network Architecture

We implement this model when business processes require flexible communication paths between different specialized functions. Each agent decides which other agent to call next based on the current context and requirements.

from typing import Literal

from langchain_openai import ChatOpenAI
from langgraph.graph import StateGraph, MessagesState, START, END
from langgraph.types import Command

model = ChatOpenAI()

def agent_1(state: MessagesState) -> Command[Literal["agent_2", "agent_3", END]]:
    response = model.invoke(...)
    return Command(
        goto=response["next_agent"],
        update={"messages": [response["content"]]},
    )

# Add more agents...

builder = StateGraph(MessagesState)
builder.add_node(agent_1)
# Add more nodes...
network = builder.compile()

2. Supervisor Architecture

This is our most commonly implemented architecture, where a central supervisor agent coordinates all other specialized agents. This provides clear control flow and is particularly effective for business workflows with well-defined processes.

def supervisor(state: MessagesState) -> Command[Literal["agent_1", "agent_2", END]]:
    response = model.invoke(...)
    return Command(goto=response["next_agent"])

def agent_1(state: MessagesState) -> Command[Literal["supervisor"]]:
    response = model.invoke(...)
    return Command(
        goto="supervisor",
        update={"messages": [response]},
    )

builder = StateGraph(MessagesState)
builder.add_node(supervisor)
builder.add_node(agent_1)
# Add more nodes...
supervisor_graph = builder.compile()

3. Hierarchical Architecture

For our enterprise clients with complex organizational structures, we implement hierarchical multi-agent systems. This allows for teams of specialized agents managed by mid-level supervisors, all coordinated by an executive-level supervisor.

# Team 1 - e.g., Customer Data Analysis
def team_1_supervisor(state: MessagesState):
    response = model.invoke(...)
    return Command(goto=response["next_agent"])

# Team 2 - e.g., Market Research
def team_2_supervisor(state: MessagesState):
    response = model.invoke(...)
    return Command(goto=response["next_agent"])

# Top-level supervisor
def top_level_supervisor(state: MessagesState):
    response = model.invoke(...)
    return Command(goto=response["next_team"])

builder = StateGraph(MessagesState)
builder.add_node(top_level_supervisor)
# team_1_graph and team_2_graph are the compiled subgraphs for each team
builder.add_node("team_1_graph", team_1_graph)
builder.add_node("team_2_graph", team_2_graph)
hierarchy = builder.compile()

Our Communication Patterns

Through our implementations, we’ve developed two primary communication patterns that work effectively in production environments:

State-based Communication

In our LangGraph implementations, agents communicate through a shared graph state, which we typically configure as:

  • A common state schema for cross-functional agents
  • Different state schemas for specialized domain agents
  • A combination of shared and private states for sensitive information

def agent_with_shared_state(state: MessagesState):
    # Read the shared conversation history
    current_context = state["messages"]
    response = model.invoke(current_context)
    # MessagesState appends returned messages to the shared state,
    # so return only the new message rather than the full history
    return {"messages": [response]}
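The shared-versus-private split described above can be sketched with plain `TypedDict` schemas. The node names and fields here are illustrative, not part of any client system:

```python
from typing import TypedDict

# Shared schema: fields visible to all agents in the graph
class SharedState(TypedDict):
    messages: list[str]

# Private schema: extra fields only one agent's subgraph reads/writes
class PrivateState(SharedState):
    customer_ssn: str  # sensitive field kept out of the shared channel

def redact_for_sharing(state: PrivateState) -> SharedState:
    # Only the shared fields cross the agent boundary
    return {"messages": state["messages"]}
```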

Tool-based Communication

For systems requiring strict boundaries, we implement tool-based communication, where agents provide services to each other through well-defined interfaces:

from typing import Annotated

from langgraph.prebuilt import InjectedState, create_react_agent

def agent_as_tool(state: Annotated[dict, InjectedState]):
    """Delegate the current query to a specialized agent."""
    response = model.invoke(state["current_query"])
    return response.content

tools = [agent_as_tool]
supervisor = create_react_agent(model, tools)

Implementation Best Practices

From our real-world client implementations, we’ve developed these best practices:

1. State Management

  • Implement proper memory management for enterprise-scale conversations
  • Choose appropriate state schemas based on business domain requirements
  • Design state cleanup mechanisms to prevent context bloat in long-running systems
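One simple cleanup mechanism for the last point is a sliding window over the message history. This is a minimal sketch; a real deployment might use token-aware trimming instead (e.g., LangChain's `trim_messages`):

```python
def trim_history(messages: list[str], max_messages: int = 20) -> list[str]:
    """Keep the first (system) message plus the most recent turns."""
    if len(messages) <= max_messages:
        return messages
    # Preserve the system prompt, drop the oldest middle turns
    return [messages[0]] + messages[-(max_messages - 1):]
```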

2. Communication Protocol

  • Define clear interfaces between agents based on business function boundaries
  • Determine information sharing policies (full thought processes vs. conclusions)
  • Implement robust error handling with fallback mechanisms
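The fallback idea in the last bullet can be sketched as a wrapper that tries a sequence of agents in order; the handler callables here are hypothetical:

```python
def call_with_fallback(handlers, query):
    """Try each handler in order; return the first successful answer."""
    errors = []
    for handler in handlers:
        try:
            return handler(query)
        except Exception as exc:  # in production, catch narrower error types
            errors.append(exc)
    raise RuntimeError(f"All {len(handlers)} handlers failed: {errors}")
```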

3. System Design

  • Begin with the simplest effective architecture (usually supervisor-based)
  • Monitor agent performance with detailed logs and performance metrics
  • Scale by adding specialized agents rather than increasing individual agent complexity

Systematic Implementation Process

For each client implementation, we follow a structured approach: from business analysis and architecture design to testing and continuous monitoring.

Human-in-the-Loop Integration

Our multi-agent systems are designed to complement human capabilities, with clear handoff protocols between automated and human processes.
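One common handoff protocol is confidence-based escalation: low-confidence answers are routed to a human queue instead of being returned automatically. This is a minimal sketch with an illustrative threshold; LangGraph also provides a built-in `interrupt` mechanism for pausing a graph mid-run:

```python
def route_response(answer: str, confidence: float, threshold: float = 0.8):
    """Return the answer directly, or flag it for human review."""
    if confidence >= threshold:
        return {"status": "auto", "answer": answer}
    # Below the threshold, hand the draft to a human reviewer
    return {"status": "needs_human_review", "draft": answer}
```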

Case Studies

Our multi-agent systems have transformed business operations across various domains:

Customer Service Enhancement

We’ve implemented supervisor-based multi-agent systems with specialized agents for:

  • Information retrieval from knowledge bases
  • Customer query classification and prioritization
  • Problem-specific solution generation

Business Intelligence

Our hierarchical multi-agent systems enable:

  • Real-time data analysis across disparate data sources
  • Coordinated report generation with specialized analysis agents
  • Executive summaries that combine multiple analytical perspectives

Conclusion

Our approach to multi-agent systems focuses on creating practical, business-oriented solutions that are modular, specialized, and maintainable. By carefully selecting the right architecture and communication patterns based on your business needs, we create AI systems that deliver measurable business value.

Contact us to discuss how our multi-agent implementation experience can help transform your business processes.