Agent Approaches

AI Agent Approaches

In the previous sections, we learned:

How tools are made available to the agent in the system prompt.
How AI agents are systems that can ‘reason’, plan, and interact with their environment.

In this section, we’ll explore the complete AI Agent Workflow, a cycle we defined as Thought-Action-Observation, and then dive deeper into each of these steps.

The Core Components

Agents work in a continuous cycle of: thinking (Thought) → acting (Act) and observing (Observe).

Let’s break down these actions together:

Thought: The LLM part of the Agent decides what the next step should be.
Action: The agent takes an action, by calling the tools with the associated arguments.
Observation: The model reflects on the response from the tool.

The Thought-Action-Observation Cycle

The three components work together in a continuous loop. To use an analogy from programming, the agent uses a while loop: the loop continues until the objective of the agent has been fulfilled.

Visually, it looks like this:

Think, Act, Observe cycle

In many Agent frameworks, the rules and guidelines are embedded directly into the system prompt, ensuring that every cycle adheres to a defined logic.

In a simplified version, our system prompt may look like this:

You are an AI assistant that helps users with their questions.
You have access to the following tools:
- search_movies(genre, release_years, limit): Returns movies matching the specified genre and release years.

To use a tool, output in this format:
Thought: <your thinking process>
Action: {"action": "tool_name", "action_input": {"param": "value"}}
Observation: <result of the tool>

If you know the final answer, respond with:
Thought: <your reasoning>
Final answer: <your response to the user>

We see here that in the System Message we defined:

The Agent’s behavior.
The Tools our Agent has access to, as we described in the previous section.
The Thought-Action-Observation Cycle, that we bake into the LLM instructions.

Cineast, the Movie Recommendation Agent: An Example

We created Cineast, the Movie Recommendation Agent. A user asks Cineast: “Can you recommend a sci-fi movie from the last 5 years?”

Cineast’s job is to answer this query using a movie database API tool. Here’s how the cycle unfolds:

Thought

Upon receiving the query, Cineast’s internal dialogue might be:

“The user is looking for a recent sci-fi movie recommendation. I have access to a tool that searches movie databases. First, I need to query the database with the appropriate genre and time period parameters.”

This step shows the agent breaking the problem into steps: first, gathering the necessary data with the right parameters.

Action

Based on its reasoning and the fact that Cineast knows about a search_movies tool, Cineast prepares a JSON-formatted command that calls the movie database API:

{
  "action": "search_movies",
  "action_input": {
    "genre": "sci-fi",
    "release_years": "2018-2023",
    "limit": 5
  }
}

Here, the action clearly specifies which tool to call (e.g., search_movies) and what parameters to pass (the genre, release years range, and a limit on results).

Observation

After the tool call, Cineast receives an observation. This might be the raw movie data from the API such as:

“Search results: 1) Dune (2021) - Rating: 8.0/10, 2) Everything Everywhere All at Once (2022) - Rating: 8.9/10, 3) Tenet (2020) - Rating: 7.3/10, 4) The Adam Project (2022) - Rating: 6.7/10, 5) Interstellar (2018) - Rating: 8.6/10”

This observation is then added to the prompt as additional context. It functions as real-world feedback, confirming whether the action succeeded and providing the needed details.

Updated Thought

With the observation in hand, Cineast updates its internal reasoning:

“Now that I have the movie search results, I can analyze them to provide a personalized recommendation. ‘Everything Everywhere All at Once’ has the highest rating and is relatively recent, so that seems like the best recommendation.”

Final Action

Cineast then generates a final response:

Thought: I have the movie data now. Looking at the ratings and release dates, 'Everything Everywhere All at Once' from 2022 has the highest rating at 8.9/10 and fits the sci-fi genre requirement.

Final answer: Based on your request for a recent sci-fi movie, I recommend "Everything Everywhere All at Once" (2022). It has an excellent rating of 8.9/10 and features innovative sci-fi concepts with a multiverse storyline.

This final action sends the personalized recommendation back to the user, closing the loop.

Insights from the Example

What we see in this example:

Agents iterate through a loop until the objective is fulfilled: Cineast’s process is cyclical. It starts with a thought, then acts by calling a tool, and finally observes the outcome. If the observation had indicated an error or incomplete data, Cineast could have re-entered the cycle to correct its approach.
Tool Integration: The ability to call a tool (like a movie database API) enables Cineast to go beyond static knowledge and retrieve up-to-date information, an essential aspect of many AI Agents.
Dynamic Adaptation: Each cycle allows the agent to incorporate fresh information (observations) into its reasoning (thought), ensuring that the final answer is well-informed and accurate.

This example showcases the core concept behind the ReAct cycle: the interplay of Thought, Action, and Observation empowers AI agents to solve complex tasks iteratively.

Internal Reasoning

Thoughts represent the Agent’s internal reasoning and planning processes to solve the task. This utilizes the agent’s Large Language Model (LLM) capacity to analyze information when presented in its prompt.

Think of it as the agent’s internal dialogue, where it considers the task at hand and strategizes its approach.

Here are some examples of common thoughts:

Type of Thought	Example
Planning	”I need to break this task into three steps: 1) gather data, 2) analyze trends, 3) generate report”
Analysis	”Based on the error message, the issue appears to be with the database connection parameters”
Decision Making	”Given the user’s budget constraints, I should recommend the mid-tier option”
Problem Solving	”To optimize this code, I should first profile it to identify bottlenecks”
Memory Integration	”The user mentioned their preference for Python earlier, so I’ll provide examples in Python”
Self-Reflection	”My last approach didn’t work well, I should try a different strategy”
Goal Setting	”To complete this task, I need to first establish the acceptance criteria”
Prioritization	”The security vulnerability should be addressed before adding new features”

The Re-Act Approach

ReAct Approach

A key method is the ReAct approach, which is the concatenation of “Reasoning” (Think) with “Acting” (Act).

ReAct is a simple prompting technique that appends “Let’s think step by step” before letting the LLM decode the next tokens. Prompting the model to think “step by step” encourages the decoding process toward next tokens that generate a plan, rather than a final solution, since the model is encouraged to decompose the problem into sub-tasks.

This allows the model to consider sub-steps in more detail, which in general leads to fewer errors than trying to generate the final solution directly.

We have recently seen a lot of interest for reasoning strategies. This is what’s behind models like Deepseek R1 or OpenAI’s o1, which have been fine-tuned to “think before answering”. These models have been trained to always include specific thinking sections (enclosed between <think> and </think> special tokens). This is not just a prompting technique like ReAct, but a training method where the model learns to generate these sections after analyzing thousands of examples that show what we expect it to do.

Actions: Enabling the Agent to Engage with Its Environment

Actions are the concrete steps an AI agent takes to interact with its environment. Whether it’s browsing the web for information or controlling a physical device, each action is a deliberate operation executed by the agent.

For example, an agent assisting with customer service might retrieve customer data, offer support articles, or transfer issues to a human representative.

Types of Agent Actions

There are multiple types of Agents that take actions differently:

Type of Agent	Description
JSON Agent	The Action to take is specified in JSON format.
Code Agent	The Agent writes a code block that is interpreted externally.
Function-calling Agent	A subcategory of the JSON Agent which has been fine-tuned to generate a new message for each action.

Actions themselves can serve many purposes:

Type of Action	Description
Information Gathering	Performing web searches, querying databases, or retrieving documents.
Tool Usage	Making API calls, running calculations, and executing code.
Environment Interaction	Manipulating digital interfaces or controlling physical devices.
Communication	Engaging with users via chat or collaborating with other agents.

The Stop and Parse Approach

One key method for implementing actions is the stop and parse approach. This method ensures that the agent’s output is structured and predictable:

Generation in a Structured Format: The agent outputs its intended action in a clear, predetermined format (JSON or code).
Halting Further Generation: Once the action is complete, the agent stops generating additional tokens. This prevents extra or erroneous output.
Parsing the Output: An external parser reads the formatted action, determines which Tool to call, and extracts the required parameters.

For example, an agent needing to check the weather might output:

Thought: I need to check the current weather for New York.
Action: {
  "action": "get_weather",
  "action_input": {"location": "New York"}
}

Code Agents

An alternative approach is using Code Agents. Instead of outputting a simple JSON object, a Code Agent generates an executable code block—typically in a high-level language like Python.

This approach offers several advantages:

Expressiveness: Code can naturally represent complex logic, including loops, conditionals, and nested functions, providing greater flexibility than JSON.
Modularity and Reusability: Generated code can include functions and modules that are reusable across different actions or tasks.
Enhanced Debuggability: With a well-defined programming syntax, code errors are often easier to detect and correct.
Direct Integration: Code Agents can integrate directly with external libraries and APIs, enabling more complex operations such as data processing or real-time decision making.

For example, a Code Agent tasked with fetching the weather might generate the following Python snippet:

# Code Agent Example: Retrieve Weather Information
def get_weather(city):
    import requests
    api_url = f"https://api.weather.com/v1/location/{city}?apiKey=YOUR_API_KEY"
    response = requests.get(api_url)
    if response.status_code == 200:
        data = response.json()
        return data.get("weather", "No weather information available")
    else:
        return "Error: Unable to fetch weather data."

# Execute the function and prepare the final answer
result = get_weather("New York")
final_answer = f"The current weather in New York is: {result}"
print(final_answer)

Observe: Integrating Feedback to Reflect and Adapt

Observations are how an Agent perceives the consequences of its actions. They provide crucial information that fuels the Agent’s thought process and guides future actions.

They are signals from the environment—whether it’s data from an API, error messages, or system logs—that guide the next cycle of thought.

In the observation phase, the agent:

Collects Feedback: Receives data or confirmation that its action was successful (or not).
Appends Results: Integrates the new information into its existing context, effectively updating its memory.
Adapts its Strategy: Uses this updated context to refine subsequent thoughts and actions.

For example, if a weather API returns the data “partly cloudy, 15°C, 60% humidity”, this observation is appended to the agent’s memory (at the end of the prompt). The Agent then uses it to decide whether additional information is needed or if it’s ready to provide a final answer.

These observations can take many forms, from reading webpage text to monitoring a robot arm’s position:

Type of Observation	Example
System Feedback	Error messages, success notifications, status codes
Data Changes	Database updates, file system modifications, state changes
Environmental Data	Sensor readings, system metrics, resource usage
Response Analysis	API responses, query results, computation outputs
Time-based Events	Deadlines reached, scheduled tasks completed

How Are the Results Appended?

After performing an action, the framework follows these steps in order:

Parse the action to identify the function(s) to call and the argument(s) to use.
Execute the action.
Append the result as an Observation.

The Agent’s thoughts are responsible for accessing current observations and deciding what the next action(s) should be. Through this process, the agent can break down complex problems into smaller, more manageable steps, reflect on past experiences, and continuously adjust its plans based on new information.

By understanding and applying these principles, you can design agents that not only reason about their tasks but also effectively utilize external tools to complete them, all while continuously refining their output based on environmental feedback.

For example, an agent needing to find movie recommendations might output:

Thought: I need to search for sci-fi movies released in the last 5 years.
Action: {
  "action": "search_movies",
  "action_input": {"genre": "sci-fi", "release_years": "2018-2023", "limit": 5}
}

Code Agents

An alternative approach is using Code Agents. Instead of outputting a simple JSON object, a Code Agent generates an executable code block—typically in a high-level language like Python.

This approach offers several advantages:

Expressiveness: Code can naturally represent complex logic, including loops, conditionals, and nested functions, providing greater flexibility than JSON.
Modularity and Reusability: Generated code can include functions and modules that are reusable across different actions or tasks.
Enhanced Debuggability: With a well-defined programming syntax, code errors are often easier to detect and correct.
Direct Integration: Code Agents can integrate directly with external libraries and APIs, enabling more complex operations such as data processing or real-time decision making.

For example, a Code Agent tasked with finding movie recommendations might generate the following Python snippet:

# Code Agent Example: Retrieve Movie Recommendations
def search_movies(genre, years_range, limit=5):
    import requests
    api_url = f"https://api.moviedb.com/v1/search?apiKey=YOUR_API_KEY"
    params = {
        "genre": genre,
        "release_years": years_range,
        "limit": limit
    }
    response = requests.get(api_url, params=params)
    if response.status_code == 200:
        data = response.json()
        return data.get("results", "No movies found matching your criteria")
    else:
        return "Error: Unable to fetch movie data."

# Execute the function and prepare the final answer
results = search_movies("sci-fi", "2018-2023", 5)
top_movie = max(results, key=lambda x: x["rating"])
final_answer = f'The best sci-fi movie from recent years is "{top_movie["title"]}" ({top_movie["year"]}) with a rating of {top_movie["rating"]}/10'
print(final_answer)

For example, if a movie database API returns data like “Search results: 1) Dune (2021) - Rating: 8.0/10, 2) Everything Everywhere All at Once (2022) - Rating: 8.9/10…”, this observation is appended to the agent’s memory (at the end of the prompt). The Agent then uses it to decide whether additional information is needed or if it’s ready to provide a final answer.