Chapter 7

The Agentic Loop

The New While

Part III: The New Control Flow

The while loop is the foundation of persistence in programming. Without it, a program executes once and terminates. With it, a program can keep going — iterating on a problem, retrying on failure, waiting for a condition to be met. Correctness requires a well-defined stopping condition — one that will eventually evaluate to False. Without it, the loop runs forever.

The Agentic Loop is the while loop of the reasoning world. An agent executes a cycle: it thinks about what to do, takes an action, observes the result, and decides whether to continue or stop. This cycle — formalized as the ReAct pattern (Reason + Act) — is the dominant execution model in modern agent frameworks.

But where the while loop's stopping condition is a boolean expression — deterministic, evaluable, unambiguous — the agentic loop's stopping condition is a reasoned assessment: has sufficient evidence been gathered to produce a correct answer? This shift from mechanical evaluation to reasoned judgment is both the power and the primary source of failure in agentic loops.


7.1   The ReAct Loop: Reason → Act → Observe

The ReAct pattern formalizes the three-step cycle that characterizes effective agent behavior:

Reason: The agent produces a chain-of-thought step: it considers the current state, the information gathered so far, and what is still missing, then identifies the next action.

Act: Based on its reasoning, the agent executes an action, typically a tool call to an external system.

Observe: The agent receives the result and incorporates it into its understanding of the task state.

Fig 7.1 — The ReAct Loop: Detailed Execution Cycle
[Figure: the LLM (Reason + Decide) cycles through three steps: (1) Observe, where the current state and tool feedback are injected; (2) Think, a chain-of-thought step; (3) Act, which calls a tool, writes output, or updates state. Tools (search, code execution, APIs, file operations) return results that become the next observation. A stop condition (evidence gathered? task complete?) exits the loop. Max iterations are set explicitly to prevent runaway loops.]

ReAct (Reason + Act) is the canonical agentic loop. The key discipline: the stop condition is a reasoned assessment of whether sufficient evidence has been gathered — not a fixed step count.

A concrete 3-iteration example for a customer support query about order #99:

ReAct Loop — Three Iterations
--- Iteration 1 ---
Reason: I need to look up order #99 to understand its status.
Act:    call get_order_status(order_id="99")
Observe: {"status": "Delayed", "original_eta": "2026-02-18",
          "new_eta": "2026-02-28", "reason": "weather delay"}

--- Iteration 2 ---
Reason: The order is delayed due to weather. I should check
        if the user qualifies for delay compensation.
Act:    call get_compensation_policy(delay_reason="weather")
Observe: {"eligible": false, "reason": "weather delays are force majeure"}

--- Iteration 3 ---
Reason: I have all the information I need. New ETA is Feb 28,
        user is not eligible for compensation. I can now answer.
Act:    [Final Answer — exits loop]
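
The trace above can be reproduced with a small driver loop. The following is a minimal sketch, not a framework implementation: the two tool functions are stubs returning the canned data from the trace, and scripted_reason is a hypothetical stand-in for the LLM's Reason step, so only the control flow is real.

```python
from typing import Callable

# Hypothetical tool stubs that return the canned data from the trace above.
def get_order_status(order_id: str) -> dict:
    return {"status": "Delayed", "original_eta": "2026-02-18",
            "new_eta": "2026-02-28", "reason": "weather delay"}

def get_compensation_policy(delay_reason: str) -> dict:
    return {"eligible": False, "reason": "weather delays are force majeure"}

TOOLS: dict[str, Callable[..., dict]] = {
    "get_order_status": get_order_status,
    "get_compensation_policy": get_compensation_policy,
}

def scripted_reason(observations: list[dict]) -> dict:
    """Stand-in for the LLM: choose the next step from what has been observed."""
    if len(observations) == 0:
        return {"tool": "get_order_status", "args": {"order_id": "99"}}
    if len(observations) == 1:
        return {"tool": "get_compensation_policy",
                "args": {"delay_reason": "weather"}}
    order, policy = observations
    eligibility = "eligible" if policy["eligible"] else "not eligible"
    return {"final_answer": f"Order #99 is delayed; new ETA is {order['new_eta']}. "
                            f"You are {eligibility} for compensation."}

def react_loop(max_iterations: int = 5) -> str:
    observations: list[dict] = []
    for _ in range(max_iterations):
        decision = scripted_reason(observations)              # Reason
        if "final_answer" in decision:
            return decision["final_answer"]                   # exits loop
        result = TOOLS[decision["tool"]](**decision["args"])  # Act
        observations.append(result)                           # Observe
    return "No answer within the iteration budget."

print(react_loop())
```

In a real agent the Reason step would be a model call whose output is parsed into the same tool-or-final-answer decision; the surrounding loop would look much the same.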

7.2   Mapping While → ReAct

while Loop Component | ReAct Loop Equivalent | Notes
while condition: | "Does the task feel complete?" | Boolean vs. judgment: the critical difference
Loop body | Reason → Act → Observe cycle | One iteration of reasoning + one tool call
Loop variable | Agent's scratchpad / reasoning trace | Accumulates context across iterations
break / return | Final Answer / Stop Token | Explicit exit signal
Infinite loop | Agent that cannot determine completeness | Most common failure mode
Maximum iterations guard | max_iterations parameter | Safety mechanism, not logic
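
The mapping can also be read side by side in code. This is an illustrative sketch only: drain_queue is an ordinary while loop, and agentic_loop is a hypothetical skeleton whose decide and act callables stand in for the model and its tools. The comments label each component from the table.

```python
from collections import deque

def drain_queue(queue: deque) -> list:
    processed = []
    while len(queue) > 0:                   # stopping condition: a boolean fact
        processed.append(queue.popleft())   # loop body
    return processed

def agentic_loop(decide, act, max_iterations: int = 10):
    scratchpad = []                          # loop variable: accumulated trace
    for _ in range(max_iterations):          # maximum iterations guard
        decision = decide(scratchpad)        # Reason: the stop judgment is made here
        if decision["done"]:                 # break / return: final answer
            return decision["answer"], scratchpad
        scratchpad.append(act(decision["action"]))  # Act + Observe
    return None, scratchpad                  # guard tripped without an answer

# Toy stand-ins: "done" once three observations have been collected.
def toy_decide(pad: list) -> dict:
    return {"done": len(pad) >= 3, "answer": sum(pad), "action": len(pad) + 1}

def toy_act(action: int) -> int:
    return action * 10

print(drain_queue(deque([1, 2, 3])))
print(agentic_loop(toy_decide, toy_act))
```

The toy decide function is deterministic, which hides the real difficulty: with a model in that position, the "done" field is a judgment that can be wrong in either direction.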

7.3   The Stopping Condition Problem

This is the hardest engineering challenge in the agentic loop.

Traditional: while len(queue) > 0: is unambiguous and terminates when the queue is empty.

Agentic: "Has the agent gathered enough information and reasoning to produce a complete, accurate, and helpful response?" cannot be evaluated as a boolean expression. It requires judgment about completeness, accuracy, and relevance.

Failure mode 1: Premature termination. The agent declares "final answer" before gathering enough information — the agentic equivalent of an off-by-one error.

Failure mode 2: Non-termination. The agent loops indefinitely, unable to determine that it has gathered sufficient information — the agentic infinite loop.
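
Neither failure mode can be ruled out mechanically, but cheap heuristics can flag likely cases. The sketch below is an assumption of this example, not a technique described in the chapter: missing_evidence checks a proposed final answer against a hypothetical list of facts the task requires, and is_stalled flags a loop that keeps repeating the same action.

```python
def missing_evidence(required: set[str], gathered: dict) -> set[str]:
    """Return the required facts that have not been observed yet."""
    return required - set(gathered)

def is_stalled(actions: list[tuple], window: int = 3) -> bool:
    """True when the last `window` actions are identical, a sign of a stuck loop."""
    return len(actions) >= window and len(set(actions[-window:])) == 1

# Premature termination: the agent wants to answer, but the policy was never checked.
gathered = {"order_status": {"status": "Delayed"}}
print(missing_evidence({"order_status", "compensation_policy"}, gathered))

# Non-termination: the same tool call issued three times in a row.
history = [("get_order_status", "99")] * 3
print(is_stalled(history))
```

Checks like these narrow the judgment problem without removing it: deciding which facts are required is itself part of the task.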

The practical solution: Always impose an explicit maximum iteration count as a safety guard, separate from the agent's own stopping logic:

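Maximum Iterations Guard
MAX_ITERATIONS = 15

def run_agent(agent, context):
    """Run the loop for at most MAX_ITERATIONS steps.

    agent and execute_tool are assumed to be defined elsewhere.
    """
    for _ in range(MAX_ITERATIONS):
        action = agent.step(context)

        if action.type == "final_answer":
            return action.content  # Natural termination

        result = execute_tool(action)
        context.append(result)  # Feed the observation into the next step

    # Safety exit: max iterations reached
    return agent.produce_best_effort_answer(context)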

The max_iterations guard is not the agent's stopping condition — it is a circuit breaker that prevents a misbehaving loop from running indefinitely.
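
To see the circuit breaker trip, the guard can be exercised with an agent that never finishes. This is a self-contained sketch under stated assumptions: Action, StuckAgent, and the run function are hypothetical names invented for the demonstration, with step and produce_best_effort_answer mirroring the method names used in the listing above.

```python
from dataclasses import dataclass

@dataclass
class Action:
    type: str
    content: str

class StuckAgent:
    """Hypothetical agent that always asks for one more tool call."""
    def step(self, context: list) -> Action:
        return Action("tool_call", "search")

    def produce_best_effort_answer(self, context: list) -> str:
        return f"Best effort after {len(context)} observations"

def run(agent, execute_tool, max_iterations: int = 15) -> str:
    context: list = []
    for _ in range(max_iterations):
        action = agent.step(context)
        if action.type == "final_answer":
            return action.content              # natural termination
        context.append(execute_tool(action))   # observation for the next step
    return agent.produce_best_effort_answer(context)  # circuit breaker

print(run(StuckAgent(), lambda action: "no new information", max_iterations=4))
```

The loop body never changes its mind here, so only the guard ends the run. Returning a best-effort answer rather than raising is a design choice; a stricter system might escalate to a human instead.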


7.4   Self-Correction Within the Loop

The Reflexion pattern adds a self-evaluation step to the loop: at select points, the agent explicitly evaluates its own output against the task requirements and decides whether to retry — before the output reaches the user.

Fig 7.2 — Reflexion Pattern Within the Agentic Loop
[Figure: the Actor attempts the task and produces output v1. The Evaluator scores the output against criteria and produces a score plus notes. If the score is below the threshold, the Reflector diagnoses the failure and generates targeted improvement notes, and the Actor retries with those notes. If the score meets or exceeds the threshold, the output passes as the Final Output. Always set max_retries: Reflexion without a ceiling loops forever.]

Reflexion adds self-evaluation to the agentic loop. The key insight: improvement notes fed into the next attempt are far more effective than simply repeating the original prompt.
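
The Actor, Evaluator, and Reflector roles can be sketched as three callables wired into a bounded retry loop. This is a minimal illustration, not a reference implementation of Reflexion: the three toy functions are hypothetical stand-ins for model calls, and the threshold and max_retries values are arbitrary.

```python
def reflexion(actor, evaluator, reflector, task, threshold=0.8, max_retries=3):
    """Retry with improvement notes until the score passes or retries run out."""
    notes: list[str] = []
    best_output, best_score = None, float("-inf")
    for _ in range(1 + max_retries):           # first attempt plus bounded retries
        output = actor(task, notes)
        score, feedback = evaluator(task, output)
        if score > best_score:
            best_output, best_score = output, score
        if score >= threshold:
            break                               # pass: output goes to the user
        notes.append(reflector(task, output, feedback))  # targeted improvement note
    return best_output, best_score

# Toy stand-ins for the three roles (assumptions for illustration only).
def toy_actor(task, notes):
    draft = "agents act"
    return draft + " in a loop" if "mention loop" in notes else draft

def toy_evaluator(task, output):
    return (1.0, "ok") if "loop" in output else (0.3, "keyword 'loop' missing")

def toy_reflector(task, output, feedback):
    return "mention loop"

print(reflexion(toy_actor, toy_evaluator, toy_reflector, "describe agents"))
```

The loop keeps the best-scoring attempt so that exhausting max_retries still yields the strongest output seen, rather than whichever attempt happened to be last.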


7.5   Multi-Agent Loops: Actor-Critic

The self-correction pattern can be externalized: instead of one agent critiquing its own output, two agents are deployed in a loop — an Actor that produces output and a Critic that evaluates it.

Fig 7.3 — Actor-Critic Multi-Agent Loop
[Figure: the Actor and the Critic communicate through Shared Memory / State. The Actor optimises for task reward, reads state, proposes actions, and executes approved steps. The Critic evaluates each proposed action against constraints and writes a verdict: approve or reject, with a reason and feedback. Use case: high-stakes actions (financial transactions, code deployment), where a second, independent agent acting as a safety gate provides real security value.]

The Actor-Critic pattern separates execution from evaluation at the architectural level. Unlike Reflexion (self-critique), the Critic runs as a separate agent with separate instructions — enabling genuinely independent evaluation.

The Actor-Critic loop externalizes the judgment of quality into a separate agent with a focused evaluation task, which is often more reliable than self-evaluation — agents tend to be lenient on their own output.
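
A compact way to see the separation is a loop in which the Actor may only execute what the Critic approves. The sketch below assumes a toy financial-transfer scenario; SharedState, the 500 transfer limit, and the scripted actor and critic functions are all hypothetical and stand in for two independently prompted agents.

```python
from dataclasses import dataclass, field

@dataclass
class SharedState:
    balance: float
    log: list = field(default_factory=list)

def critic(state: SharedState, action: dict) -> tuple[bool, str]:
    """Evaluate a proposed transfer against constraints (limit is hypothetical)."""
    if action["amount"] > 500:
        return False, "exceeds per-transfer limit of 500"
    if action["amount"] > state.balance:
        return False, "insufficient balance"
    return True, "ok"

def actor(state: SharedState, feedback) -> dict:
    """Scripted stand-in: propose 800 first, then 400 after any rejection."""
    return {"amount": 800 if feedback is None else 400}

def actor_critic(state: SharedState, max_rounds: int = 3):
    feedback = None
    for _ in range(max_rounds):
        action = actor(state, feedback)             # Actor proposes
        approved, reason = critic(state, action)    # Critic evaluates
        state.log.append((action["amount"], approved, reason))  # verdict recorded
        if approved:
            state.balance -= action["amount"]       # only approved steps execute
            return action
        feedback = reason                           # rejection reason goes back
    return None                                     # no approved action in budget

state = SharedState(balance=1000)
print(actor_critic(state), state.balance, state.log)
```

Here the Critic is ordinary code, which is the strongest form of independence. When the Critic is itself a model, the same structure applies, but its verdicts carry the same uncertainty as any other model judgment.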


7.6   Recursion in Agentic Systems

A special case of the agentic loop: an agent that determines a sub-task is too complex to handle directly and spawns a new agent to handle it — potentially a copy of itself with a refined scope.

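Recursive Agent with Depth Guard
MAX_DEPTH = 5  # Hard ceiling on recursion depth

def agent_call(task: str, depth: int = 0) -> str:
    # llm and AGENT_PROMPT are assumed to be defined elsewhere
    result = llm.call(system_prompt=AGENT_PROMPT, user_message=task)

    if result.requires_subtask:
        if depth >= MAX_DEPTH:
            # Depth guard: stop spawning sub-agents and hand off instead
            return "Task too complex: escalating."
        # Delegate the narrower subtask to a fresh agent call
        return agent_call(result.subtask, depth=depth + 1)

    return result.answer  # Base case: the task was answered directly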

The critical safety constraint: explicit depth limits. Without a base case or depth guard, a recursively spawning agent creates an unbounded tree of sub-agents — the agentic equivalent of a stack overflow.
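
The depth guard can be checked without a model by injecting a fake one. In this sketch the llm parameter, the Result type, and the two fake model functions are assumptions made for the demonstration: one fake peels a step off a slash-separated task until a single step remains, the other always delegates and so can only be stopped by the guard.

```python
from dataclasses import dataclass
from typing import Callable, Optional

MAX_DEPTH = 5

@dataclass
class Result:
    answer: Optional[str] = None
    subtask: Optional[str] = None

    @property
    def requires_subtask(self) -> bool:
        return self.subtask is not None

def agent_call(task: str, llm: Callable[[str], Result], depth: int = 0) -> str:
    result = llm(task)
    if result.requires_subtask:
        if depth >= MAX_DEPTH:
            return "Task too complex: escalating."   # depth guard
        return agent_call(result.subtask, llm, depth + 1)
    return result.answer                             # base case

def splitting_llm(task: str) -> Result:
    """Fake model: delegate everything after the first '/' until one step is left."""
    head, sep, rest = task.partition("/")
    return Result(subtask=rest) if sep else Result(answer=f"done: {head}")

def runaway_llm(task: str) -> Result:
    """Fake model that always delegates; it has no base case of its own."""
    return Result(subtask=task)

print(agent_call("plan/book/confirm", splitting_llm))
print(agent_call("anything", runaway_llm))
```

Note that the guard bounds a single chain of delegation. An agent that spawns several sub-agents per level would also need a limit on fan-out to bound the total number of calls.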


7.7   Chapter Summary

The agentic loop (Reason → Act → Observe) is the fundamental mechanism by which agents exhibit persistence, adaptability, and self-correction. The loop body is a full reasoning cycle. The stopping condition is a judgment, not a boolean. Premature termination and non-termination are the two primary failure modes. Self-correction patterns improve output quality by adding an explicit evaluation step: Reflexion has the agent critique its own output, while Actor-Critic externalizes evaluation into a separate agent. Recursive agents require explicit depth limits.

Core Principle — Chapter 7

A while loop terminates when a condition is false. An agentic loop terminates when the agent believes the task is done. This distinction — between checking a fact and making a judgment — is the fundamental challenge of agentic control flow.