The New While
The while loop is the foundation of persistence in programming. Without it, a program executes once and terminates. With it, a program can keep going — iterating on a problem, retrying on failure, waiting for a condition to be met. Correctness requires a well-defined stopping condition — one that will eventually evaluate to False. Without it, the loop runs forever.
The Agentic Loop is the while loop of the reasoning world. An agent executes a cycle: it thinks about what to do, takes an action, observes the result, and decides whether to continue or stop. This cycle — formalized as the ReAct pattern (Reason + Act) — is the dominant execution model in modern agent frameworks.
But where the while loop's stopping condition is a boolean expression — deterministic, evaluable, unambiguous — the agentic loop's stopping condition is a reasoned assessment: has sufficient evidence been gathered to produce a correct answer? This shift from mechanical evaluation to reasoned judgment is both the power and the primary source of failure in agentic loops.
The ReAct pattern formalizes the three-step cycle that characterizes effective agent behavior:
Reason: The agent produces a chain-of-thought reasoning step — considers the current state, the information gathered, what is still missing, and identifies the next action. Act: Based on its reasoning, the agent executes an action — typically a tool call to an external system. Observe: The agent receives the result and incorporates it into its understanding of the task state.
ReAct (Reason + Act) is the canonical agentic loop. The key discipline: the stop condition is a reasoned assessment of whether sufficient evidence has been gathered — not a fixed step count.
A concrete 3-iteration example for a customer support query about order #99:
--- Iteration 1 ---
Reason: I need to look up order #99 to understand its status.
Act: call get_order_status(order_id="99")
Observe: {"status": "Delayed", "original_eta": "2026-02-18",
"new_eta": "2026-02-28", "reason": "weather delay"}
--- Iteration 2 ---
Reason: The order is delayed due to weather. I should check
if the user qualifies for delay compensation.
Act: call get_compensation_policy(delay_reason="weather")
Observe: {"eligible": false, "reason": "weather delays are force majeure"}
--- Iteration 3 ---
Reason: I have all the information I need. New ETA is Feb 28,
user is not eligible for compensation. I can now answer.
Act: [Final Answer — exits loop]
while Loop Component | ReAct Loop Equivalent | Notes |
|---|---|---|
while condition: | "Does the task feel complete?" | Boolean vs. judgment — the critical difference |
| Loop body | Reason → Act → Observe cycle | One iteration of reasoning + one tool call |
| Loop variable | Agent's scratchpad / reasoning trace | Accumulates context across iterations |
break / return | Final Answer / Stop Token | Explicit exit signal |
| Infinite loop | Agent that cannot determine completeness | Most common failure mode |
| Maximum iterations guard | max_iterations parameter | Safety mechanism, not logic |
This is the hardest engineering challenge in the agentic loop. Traditional: while len(queue) > 0: — unambiguous, terminates when queue is empty. Agentic: "Has the agent gathered enough information and reasoning to produce a complete, accurate, and helpful response?" — not evaluable by a boolean expression. Requires judgment about completeness, accuracy, and relevance.
Failure mode 1: Premature termination. The agent declares "final answer" before gathering enough information — the agentic equivalent of an off-by-one error.
Failure mode 2: Non-termination. The agent loops indefinitely, unable to determine that it has gathered sufficient information — the agentic infinite loop.
The practical solution: Always impose an explicit maximum iteration count as a safety guard, separate from the agent's own stopping logic:
MAX_ITERATIONS = 15
for iteration in range(MAX_ITERATIONS):
action = agent.step(context)
if action.type == "final_answer":
return action.content # Natural termination
result = execute_tool(action)
context.append(result)
# Safety exit: max iterations reached
return agent.produce_best_effort_answer(context)
The max_iterations guard is not the agent's stopping condition — it is a circuit breaker that prevents a misbehaving loop from running indefinitely.
The Reflexion pattern adds a self-evaluation step to the loop: at select points, the agent explicitly evaluates its own output against the task requirements and decides whether to retry — before the output reaches the user.
Reflexion adds self-evaluation to the agentic loop. The key insight: improvement notes fed into the next attempt are far more effective than simply repeating the original prompt.
The self-correction pattern can be externalized: instead of one agent critiquing its own output, two agents are deployed in a loop — an Actor that produces output and a Critic that evaluates it.
The Actor-Critic pattern separates execution from evaluation at the architectural level. Unlike Reflexion (self-critique), the Critic runs as a separate agent with separate instructions — enabling genuinely independent evaluation.
The Actor-Critic loop externalizes the judgment of quality into a separate agent with a focused evaluation task, which is often more reliable than self-evaluation — agents tend to be lenient on their own output.
A special case of the agentic loop: an agent that determines a sub-task is too complex to handle directly and spawns a new agent to handle it — potentially a copy of itself with a refined scope.
def agent_call(task: str, depth: int = 0) -> str:
MAX_DEPTH = 5
result = llm.call(system_prompt=AGENT_PROMPT, user_message=task)
if result.requires_subtask:
if depth >= MAX_DEPTH:
return "Task too complex — escalating."
return agent_call(result.subtask, depth=depth + 1)
return result.answer
The critical safety constraint: explicit depth limits. Without a base case or depth guard, a recursively spawning agent creates an unbounded tree of sub-agents — the agentic equivalent of a stack overflow.
The agentic loop — Reason → Act → Observe — is the fundamental mechanism by which agents exhibit persistence, adaptability, and self-correction. The loop body is a full reasoning cycle. The stopping condition is a judgment, not a boolean. Premature termination and non-termination are the two primary failure modes. Self-correction patterns (Reflexion, Actor-Critic) improve output quality by externalizing evaluation. Recursive agents require explicit depth limits.
A while loop terminates when a condition is false. An agentic loop terminates when the agent believes the task is done. This distinction — between checking a fact and making a judgment — is the fundamental challenge of agentic control flow.