Chapter 6

Parallel Sub-Agents

The New For Loop

Part III: The New Control Flow

The for loop assumes the operation being applied is uniform: the same function, the same logic, applied identically to every element. This assumption breaks down when inputs are heterogeneous and unstructured. When each element requires different handling, the "operation" must adapt to what it finds.

The Parallel Sub-Agent pattern is the evolution of the for loop for intelligence-intensive work. Instead of applying the same function to each element, you dispatch an intelligent agent to handle each element — an agent that can reason about what it finds, adapt its approach, recover from failures, and produce a result appropriate for that specific input.


6.1   The For Loop: Power and Limits

Where it shines — processing structured, homogeneous data: every order has the same structure, calculate_tax takes a known input type and returns a known output type. The loop is correct, efficient, and easy to reason about.

Where it strains — processing unstructured, heterogeneous data: 50 annual reports in various formats (PDF, HTML, Excel, or 404). Some in English, some in Japanese. Some are 20 pages, some are 300. The loop structure is there, but the loop body has exploded into conditional handling paths that only cover anticipated edge cases. Every unanticipated format is a potential failure.
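The contrast can be made concrete with a minimal sketch. The `Order` type, the flat tax rate, and the `handle_report` function are all illustrative assumptions; only the name `calculate_tax` comes from the text, and its signature here is a guess.

```python
from dataclasses import dataclass


@dataclass
class Order:
    """Hypothetical order record: every element has the same shape."""
    order_id: str
    subtotal: float


def calculate_tax(order: Order, rate: float = 0.08) -> float:
    """Known input type, known output type (the rate is illustrative)."""
    return round(order.subtotal * rate, 2)


def handle_report(source: dict) -> str:
    """Loop body for heterogeneous inputs: one branch per anticipated case."""
    if source.get("status") == 404:
        return "skipped: not found"
    fmt = source.get("format")
    if fmt == "pdf":
        return "parsed as PDF"
    if fmt == "html":
        return "parsed as HTML"
    if fmt == "excel":
        return "parsed as Excel"
    # Anything the author did not anticipate ends up here.
    raise ValueError(f"unhandled format: {fmt!r}")


# Where the loop shines: uniform operation, uniform output.
orders = [Order("A1", 100.0), Order("A2", 250.0)]
taxes = [calculate_tax(o) for o in orders]

# Where it strains: the body is a growing chain of special cases.
reports = [{"format": "pdf"}, {"format": "html"}, {"status": 404}]
outcomes = [handle_report(r) for r in reports]
```

The first loop never needs to change as the list grows. The second needs a new branch for every format nobody thought of in advance.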

This is the problem that Parallel Sub-Agents solve: not iteration, but intelligent delegation at scale.


6.2   Parallel Sub-Agents: Delegation at Scale

A Parallel Sub-Agent pattern works as follows: a Manager Agent receives a collection of tasks and decomposes them into independent work items. For each work item, the Manager spawns a Sub-Agent with focused instructions. Sub-Agents execute concurrently. A Synthesizer Agent collects results and merges them into a coherent final output.

Fig 6.1 — Parallel Sub-Agent Architecture: Fan-Out / Fan-In
[Figure: an Orchestrator decomposes the task into N sub-tasks and fans out to Sub-Agent A (task slice 1), Sub-Agent B (task slice 2), and Sub-Agent C (task slice 3), which run in parallel. An Aggregator fans in, merging and ranking the sub-results into the final output. Sequential: T_total = T_A + T_B + T_C (e.g., 3 × 8s = 24s). Parallel: T_total = max(T_A, T_B, T_C) (e.g., max(8, 7, 9) = 9s), about 2.7× faster.]

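Fan-out parallelises independent sub-tasks; fan-in collects and synthesises results. The time saved equals the sequential sum minus the longest individual task (24s - 9s = 15s in the figure's example), and the speedup is the ratio of the two (24s / 9s, about 2.7×). Both figures hold only when tasks are genuinely independent.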
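A minimal sketch of fan-out and fan-in using Python's `asyncio`. The `sub_agent` coroutine is a stand-in that only sleeps; in a real system it would call a model. The durations are the figure's 8, 7, and 9 seconds scaled down by 100 so the example runs quickly.

```python
import asyncio


async def sub_agent(name: str, seconds: float) -> str:
    """Stand-in for a sub-agent; the sleep simulates model latency."""
    await asyncio.sleep(seconds)
    return f"{name}: done"


async def fan_out_fan_in(tasks: dict[str, float]) -> list[str]:
    """Fan out one coroutine per task, then fan in with gather.

    gather returns results in the order the coroutines were passed,
    regardless of which one finishes first.
    """
    coros = [sub_agent(name, secs) for name, secs in tasks.items()]
    return await asyncio.gather(*coros)


def time_saved(durations: list[float]) -> float:
    """Sequential sum minus the longest individual task."""
    return sum(durations) - max(durations)


def speedup(durations: list[float]) -> float:
    """Ratio of sequential time to parallel time."""
    return sum(durations) / max(durations)


results = asyncio.run(
    fan_out_fan_in({"Sub-Agent A": 0.08, "Sub-Agent B": 0.07, "Sub-Agent C": 0.09})
)

figure_durations = [8, 7, 9]
saved = time_saved(figure_durations)           # 24 - 9 = 15 seconds
ratio = round(speedup(figure_durations), 1)    # 24 / 9, rounded to 2.7
```

The arithmetic assumes unlimited concurrency and zero coordination overhead, which is the best case rather than the typical one.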


6.3   The Evolution of Iteration

| Feature | Traditional For Loop | Parallel Sub-Agents |
| --- | --- | --- |
| Execution model | Sequential (or simple parallelism) | Concurrent with independent reasoning per item |
| Logic type | Static: same function for every item | Dynamic: each agent adapts to its specific input |
| Data requirements | Homogeneous, structured, typed | Heterogeneous, unstructured, variable |
| Failure mode | One unhandled exception can break the loop | Individual agents retry, skip, or escalate independently |
| Output type | Uniform: same type for every element | Diverse: each agent may produce different structure |
| Debugging | Identify the failing element; check the function | Identify the failing agent; read its reasoning trace |
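
The failure-mode row can be illustrated with a small sketch in which each agent retries on its own and escalates without affecting its siblings. The `flaky_agent` coroutine and its failure rules are hypothetical, chosen only to exercise the three outcomes.

```python
import asyncio


async def flaky_agent(item: str, attempts: dict[str, int]) -> str:
    """Hypothetical sub-agent: 'b' fails on its first try, 'c' always fails."""
    attempts[item] = attempts.get(item, 0) + 1
    if item == "c" or (item == "b" and attempts[item] == 1):
        raise RuntimeError(f"agent for {item!r} failed")
    return f"{item}: ok"


async def with_retry(item: str, attempts: dict[str, int], max_tries: int = 2) -> str:
    """Retry one agent in isolation; escalate once the tries are used up."""
    for _ in range(max_tries):
        try:
            return await flaky_agent(item, attempts)
        except RuntimeError:
            continue  # only this agent retries; the others are unaffected
    return f"{item}: escalated"


async def run_all(items: list[str]) -> list[str]:
    attempts: dict[str, int] = {}
    return await asyncio.gather(*(with_retry(i, attempts) for i in items))


statuses = asyncio.run(run_all(["a", "b", "c"]))
```

Here `a` succeeds immediately, `b` succeeds on its retry, and `c` is escalated, while a plain loop with no error handling would have stopped at the first exception.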

6.4   Semantic Map-Reduce

Parallel Sub-Agents implement a Semantic Map-Reduce. In classic MapReduce (Google, 2004), the Map function is pure and deterministic. In Semantic Map-Reduce, the Map phase is a reasoning process and the Reduce phase (Synthesize) is also a reasoning process — resolving conflicts, identifying gaps, drawing cross-item inferences.

Fig 6.2 — Classic MapReduce vs. Semantic Map-Reduce
[Figure: two panels. Classic MapReduce: Chunks A, B, and C each pass through map(count), giving {w:3, x:1}, {w:2, y:4}, and {x:1, z:2}; the reduce step sums keys to {w:5, x:2, y:4, z:2}. Semantic Map-Reduce: Docs A, B, and C each pass through LLM(extract), giving risks A1, A2; B1, B3; and A2, C1; an LLM(deduplicate + rank) step produces a consolidated risk report. Key difference: the Map output is natural language, not key-value pairs.]

Semantic Map-Reduce replaces deterministic aggregation functions with an LLM reduce step that can deduplicate, synthesise, and rank natural language outputs — enabling analysis tasks that have no deterministic equivalent.
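
The two pipelines in Fig 6.2 can be sketched side by side. The classic half is ordinary word counting. In the semantic half, `llm_extract` and `llm_deduplicate_and_rank` are deterministic placeholders standing in for model calls (an assumption made so the example runs offline); a real implementation would send prompts to an LLM at both steps.

```python
from collections import Counter
from functools import reduce


# Classic MapReduce: pure, deterministic map and reduce.
def classic_map(chunk: str) -> Counter:
    return Counter(chunk.split())


def classic_reduce(counts: list[Counter]) -> Counter:
    return reduce(lambda a, b: a + b, counts, Counter())


# Semantic Map-Reduce: both phases would be reasoning steps.
def llm_extract(doc: str) -> list[str]:
    """Placeholder for an LLM extraction call; here it just splits on commas."""
    return [risk.strip() for risk in doc.split(",") if risk.strip()]


def llm_deduplicate_and_rank(risk_lists: list[list[str]]) -> list[str]:
    """Placeholder for an LLM reduce step.

    Deduplicates risks and ranks them by how many documents mention them,
    breaking ties alphabetically.
    """
    mentions = Counter(risk for risks in risk_lists for risk in set(risks))
    ranked = sorted(mentions.items(), key=lambda kv: (-kv[1], kv[0]))
    return [risk for risk, _ in ranked]


word_counts = classic_reduce([classic_map(c) for c in ["w w w x", "w w y y y y", "x z z"]])

docs = ["A1, A2", "B1, B3", "A2, C1"]
report = llm_deduplicate_and_rank([llm_extract(d) for d in docs])
```

The shape of the two pipelines is identical; what changes is that the semantic version cannot be checked against a single correct answer, so its reduce step has to be validated differently.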


6.5   Designing the Manager Agent

The Manager Agent's sole job is decomposition and delegation. It receives a complex task, identifies the independent sub-tasks, constructs the appropriate instruction for each sub-agent, and dispatches them. The Manager should not attempt to do the analysis itself — this adheres to the Single Responsibility Principle (Chapter 9).
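
A minimal sketch of that division of labour, under the assumption that each source document is one independent work item. `WorkItem` and `decompose` are hypothetical names, and the instruction template is illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkItem:
    """One unit of delegated work (hypothetical structure)."""
    item_id: int
    source: str
    instruction: str


def decompose(goal: str, sources: list[str]) -> list[WorkItem]:
    """Manager step: build one focused instruction per source.

    The manager does no analysis itself; it only splits and delegates.
    """
    return [
        WorkItem(
            item_id=i,
            source=src,
            instruction=f"{goal} Work only on: {src}. Report findings for this source alone.",
        )
        for i, src in enumerate(sources, start=1)
    ]


items = decompose(
    "Extract the key financial risks.",
    ["report_2021.pdf", "report_2022.html", "report_2023.xlsx"],
)
```

Keeping the work item immutable makes it safe to hand the same object to a sub-agent, a logger, and a retry queue without any of them altering the instruction.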


6.6   The Consensus Problem: When Agents Disagree

Divergent output occurs when two or more sub-agents, given the same or equivalent inputs, produce different answers. In a traditional for loop, a deterministic function returns identical values for identical inputs. Parallel agents carry no such guarantee: they may reach different conclusions about the same document.

Fig 6.3 — The Divergent Output Problem
[Figure: the same prompt, "Summarise the risks", is sent to three agents. Agent A output: "3 key risks: liquidity, credit, ops". Agent B output: "Primary risk: regulatory exposure in APAC". Agent C output: "No significant risks identified". Note: divergent ≠ incorrect; divergence is the expected behavior of a probabilistic system. Fix: a structured output schema forces agents onto commensurable answer spaces that an aggregator can reconcile.]

Three agents given the same prompt produce structurally different answers — not because any is wrong, but because the task under-specified the output format. The solution is a schema contract, not a prompt tweak.
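
One way to express such a schema contract is sketched below. The field names, the severity scale, and the JSON replies are assumptions invented for illustration; they restate the three outputs from Fig 6.3 in a shared structure so that an aggregator can compare them.

```python
import json
from collections import Counter
from dataclasses import dataclass

ALLOWED_SEVERITIES = {"low", "medium", "high"}  # illustrative scale


@dataclass(frozen=True)
class Risk:
    category: str
    severity: str


def parse_agent_output(raw: str) -> list[Risk]:
    """Validate one sub-agent reply against the contract; reject anything else."""
    payload = json.loads(raw)
    risks = payload["risks"]
    if not isinstance(risks, list):
        raise ValueError("'risks' must be a list")
    parsed = []
    for entry in risks:
        if entry["severity"] not in ALLOWED_SEVERITIES:
            raise ValueError(f"unknown severity: {entry['severity']!r}")
        parsed.append(Risk(entry["category"].lower(), entry["severity"]))
    return parsed


def reconcile(outputs: list[list[Risk]]) -> dict[str, int]:
    """Count how many agents flagged each risk category."""
    votes = Counter(risk.category for risks in outputs for risk in risks)
    return dict(votes)


agent_a = ('{"risks": [{"category": "liquidity", "severity": "high"},'
           ' {"category": "credit", "severity": "medium"},'
           ' {"category": "operations", "severity": "low"}]}')
agent_b = '{"risks": [{"category": "regulatory", "severity": "high"}]}'
agent_c = '{"risks": []}'  # "no significant risks" is now a comparable answer

votes = reconcile([parse_agent_output(r) for r in (agent_a, agent_b, agent_c)])
```

The agents still disagree, but the disagreement is now visible as data: four categories with one vote each and one agent reporting none, which the Synthesizer can weigh or escalate.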


6.7   Designing for Parallelism: The Independence Criterion

The prerequisite for the parallel sub-agent pattern is independence: sub-agent A must be able to complete its work without requiring any output from sub-agent B.

Fig 6.4 — Independence Check: Can These Tasks Run in Parallel?
| Dependency Type | Can Parallelize? | Strategy |
| --- | --- | --- |
| No shared data, no ordering constraint | ✓ Yes, fully parallel | Standard fan-out |
| Read same data, write different keys | ⚡ Mostly, with locking | Read-only snapshot at start |
| Task B needs Task A's output | ✗ No, sequential only | Pipeline, not parallel |
| Both write the same shared state | ✗ No, race condition | Serialise or partition state |

Rule: model the dependency graph before choosing fan-out. Wrong parallelism produces non-deterministic errors, not performance.

Parallelism is only valid when tasks are independent. The dependency graph determines the architecture — not the other way around.

The practical design rule: decompose tasks into the largest possible units of independence. Run all independent tasks in parallel, then synthesize at the natural aggregation boundaries.
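
The rule can be sketched with the standard library's `graphlib.TopologicalSorter`, which groups tasks into stages whose members do not depend on one another. The task names are hypothetical; the mapping gives each task the set of tasks it needs output from.

```python
from graphlib import TopologicalSorter


def parallel_stages(dependencies: dict[str, set[str]]) -> list[list[str]]:
    """Group tasks into stages that can each run as one parallel fan-out.

    `dependencies` maps every task to the set of tasks it depends on.
    prepare() raises graphlib.CycleError if the graph has a cycle.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # everything runnable right now
        stages.append(ready)
        sorter.done(*ready)
    return stages


# Three independent extractions followed by one synthesis step.
fan_out = parallel_stages({
    "extract_a": set(),
    "extract_b": set(),
    "extract_c": set(),
    "synthesize": {"extract_a", "extract_b", "extract_c"},
})

# A strict chain: each task needs the previous one's output.
pipeline = parallel_stages({"a": set(), "b": {"a"}, "c": {"b"}})
```

The first graph yields one wide stage followed by the synthesis step, while the second collapses to three single-task stages, meaning there is nothing to parallelise.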


6.8   When NOT to Parallelize

Sequential dependencies: When task B depends on the output of task A, parallelism is structurally impossible.

Shared mutable state: Two agents writing conflicting updates to a shared scratchpad is just as problematic as two threads modifying shared memory without locks.

Cost amplification: 50 sub-agents each consuming 2,000 input tokens add up to 100,000 input tokens. That can cost substantially more than a single sequential pass, especially when every agent repeats the same instructions and shared context.

Latency does not always improve: API rate limits can negate parallelism benefits entirely.
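
A back-of-the-envelope sketch of the last two points. The wall-time model is an idealisation that assumes tasks of equal length run in waves capped by a concurrency limit; the 8-second task time and the limits are illustrative, not measurements.

```python
import math


def total_input_tokens(num_agents: int, tokens_per_agent: int) -> int:
    """Input tokens consumed across all sub-agents."""
    return num_agents * tokens_per_agent


def estimated_wall_time(num_agents: int, seconds_per_task: float, max_concurrent: int) -> float:
    """Idealised wall time when at most `max_concurrent` agents run at once."""
    waves = math.ceil(num_agents / max_concurrent)
    return waves * seconds_per_task


tokens = total_input_tokens(50, 2_000)           # 100,000 input tokens

unlimited = estimated_wall_time(50, 8, 50)       # one wave
rate_limited = estimated_wall_time(50, 8, 5)     # ten waves
serialised = estimated_wall_time(50, 8, 1)       # fifty waves, fully sequential
```

Under this model a limit of five concurrent requests turns an 8-second job into an 80-second one, and a limit of one gives the same 400 seconds as a plain sequential loop.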


6.9   Chapter Summary

The for loop iterates. Parallel sub-agents delegate. Use parallel sub-agents when inputs are heterogeneous and require adaptive handling. Apply the Semantic Map-Reduce pattern: parallel reasoning (map) followed by intelligent synthesis (reduce). Design the Synthesizer to handle divergent outputs explicitly. Apply parallelism only where tasks are genuinely independent.

Core Principle — Chapter 6

A for loop applies the same logic repeatedly. Parallel sub-agents apply judgment repeatedly. This is the difference between automation and delegation. The programmer who masters this becomes an architect of reasoning at scale.