Chapter 2

The Fundamental Trilogy

Prompt, LLM, and Data

Part I — The New Syntax 7 sections

Every computing system, at its foundation, rests on three pillars: a language to express instructions, an engine to execute them, and data to operate on. These three elements are not optional or separable. Remove any one of them and you do not have a computing system — you have a component waiting to be assembled.

Traditional software engineering has its own trilogy: Source code (the language), the compiler or runtime (the execution engine), and the file system and databases (the data layer). Agentic programming has an equivalent trilogy that maps to these three pillars with near-perfect structural correspondence. Understanding this mapping is the diagnostic framework you will return to every time something goes wrong.

Fig 2.1 — The Fundamental Trilogy: Side-by-Side Comparison
Traditional Computing Source Code Compiler/Runtime File System / DB executes reads stores/retrieves Agentic Computing Prompt LLM Context / RAG compiles injects grounds / retrieves

The structure is identical. The implementation is entirely different. Every agentic failure traces to exactly one vertex of the right triangle.


2.1   Axiom 1: The Prompt is the Programming Language

A programming language is a formal system for expressing instructions to a machine. The prompt is a programming language by this definition — with one critical difference. Its "grammar" is not a formal specification; it is the statistical distribution of human language that the model was trained on. Its "compiler" does not check syntax; it infers semantics. And the mapping from prompt to behavior is probabilistic rather than deterministic.

This does not make the prompt a lesser language. It makes it a fundamentally different one.

The Structure of a Well-Formed Prompt

A well-formed prompt has identifiable structural components analogous to conventional source code:

System Identity — Who is this agent? Analogous to a class definition. Context Injection — What does the agent need to know right now? Analogous to method arguments. The Task — What should the agent do? The imperative statement, equivalent to the function body. Output Constraints — What format must the output satisfy? The return type declaration.

Prompt Ambiguity Bug — An instruction in a prompt that has more than one reasonable interpretation, causing the model to select one interpretation when the programmer intended another. The output is syntactically valid but semantically wrong.

In a traditional language, a syntax error is immediately visible. In prompt programming, there are no syntax errors — the LLM will always produce some output, regardless of how poorly the prompt is written. The "syntax error" of the prompt is ambiguity, and it is silent.


2.2   Axiom 2: The LLM is the Compiler

A compiler's job is translation: it takes source code written in a human-readable language and produces executable instructions. The LLM performs the same function. It takes the prompt — the source code — and produces output: text, structured data, tool calls, or chains of reasoning.

Fig 2.2 — Deterministic vs. Probabilistic Compilation
Deterministic (Traditional) Source Code + Input Compiler / Interpreter Output: always identical Probabilistic (Agentic) Prompt + Context LLM (probabilistic) Output A Output B Output C probable outputs — same on average, variable per call

The LLM is not buggy when outputs vary — it is probabilistic by design. Engineering for this means specifying intent with enough precision that the distribution of outputs is acceptable, not just the mean.

Model Versions as Compiler Versions

Software engineers pin the compiler version in production and test upgrades explicitly. The same principle applies to LLMs. When a model is updated, the "compiler" has changed. A prompt that produced reliable JSON output under one model version may produce inconsistent formatting under the next.

Practical implication: In production, pin the model version identifier (e.g., gpt-4o-2024-08-06, not gpt-4o). Test new model versions against your eval suite before upgrading. Treat a model upgrade with the same rigor as a dependency upgrade.


2.3   Axiom 3: Data is the New File System

Traditional programs read from files, query databases, receive network responses. In agentic systems, data plays the same role — but the mechanism of access has changed entirely.

The Three Data Access Patterns

Eager Injection (Context Stuffing) — Include all relevant data directly in the prompt. Analogous to passing all function arguments by value. Fast and simple, but consumes context window tokens and doesn't scale to large data volumes.

Lazy Retrieval (RAG) — Store data in a vector database and retrieve only the semantically relevant chunks at query time. Analogous to a database query: the function knows how to fetch what it needs at runtime.

Fig 2.3 — RAG Data Access Pattern
User Query "Find revenue risks" Retriever Vector similarity search / keyword Document Store 📄 Annual Reports 📄 SEC Filings 📄 News Articles 📄 Analyst Notes top-k chunks Context Window [System Prompt] [Retrieved Chunks] [User Query] → injected into LLM LLM responds ① Query ② Retrieve ③ Match ④ Assemble ⑤ Reason

RAG (Retrieval-Augmented Generation): the agent does not “know” the documents — it retrieves the most relevant chunks at inference time and reasons over them as injected context.

Tool-Called Retrieval — The agent itself decides, mid-reasoning, that it needs specific data and calls a tool to retrieve it. Analogous to a function that issues a database query at runtime based on its input.

Tool Definition Schema
tools = [{
    "name": "get_order_status",
    "description": "Retrieves the current status of an order by order ID",
    "parameters": {
        "type": "object",
        "properties": {
            "order_id": {"type": "string"}
        }
    }
}]

The Staleness Problem

Data baked into a prompt at training time is frozen. Data injected at runtime is as fresh as the last database write. Data retrieved via live tool calls is as fresh as the underlying system. Choosing the right data access pattern is not just a performance decision — it is a correctness decision.


2.4   How the Trilogy Interacts

The three elements form a cycle that runs for every agent invocation. The prompt tells the LLM what to do. The LLM determines what data it needs and calls tools to retrieve it. The retrieved data is added to the context, and the LLM continues reasoning with updated information. This cycle may repeat several times within a single agentic task before a final output is produced.

Fig 2.4 — The Trilogy Interaction Cycle
PROMPT System prompt User message Few-shot examples = the program LLM Transformer model Probabilistic inference Token prediction = the compiler DATA Documents / RAG Databases API responses = the data layer instructs retrieves grounds response output Diagnostic Rule: When an agentic system fails, the fault is in exactly one of the three pillars. Wrong output? → Check: Prompt (ambiguous?) | LLM (wrong model?) | Data (missing/incorrect?)

Every agentic failure is traceable to one vertex of the trilogy. This is the diagnostic framework that replaces “the AI is broken.”


2.5   The Trilogy as a Diagnostic Framework

When an agentic system produces wrong output, there are exactly three places to look — and only three:

SymptomMost Likely SourceDiagnostic Action
Wrong output, wrong reasoningPrompt (language)Review system prompt for ambiguity or missing constraints
Confident but factually wrongData (file system)Verify retrieved data is accurate and complete
Worked yesterday, broken todayLLM (compiler)Check if model version changed; re-run evals
Inconsistent across runsLLM temperature or prompt ambiguityCheck temperature settings; look for underspecified instructions
Correct reasoning, wrong formatPrompt (output constraints)Add or tighten output format specification

2.6   Versioning the Trilogy

Stability in a production agentic system requires treating all three elements of the trilogy as versioned artifacts.

Version your prompts. Store system prompts in version control alongside application code. Never edit a production prompt without the same review process as a code change. Pin your model versions. Use the specific model version identifier. Test upgrades against your eval suite before rolling out to production. Version your data schemas. When the structure of retrieved data changes, re-validate all prompts that depend on that structure.


2.7   Chapter Summary

The Fundamental Trilogy is the first thing to understand and the last thing to lose sight of when building agentic systems: the Prompt is the programming language, the LLM is the compiler, and Data is the file system. All subsequent chapters describe techniques that are, at their core, ways of managing one or more elements of this trilogy with greater care and rigor.

Core Principle — Chapter 2

There is no fourth option. When an agentic system misbehaves, the fault is in the prompt, the data, or the model. When something is wrong, check the trilogy — in order. Start with the prompt. Almost always, the answer is there.