Prompt, LLM, and Data
Every computing system, at its foundation, rests on three pillars: a language to express instructions, an engine to execute them, and data to operate on. These three elements are not optional or separable. Remove any one of them and you do not have a computing system — you have a component waiting to be assembled.
Traditional software engineering has its own trilogy: Source code (the language), the compiler or runtime (the execution engine), and the file system and databases (the data layer). Agentic programming has an equivalent trilogy that maps to these three pillars with near-perfect structural correspondence. Understanding this mapping is the diagnostic framework you will return to every time something goes wrong.
The structure is identical. The implementation is entirely different. Every agentic failure traces to exactly one vertex of the right triangle.
A programming language is a formal system for expressing instructions to a machine. The prompt is a programming language by this definition — with one critical difference. Its "grammar" is not a formal specification; it is the statistical distribution of human language that the model was trained on. Its "compiler" does not check syntax; it infers semantics. And the mapping from prompt to behavior is probabilistic rather than deterministic.
This does not make the prompt a lesser language. It makes it a fundamentally different one.
A well-formed prompt has identifiable structural components analogous to conventional source code:
System Identity — Who is this agent? Analogous to a class definition. Context Injection — What does the agent need to know right now? Analogous to method arguments. The Task — What should the agent do? The imperative statement, equivalent to the function body. Output Constraints — What format must the output satisfy? The return type declaration.
In a traditional language, a syntax error is immediately visible. In prompt programming, there are no syntax errors — the LLM will always produce some output, regardless of how poorly the prompt is written. The "syntax error" of the prompt is ambiguity, and it is silent.
A compiler's job is translation: it takes source code written in a human-readable language and produces executable instructions. The LLM performs the same function. It takes the prompt — the source code — and produces output: text, structured data, tool calls, or chains of reasoning.
The LLM is not buggy when outputs vary — it is probabilistic by design. Engineering for this means specifying intent with enough precision that the distribution of outputs is acceptable, not just the mean.
Software engineers pin the compiler version in production and test upgrades explicitly. The same principle applies to LLMs. When a model is updated, the "compiler" has changed. A prompt that produced reliable JSON output under one model version may produce inconsistent formatting under the next.
Practical implication: In production, pin the model version identifier (e.g., gpt-4o-2024-08-06, not gpt-4o). Test new model versions against your eval suite before upgrading. Treat a model upgrade with the same rigor as a dependency upgrade.
Traditional programs read from files, query databases, receive network responses. In agentic systems, data plays the same role — but the mechanism of access has changed entirely.
Eager Injection (Context Stuffing) — Include all relevant data directly in the prompt. Analogous to passing all function arguments by value. Fast and simple, but consumes context window tokens and doesn't scale to large data volumes.
Lazy Retrieval (RAG) — Store data in a vector database and retrieve only the semantically relevant chunks at query time. Analogous to a database query: the function knows how to fetch what it needs at runtime.
RAG (Retrieval-Augmented Generation): the agent does not “know” the documents — it retrieves the most relevant chunks at inference time and reasons over them as injected context.
Tool-Called Retrieval — The agent itself decides, mid-reasoning, that it needs specific data and calls a tool to retrieve it. Analogous to a function that issues a database query at runtime based on its input.
tools = [{
"name": "get_order_status",
"description": "Retrieves the current status of an order by order ID",
"parameters": {
"type": "object",
"properties": {
"order_id": {"type": "string"}
}
}
}]
Data baked into a prompt at training time is frozen. Data injected at runtime is as fresh as the last database write. Data retrieved via live tool calls is as fresh as the underlying system. Choosing the right data access pattern is not just a performance decision — it is a correctness decision.
The three elements form a cycle that runs for every agent invocation. The prompt tells the LLM what to do. The LLM determines what data it needs and calls tools to retrieve it. The retrieved data is added to the context, and the LLM continues reasoning with updated information. This cycle may repeat several times within a single agentic task before a final output is produced.
Every agentic failure is traceable to one vertex of the trilogy. This is the diagnostic framework that replaces “the AI is broken.”
When an agentic system produces wrong output, there are exactly three places to look — and only three:
| Symptom | Most Likely Source | Diagnostic Action |
|---|---|---|
| Wrong output, wrong reasoning | Prompt (language) | Review system prompt for ambiguity or missing constraints |
| Confident but factually wrong | Data (file system) | Verify retrieved data is accurate and complete |
| Worked yesterday, broken today | LLM (compiler) | Check if model version changed; re-run evals |
| Inconsistent across runs | LLM temperature or prompt ambiguity | Check temperature settings; look for underspecified instructions |
| Correct reasoning, wrong format | Prompt (output constraints) | Add or tighten output format specification |
Stability in a production agentic system requires treating all three elements of the trilogy as versioned artifacts.
Version your prompts. Store system prompts in version control alongside application code. Never edit a production prompt without the same review process as a code change. Pin your model versions. Use the specific model version identifier. Test upgrades against your eval suite before rolling out to production. Version your data schemas. When the structure of retrieved data changes, re-validate all prompts that depend on that structure.
The Fundamental Trilogy is the first thing to understand and the last thing to lose sight of when building agentic systems: the Prompt is the programming language, the LLM is the compiler, and Data is the file system. All subsequent chapters describe techniques that are, at their core, ways of managing one or more elements of this trilogy with greater care and rigor.
There is no fourth option. When an agentic system misbehaves, the fault is in the prompt, the data, or the model. When something is wrong, check the trilogy — in order. Start with the prompt. Almost always, the answer is there.