Chapter 5

Semantic Routing

The New If/Then

Part III — The New Control Flow

In traditional code, branching is deterministic and syntactically explicit — boolean conditions, typed inputs, exhaustively enumerated branches. This works correctly when every relevant condition is enumerable in advance. When inputs are natural language, that assumption breaks.

Consider: "My order hasn't shown up and I've already been waiting for two weeks — this is getting ridiculous." Is it a complaint? A refund request? An order status inquiry? An escalation? Probably all four, weighted differently. No boolean expression can evaluate this sentence reliably.

The Semantic Router replaces boolean evaluation with intent classification — using the LLM itself to determine which branch of execution a given input should activate.


5.1   The Limits of Enumerated Branching

The traditional if/then construct has one foundational assumption: the programmer can enumerate, in advance, all the conditions that matter. This assumption holds for well-defined state machines. It breaks down entirely for open-ended natural language input.

Human language has effectively unbounded surface variation over a finite number of intents. Keyword matching and enumerated conditions can cover the common cases, but the tail is long, and in production systems the tail is where the most frustrated users live. "I want my money back," "Is there any way to undo my purchase?" and "I'm incredibly frustrated" contain none of the keywords a typical routing table would list (such as "refund" or "escalate"), yet each requires specific handling.
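To make the long tail concrete, here is a minimal sketch of a keyword-based router. The routing table and its keywords are hypothetical, chosen only for illustration; a real table would be larger but would fail in the same way on paraphrases.

```python
from typing import Optional

# Hypothetical keyword routing table; the keywords are illustrative only.
KEYWORD_ROUTES = {
    "refund": "refund_request",
    "return": "refund_request",
    "where is my order": "order_status",
    "manager": "escalation_request",
}


def keyword_route(message: str) -> Optional[str]:
    """Return the first route whose keyword appears in the message, else None."""
    text = message.lower()
    for keyword, route in KEYWORD_ROUTES.items():
        if keyword in text:
            return route
    return None  # the unhandled tail


messages = [
    "I want a refund",
    "I want my money back",
    "Is there any way to undo my purchase?",
    "I'm incredibly frustrated",
]
for message in messages:
    print(f"{message!r} -> {keyword_route(message)}")
```

Only the first message matches a keyword. The other three express a recognizable intent but fall through to `None`.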


5.2   What is a Semantic Router?

A Semantic Router is a specialized component — typically implemented as a focused LLM call — whose sole job is to classify a user's input into a predefined intent category, and return a structured routing decision that the orchestration layer can act on.

Fig 5.1 — Semantic Router Architecture
[Figure: free-text user input enters the Semantic Router, which (1) embeds the input, (2) compares it to route templates, and (3) selects the route with the maximum cosine similarity. Routes: Code Agent ("write", "debug", "refactor"), Research Agent ("find", "explain", "list"), Data Agent ("chart", "analyze", "query"), and a Fallback Handler for inputs below the confidence threshold. The selected specialist agent is then invoked. Threshold note: too low leads to wrong routes; too high leads to excess fallback.]

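In the embedding-based variant shown in Fig 5.1, the semantic router converts a free-text input into a routing decision using vector embedding similarity, with no keyword matching required. Whether the router is built on embeddings or on an LLM classification call, the confidence threshold is the key tuning parameter.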

The router is deliberately narrow in scope. It does not try to answer the user's question. It does one thing: determine the category of intent and emit a structured output. This makes it testable, replaceable, and cheap to run — classification calls can use fast, inexpensive models rather than expensive reasoning models.
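The three steps in Fig 5.1 (embed, compare, select the maximum cosine similarity) can be sketched as follows. This is an illustration under stated assumptions, not a reference implementation: the embedding step is left out, and the toy three-dimensional vectors, the route names, and the `select_route` helper are all hypothetical stand-ins for real embeddings and real route templates.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_route(
    input_vec: Vector,
    route_vecs: Dict[str, Vector],
    threshold: float = 0.75,
) -> Tuple[str, float]:
    """Pick the most similar route, or "fallback" if nothing clears the threshold."""
    best_route, best_score = "fallback", -1.0
    for route, vec in route_vecs.items():
        score = cosine_similarity(input_vec, vec)
        if score > best_score:
            best_route, best_score = route, score
    if best_score < threshold:
        return "fallback", best_score
    return best_route, best_score


# Toy vectors standing in for real route-template embeddings (assumption).
ROUTE_VECS = {
    "code_agent": [1.0, 0.0, 0.0],
    "research_agent": [0.0, 1.0, 0.0],
    "data_agent": [0.0, 0.0, 1.0],
}

print(select_route([0.9, 0.1, 0.0], ROUTE_VECS))  # close to the code route
print(select_route([0.5, 0.5, 0.5], ROUTE_VECS))  # equally far from all routes
```

The second input is equally similar to every route and clears none of them at the default threshold, so it goes to the fallback handler rather than to an arbitrary agent.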


5.3   Implementing a Semantic Router

Classification Prompt
You are an intent classifier for a customer support system.

Classify the user's message into exactly one of the following intent categories:
- refund_request: The user wants a refund or return
- order_status: The user wants to know where their order is
- product_complaint: The user is reporting a problem with a product
- escalation_request: The user is expressing serious frustration or demanding to speak to a manager
- general_inquiry: Any other question or request

Respond with a JSON object:
{
  "intent": "<category>",
  "confidence": <0.0 to 1.0>,
  "reasoning": "<one sentence explaining your classification>"
}

Few-Shot Examples: Improving Classification Precision

Few-shot examples dramatically improve consistency, especially at the edges of category boundaries. They communicate expected behavior through demonstration rather than abstract description:

Few-Shot Examples
Examples:
User: "I never received my package and it's been 3 weeks"
→ { "intent": "order_status", "confidence": 0.88 }

User: "I want to return this immediately and get a full refund"
→ { "intent": "refund_request", "confidence": 0.97 }

User: "I've called three times and nobody has helped me"
→ { "intent": "escalation_request", "confidence": 0.91 }

User: "Do you sell gift cards?"
→ { "intent": "general_inquiry", "confidence": 0.95 }
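Because the orchestration layer acts on the classifier's reply, it helps to validate that reply before trusting it. The sketch below assumes the model returns the JSON object described in the classification prompt as a raw string. The `parse_classification` helper and the `ALLOWED_INTENTS` set are hypothetical names introduced here for illustration.

```python
import json
from typing import Any, Dict

# The five categories named in the classification prompt.
ALLOWED_INTENTS = {
    "refund_request",
    "order_status",
    "product_complaint",
    "escalation_request",
    "general_inquiry",
}


def parse_classification(raw_reply: str) -> Dict[str, Any]:
    """Parse and validate the classifier's JSON reply; raise ValueError if unusable."""
    try:
        data = json.loads(raw_reply)
    except json.JSONDecodeError as exc:
        raise ValueError(f"reply is not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("reply must be a JSON object")

    intent = data.get("intent")
    if not isinstance(intent, str) or intent not in ALLOWED_INTENTS:
        raise ValueError(f"unknown intent: {intent!r}")

    confidence = data.get("confidence")
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(confidence, bool) or not isinstance(confidence, (int, float)):
        raise ValueError("confidence must be a number")
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0.0 and 1.0")

    return {
        "intent": intent,
        "confidence": float(confidence),
        "reasoning": str(data.get("reasoning", "")),
    }


reply = '{"intent": "refund_request", "confidence": 0.97, "reasoning": "Asks for a refund."}'
print(parse_classification(reply))
```

A reply that fails validation can be treated like a low-confidence result and sent to the fallback path instead of being routed.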

5.4   Semantic Router vs. Boolean Branching

| Dimension | Traditional If/Then | Semantic Router |
| --- | --- | --- |
| Condition type | Boolean expression | Intent classification |
| Inputs handled | Enumerated, exact-match | Open-ended, natural language |
| Coverage of new cases | Requires new else if branch | Often generalizes to unseen inputs |
| Failure mode | Unhandled exception / default fallback | Misclassification (wrong category) |
| Testing | Enumerate all branch conditions | Build a golden dataset of labeled inputs |
| Explainability | Fully deterministic and auditable | Probabilistic; requires confidence scores and reasoning traces |
| Extensibility | Add new branch in code | Add new category to the prompt and golden dataset |
Fig 5.2 — Classic Branching vs. Semantic Routing: Input Coverage
[Figure: two panels. Left, "Classic if-else": a chain of checks (if "write" in input, elif "debug" in input, elif "analyze" in input, else unhandled) in which "create", "make", and "build" go unmatched. Right, "Semantic Router": overlapping intent zones, namely a Code Zone (write / create / build / make), a Research Zone (find / explain…), and a Data Zone (chart / analyze…). Overlap between zones marks ambiguous intent, which the LLM disambiguates. "create", "build", and "make" all hit the Code Zone.]

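Classical if-else chains fail on synonyms and paraphrases. Semantic routing covers intent zones, which are clusters of semantically similar expressions, so a paraphrase lands in the same zone as the phrasing it resembles instead of falling through to an unhandled else. The remaining failure mode is misclassification, which is why low-confidence inputs still need a fallback.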


5.5   Nested Routing: The Semantic Switch/Case

Complex systems often require hierarchical routing: a primary classification determines the broad category, and a secondary classification refines the specific handling within that category. This is the semantic equivalent of a switch/case statement with nested cases.

Fig 5.3 — Nested Semantic Router: Two-Level Intent Tree
[Figure: user input enters the L1 Router, a domain classifier with the categories Code | Data | Infra | Other. Each domain has its own L2 router: Code L2 (Python | JS | SQL | Shell), Data L2 (Query | Visualize | Export), and Infra L2 (Deploy | Monitor | Scale). The L2 routers dispatch to leaf agents such as the Python, JS, Query, Viz, Deploy, and Monitor agents. Each L1 decision costs one classification call and each L2 decision costs one more, for a total overhead of about two calls.]

Two-level routing contains the classification problem at each level, keeping each router simple. Deeply nested trees (3+ levels) add latency without proportional precision gains, so prefer flat and wide over deep and narrow.

The design principle: each router in the hierarchy should be narrow in scope. A single router that classifies 50 distinct, specific intents is fragile. A two-level hierarchy with 5 primary intents and 5–10 secondary intents per category is more robust because each router has a limited, well-defined task.
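A two-level hierarchy like this can be sketched as a small dispatcher that takes the classifiers as plain functions. Everything here is an assumption made for illustration: the `nested_route` helper, the "default" label for domains without a second-level router, and the keyword-based stubs, which stand in for what would be focused LLM classification calls in practice.

```python
from typing import Callable, Dict, Tuple

# A classifier maps free text to a category label.
Classifier = Callable[[str], str]


def nested_route(
    user_input: str,
    l1_router: Classifier,
    l2_routers: Dict[str, Classifier],
) -> Tuple[str, str]:
    """Classify the domain first, then refine within that domain if possible."""
    domain = l1_router(user_input)
    l2_router = l2_routers.get(domain)
    if l2_router is None:
        # No second-level router for this domain: one call total.
        return domain, "default"
    return domain, l2_router(user_input)


# Stub classifiers so the sketch runs without a model (assumption).
def stub_l1(text: str) -> str:
    return "code" if "function" in text.lower() else "other"


def stub_code_l2(text: str) -> str:
    return "python" if "python" in text.lower() else "shell"


print(nested_route("Fix this Python function", stub_l1, {"code": stub_code_l2}))
print(nested_route("Hello there", stub_l1, {"code": stub_code_l2}))
```

Because each router is passed in as a function, either level can be tested or replaced independently of the other.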


5.6   Confidence Thresholds and Fallback Handling

A semantic router operating in production must handle uncertainty explicitly. An input with ambiguous intent should not be silently routed to a randomly chosen handler — it should trigger a fallback mechanism.

Route with Confidence Threshold
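def route(user_input: str, threshold: float = 0.75) -> str:
    """Return the classified intent, or "clarification_required" when unsure."""
    # classifier_llm and CLASSIFICATION_PROMPT are placeholders for your model
    # client and the prompt from Section 5.3. The client is assumed to return
    # the classifier's JSON reply already parsed into a dict.
    result = classifier_llm.call(
        system_prompt=CLASSIFICATION_PROMPT,
        user_message=user_input,
    )

    # Treat a missing confidence as zero so it falls through to clarification.
    confidence = result.get("confidence", 0.0)
    if "intent" in result and confidence >= threshold:
        return result["intent"]

    # Below threshold: ask for clarification rather than guess.
    return "clarification_required"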

When confidence falls below the threshold, the appropriate response is to ask the user for clarification — not to guess:

I want to make sure I understand your request correctly. Are you looking to: (a) check the status of your order, or (b) request a refund?
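A clarification message like the one above can be assembled from the candidate intents. This sketch assumes the router can supply the two or more intents it was torn between, which the single-intent classification prompt in Section 5.3 does not do on its own. The `INTENT_PHRASES` mapping and the `build_clarification` helper are hypothetical.

```python
from typing import List

# Hypothetical user-facing phrasing for each intent.
INTENT_PHRASES = {
    "order_status": "check the status of your order",
    "refund_request": "request a refund",
    "product_complaint": "report a problem with a product",
    "escalation_request": "speak to a manager",
}


def build_clarification(candidates: List[str]) -> str:
    """Turn two or more candidate intents into a lettered clarifying question."""
    if len(candidates) < 2:
        raise ValueError("need at least two candidate intents to ask a question")
    options = [
        f"({chr(ord('a') + i)}) {INTENT_PHRASES[intent]}"
        for i, intent in enumerate(candidates)
    ]
    choices = ", ".join(options[:-1]) + ", or " + options[-1]
    return (
        "I want to make sure I understand your request correctly. "
        f"Are you looking to: {choices}?"
    )


print(build_clarification(["order_status", "refund_request"]))
```

Keeping the phrasing in a lookup table means the wording can be edited without touching the routing logic.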

5.7   Evaluating a Semantic Router

A semantic router is a production system component requiring the same evaluation rigor as any other component (Chapter 11). Because the output is categorical, evaluation is more straightforward than for generative agents — but the methodology is the same.

Golden Dataset: A labeled dataset of user inputs with their correct intent categories. Minimum 20–30 examples per category, with emphasis on edge cases.

Metrics: Accuracy per category, confusion matrix (which categories are most commonly confused), and confidence calibration.

Edge case coverage: Short ambiguous messages ("help"), highly emotional messages, multi-intent messages, and messages that test category boundaries.
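The three metrics can be computed from a list of (expected, predicted, confidence) records gathered by running the router over the golden dataset. This is a minimal sketch. The record layout, the helper names, and the five-bucket calibration scheme are assumptions, and the sample records are made up to exercise the code rather than taken from any real evaluation.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

Record = Tuple[str, str, float]  # (expected intent, predicted intent, confidence)


def per_category_accuracy(records: List[Record]) -> Dict[str, float]:
    """Fraction of correct predictions for each expected category."""
    totals = Counter(expected for expected, _, _ in records)
    correct = Counter(e for e, predicted, _ in records if e == predicted)
    return {category: correct[category] / n for category, n in totals.items()}


def confusion_counts(records: List[Record]) -> Counter:
    """Count each (expected, predicted) pair; off-diagonal pairs are confusions."""
    return Counter((expected, predicted) for expected, predicted, _ in records)


def calibration_by_bucket(records: List[Record], n_buckets: int = 5) -> Dict[int, float]:
    """Accuracy within each confidence bucket; bucket i covers [i/n, (i+1)/n)."""
    buckets = defaultdict(list)
    for expected, predicted, confidence in records:
        index = min(int(confidence * n_buckets), n_buckets - 1)
        buckets[index].append(expected == predicted)
    return {index: sum(hits) / len(hits) for index, hits in sorted(buckets.items())}


# Made-up records for illustration only.
sample = [
    ("refund_request", "refund_request", 0.95),
    ("refund_request", "order_status", 0.90),
    ("order_status", "order_status", 0.55),
    ("order_status", "order_status", 0.97),
]
print(per_category_accuracy(sample))
print(confusion_counts(sample))
print(calibration_by_bucket(sample))
```

In a well-calibrated router, accuracy in the highest confidence bucket should be close to the confidences reported there. A high-confidence bucket with low accuracy suggests the threshold is being trusted more than it deserves.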


5.8   Chapter Summary

The if/then statement branches on truth. The Semantic Router branches on intent. This distinction, from evaluating a condition to classifying a meaning, is the fundamental architectural move that makes conversational agents possible. The key takeaways: the router does one job; intent categories should be mutually exclusive and collectively exhaustive; few-shot examples are essential; confidence thresholds prevent silent misrouting; and the router requires evals.

Core Principle — Chapter 5

The if/then statement asks "Is this true?" The semantic router asks "What does this mean?" One evaluates a condition. The other understands intent. This is the difference between a machine that reacts and an agent that reasons.