Three Retrieval Paths for Injection Payloads

Definition

A taxonomy from Securing Your Agents (slide 28) that classifies how a malicious instruction reaches the LLM context window during agent operation. Three paths exist; they have very different attacker effort profiles; and vector RAG attracts the most research attention while paths 2 and 3 carry most real-world risk.

The Three Paths

Path 1 — Vector-Embedded RAG (HARDEST for attackers)

Doc → Chunk → Embed → Vector DB → top-k retrieval → LLM

The payload must:

  1. Survive chunking (it might land at a chunk boundary)
  2. Survive embedding (the semantic signal must persist through dimensionality reduction)
  3. Be retrieved at top-k for a query the victim user is likely to ask

Effort: HIGH. But not impossible — research shows instructions retain semantic fidelity through embedding, and 5 carefully crafted documents in a corpus of millions can achieve ~90% retrieval-and-execution success against typical similarity thresholds.

Path 2 — Full-Text / Direct Retrieval (BIGGEST PRACTICAL RISK)

Source → entire content into context → LLM

No chunking, no embedding. The full document hits the context window verbatim:

  • Web pages fetched by a browse(url) tool
  • Emails ingested by a mail-summary agent
  • PDFs and Google Docs passed to a “review this document” flow
  • MCP tool responses
  • File contents read by coding agents
  • Calendar invite bodies, meeting transcripts

Effort: LOW. The payload arrives intact with zero transformation. This is how EchoLeak and GeminiJack operated, and it is the dominant pattern in Johann Rehberger’s Month of AI Bugs disclosures.

Path 3 — Metadata and Hidden Fields (SNEAKIEST)

Hidden field → parsed by agent → LLM

The payload hides where humans don’t look but agents parse:

  • PDF metadata (Author, Title, Keywords, Subject)
  • HTML comments (<!-- … -->)
  • Zero-width Unicode characters in otherwise-normal text
  • Image alt attributes
  • Right-to-left override (RTL) and other invisible Unicode controls
  • MCP tool description strings (the descriptions, not the responses)
  • File system extended attributes (xattrs)
  • Git commit message trailers

Effort: LOW — and the payload survives human review because the human never sees it. This is the path most likely to be missed by code review and red-team manual testing.

Why the Taxonomy Matters

Each path requires a different defense:

PathPrimary defense
1 — Vector RAGPre-ingest content scanning; per-source canary tokens; trust attribution at chunk level
2 — Full-textApply injection classifier to every retrieved document before assembling the prompt; trust-labeled boundary markers around retrieved content; never inline documents directly into the system prompt
3 — MetadataStrip hidden fields at ingest (HTML comment removal, Unicode normalization, PDF-metadata stripping); only display what humans can see; render then re-extract for documents being summarized

A program that defends only path 1 (the academically interesting one) leaves the two paths attackers actually use wide open.

Mapping to Real-World Attack Surfaces

SurfaceDominant path
RAG over internal docs / knowledge basePath 1 (also Path 2 if full-doc retrieval is used)
Mail-summary agentPath 2 (email body) + Path 3 (HTML comments, hidden text)
Web-research agentPath 2 (page body) + Path 3 (HTML comments, alt text)
Calendar / meeting agentsPath 2 (invite body) + Path 3 (metadata)
Coding agentsPath 2 (file content) + Path 3 (commit messages, README front matter)
MCP tool responsesPath 2 (response body) + Path 3 (tool description strings)
PDF document reviewPath 2 (visible text) + Path 3 (PDF metadata, alt text, white-on-white text)

Defense Layering

The RAG hardening practice operationalizes the multi-path defense:

  1. Ingest-time: strip metadata, normalize Unicode, apply injection classifier per source, attach trust-source attribution.
  2. Assembly-time: wrap each source with explicit trust-label boundary markers (see System Prompt Architecture (Boundary Markers + Trust Labels)); insert canary tokens between sources.
  3. Inference-time: monitor for goal-hijack indicators in chain-of-thought (LlamaFirewall AlignmentCheck) and for canary-token appearances in output.
  4. Action-time: if an action was triggered by a path-2 or path-3 retrieval (rather than direct user input), require human confirmation for any high-risk tool call. See Least Agency Principle.

See Also