Three Retrieval Paths for Injection Payloads

Definition

A taxonomy from Securing Your Agents (slide 28) that classifies how a malicious instruction reaches the LLM context window during agent operation. Three paths exist; they have very different attacker effort profiles; and vector RAG attracts the most research attention while paths 2 and 3 carry most real-world risk.

The Three Paths

Path 1 — Vector-Embedded RAG (HARDEST for attackers)

Doc → Chunk → Embed → Vector DB → top-k retrieval → LLM

The payload must:

Survive chunking (it might land at a chunk boundary)
Survive embedding (the semantic signal must persist through dimensionality reduction)
Be retrieved at top-k for a query the victim user is likely to ask

Effort: HIGH. But not impossible — research shows instructions retain semantic fidelity through embedding, and 5 carefully crafted documents in a corpus of millions can achieve ~90% retrieval-and-execution success against typical similarity thresholds.

Path 2 — Full-Text / Direct Retrieval (BIGGEST PRACTICAL RISK)

Source → entire content into context → LLM

No chunking, no embedding. The full document hits the context window verbatim:

Web pages fetched by a browse(url) tool
Emails ingested by a mail-summary agent
PDFs and Google Docs passed to a “review this document” flow
MCP tool responses
File contents read by coding agents
Calendar invite bodies, meeting transcripts

Effort: LOW. The payload arrives intact with zero transformation. This is how EchoLeak and GeminiJack operated, and it is the dominant pattern in Johann Rehberger’s Month of AI Bugs disclosures.

Path 3 — Metadata and Hidden Fields (SNEAKIEST)

Hidden field → parsed by agent → LLM

The payload hides where humans don’t look but agents parse:

PDF metadata (Author, Title, Keywords, Subject)
HTML comments ()
Zero-width Unicode characters in otherwise-normal text
Image alt attributes
Right-to-left override (RTL) and other invisible Unicode controls
MCP tool description strings (the descriptions, not the responses)
File system extended attributes (xattrs)
Git commit message trailers

Effort: LOW — and the payload survives human review because the human never sees it. This is the path most likely to be missed by code review and red-team manual testing.

Why the Taxonomy Matters

Each path requires a different defense:

Path	Primary defense
1 — Vector RAG	Pre-ingest content scanning; per-source canary tokens; trust attribution at chunk level
2 — Full-text	Apply injection classifier to every retrieved document before assembling the prompt; trust-labeled boundary markers around retrieved content; never inline documents directly into the system prompt
3 — Metadata	Strip hidden fields at ingest (HTML comment removal, Unicode normalization, PDF-metadata stripping); only display what humans can see; render then re-extract for documents being summarized

A program that defends only path 1 (the academically interesting one) leaves the two paths attackers actually use wide open.

Mapping to Real-World Attack Surfaces

Surface	Dominant path
RAG over internal docs / knowledge base	Path 1 (also Path 2 if full-doc retrieval is used)
Mail-summary agent	Path 2 (email body) + Path 3 (HTML comments, hidden text)
Web-research agent	Path 2 (page body) + Path 3 (HTML comments, alt text)
Calendar / meeting agents	Path 2 (invite body) + Path 3 (metadata)
Coding agents	Path 2 (file content) + Path 3 (commit messages, README front matter)
MCP tool responses	Path 2 (response body) + Path 3 (tool description strings)
PDF document review	Path 2 (visible text) + Path 3 (PDF metadata, alt text, white-on-white text)

Defense Layering

The RAG hardening practice operationalizes the multi-path defense:

Ingest-time: strip metadata, normalize Unicode, apply injection classifier per source, attach trust-source attribution.
Assembly-time: wrap each source with explicit trust-label boundary markers (see System Prompt Architecture (Boundary Markers + Trust Labels)); insert canary tokens between sources.
Inference-time: monitor for goal-hijack indicators in chain-of-thought (LlamaFirewall AlignmentCheck) and for canary-token appearances in output.
Action-time: if an action was triggered by a path-2 or path-3 retrieval (rather than direct user input), require human confirmation for any high-risk tool call. See Least Agency Principle.

Enterprise Security in the Agentic AI Era

Explorer

Three Retrieval Paths for Injection Payloads

Three Retrieval Paths for Injection Payloads

Definition

The Three Paths

Path 1 — Vector-Embedded RAG (HARDEST for attackers)

Path 2 — Full-Text / Direct Retrieval (BIGGEST PRACTICAL RISK)

Path 3 — Metadata and Hidden Fields (SNEAKIEST)

Why the Taxonomy Matters

Mapping to Real-World Attack Surfaces

Defense Layering

See Also

Graph View

Table of Contents

Backlinks