Lethal Trifecta
Definition
Coined by Simon Willison in June 2025 (“The Lethal Trifecta for AI Agents”, simonwillison.net), the Lethal Trifecta is the minimum set of three agent capabilities that, when present together, allow an attacker to easily exfiltrate private data via prompt injection:
- Access to private data — the agent can read sensitive content (mailbox, files, internal docs, RAG store, secrets in env).
- Exposure to untrusted content — the agent ingests content from sources the attacker can influence (web pages, emails, documents, calendar invites, RAG entries, MCP tool descriptions).
- Ability to externally communicate — the agent can send data outside the trust boundary (HTTP requests, outbound mail, file writes to shared storage, markdown image rendering, URL-fetch tool calls).
When all three hold, a successful prompt injection (almost always indirect) can chain the agent’s own capabilities into an exfiltration path. The attacker doesn’t need a software exploit; the agent supplies the read-and-write primitives.
Why It Is “Lethal”
The trifecta is necessary and (in practice) sufficient for end-to-end data exfiltration via natural-language attack:
- Any two of the three capabilities are annoying-but-recoverable: two agents, each holding a different two-of-three combination, can each stay useful without creating an exfiltration path.
- All three in a single agent give an attacker an exfiltration channel that requires no privilege escalation, no code-execution exploit, and often leaves no detectable malware artifact. The compromise looks identical to normal use.
The framing is load-bearing because it gives architects a structural test: ask, “does any single agent in our system hold all three?” If yes, that agent is one indirect injection away from being a data-exfil tool.
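The structural test can be mechanized as a capability audit over agent definitions. A minimal sketch, assuming agents declare their capabilities in a registry (the capability labels and agent names here are hypothetical examples, not a standard schema):

```python
# Minimal trifecta audit over declared agent capabilities.
# Capability labels and agent definitions are illustrative, not a standard.
TRIFECTA = {"private_data", "untrusted_content", "external_comms"}

agents = {
    "research-agent":  {"untrusted_content", "external_comms"},
    "summarizer":      {"private_data", "untrusted_content"},
    "inbox-assistant": {"private_data", "untrusted_content", "external_comms"},
}

def trifecta_violations(registry):
    """Return the agents that hold all three legs of the Lethal Trifecta."""
    return [name for name, caps in registry.items() if TRIFECTA <= caps]

print(trifecta_violations(agents))  # ['inbox-assistant']
```

Running this in CI against every agent definition turns "does any single agent hold all three?" from a design-review question into a deterministic check.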
How the Trifecta Manifests in Real Attacks
| Stage | Capability used | Example |
|---|---|---|
| Trigger | Untrusted content | Hidden HTML comment in a shared Google Doc the agent is asked to summarize |
| Read | Private data access | Agent has access to user’s mailbox or file system |
| Send | External communication | Agent renders a markdown image with ?data=… query string, or POSTs to attacker URL |
The Jules AI kill chain and most of Johann Rehberger’s “Month of AI Bugs” disclosures are concrete instantiations of the trifecta.
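The Send stage in the table above is the leg most amenable to mechanical detection: a markdown image pointing at a non-allowlisted host is a classic exfiltration channel. A minimal sketch of scrubbing it from agent output; the regex, the allowlist, and the replacement text are illustrative, not any vendor's implementation:

```python
import re
from urllib.parse import urlparse

# Illustrative allowlist -- a real deployment would load this from config.
ALLOWED_IMAGE_HOSTS = {"cdn.example-internal.com"}

# Matches markdown images: ![alt](url ...) -- group 1 captures the URL.
MD_IMAGE = re.compile(r"!\[[^\]]*\]\(([^)\s]+)[^)]*\)")

def scrub_markdown_images(text):
    """Drop markdown images whose host is not allowlisted --
    closing the ?data=... query-string exfiltration channel."""
    def keep_or_drop(match):
        host = urlparse(match.group(1)).netloc
        return match.group(0) if host in ALLOWED_IMAGE_HOSTS else "[image removed]"
    return MD_IMAGE.sub(keep_or_drop, text)

out = scrub_markdown_images(
    "Summary done. ![x](https://attacker.example/p.png?data=SECRET)"
)
print(out)  # Summary done. [image removed]
```

This only closes one rendering-based channel; an agent with tool-level HTTP access needs the network-layer controls discussed below.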
Containment Strategies
Break the trifecta. Remove at least one of the three capabilities from any agent that handles untrusted content. Practical patterns:
- Split agents along the trifecta axes. A “research agent” has untrusted-content + external-comms but no private-data access. A “personal-assistant agent” has private-data + external-comms but no untrusted-content ingest. A “summarizer” has private-data + untrusted-content but no external-comms.
- Remove external communication from sensitive agents. If an agent must touch private data and untrusted content, it cannot also speak to the network. All output must go through a human-reviewed surface.
- Treat retrieved content as data, not instructions. Use system prompt architecture with explicit trust labels. This does not break the trifecta on its own (a determined injection can still succeed) but reduces the success rate.
- Egress filtering. Domain allowlists at the network layer make the “external communication” leg detectable and constrainable.
- Capability-level audit. Every agent definition should declare which legs of the trifecta it holds. The audit asks: is this combination justified?
- Capability-based authorization at the action layer. Even when an agent must hold all three legs, the actions it can take with them can be deterministically constrained per task. Capability-based authorization (e.g. Tenuo Warrants from [[capability-based-authorization-niyikiza-talk|Niyikiza, [un]prompted March 2026]]) issues task-scoped, holder-bound, delegation-aware capabilities, and sub-agent capabilities can only narrow (monotonic attenuation). This contains an exfiltration-oriented prompt injection at execution time without removing any leg of the trifecta from the agent's role: the agent may ingest untrusted content, hold private-data access, and reach the network, but only the specific action set the warrant permits is executable. Niyikiza reports a 90%→0% reduction in multi-agent attack success rate (ASR) on Tenuo's custom harness.
- Layered structural defenses ("Architecting the Fortress"). Nicolas Lidzborski's [[securing-workspace-genai-at-google-lidzborski-talk|Google Workspace talk at [un]prompted March 2026]] presents a four-layer blueprint: (1) low-risk input: strip hidden content, abuse-signal-aware ingestion, and data-provenance tracking; (2) prompt delimitation via sentinel tokens and adversarial fine-tuning; (3) deterministic orchestration with a state-aware finite-state machine (FSM) that constrains downstream capabilities by data origin; (4) output sanitization, including markdown scrubbing, dynamic URL classification, and removal of ungrounded LLM-hallucinated URLs. These layers are combined with Plan-Validate-Execute for high-stakes irreversible actions. Worked example: the Nassi et al. "Invitation Is All You Need" attack (a calendar invite as a zero-click hijack vector for Gemini), extended in Lidzborski's deployment to smart-home control (lights, curtains, heater), is a real-world demonstration of trifecta exploitation when the action surface is broader than recognized.
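Monotonic attenuation, the property that delegated capabilities can only narrow, can be sketched independently of any particular product. This is an illustrative model of the idea, not Tenuo's Warrant API; the field names and prefix semantics are assumptions:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Capability:
    """Task-scoped capability: which actions, on which resource prefixes."""
    actions: frozenset
    resource_prefixes: frozenset

    def attenuate(self, actions, resource_prefixes):
        """Delegation may only narrow. A compromised sub-agent therefore
        cannot widen its action set or resource scope."""
        actions = frozenset(actions)
        prefixes = frozenset(resource_prefixes)
        if not actions <= self.actions:
            raise PermissionError("delegation may only attenuate actions")
        if not all(any(p.startswith(parent) for parent in self.resource_prefixes)
                   for p in prefixes):
            raise PermissionError("delegation may only attenuate resources")
        return Capability(actions, prefixes)

# Root capability for a task; a sub-agent gets a strictly narrower one.
root = Capability(frozenset({"read", "send"}), frozenset({"mail/", "docs/"}))
child = root.attenuate({"read"}, {"docs/quarterly/"})
```

An attempt to widen (e.g. `child.attenuate({"send"}, ...)`) raises rather than silently granting, which is the execution-time containment property the bullet above describes.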
Breaking the trifecta is also the architectural premise of Stripe's containment architecture, presented by Andrew Bullen at [[unprompted-conference-march-2026|[un]prompted, March 2026]]; see Breaking the Lethal Trifecta (Without Ruining Your Agents) for the full worked example. Stripe's argument: of the three legs, only egress is feasible to remove in a real enterprise (most agents need private data, and untrusted content is structurally hard to filter without losing utility). The same talk introduces the Lethal Bifecta as a write-side analogue. Niyikiza's capability-warrants approach is complementary: when egress cannot be fully removed, action-layer capability attenuation contains the blast radius even when the agent reaches the network.
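Because egress is the leg most feasible to remove, it is also the one enforceable entirely outside the model. A minimal sketch of a deterministic network-layer gate for an agent's HTTP tool; the allowlisted domains are illustrative, not Stripe's actual configuration:

```python
from urllib.parse import urlparse

# Illustrative per-deployment allowlist; a real one lives at the proxy layer.
EGRESS_ALLOWLIST = {"api.internal.example.com", "docs.internal.example.com"}

def egress_allowed(url):
    """Deterministic gate: the agent's HTTP tool fires only when the
    destination host is explicitly allowlisted. No model judgment involved."""
    host = urlparse(url).hostname or ""
    return host in EGRESS_ALLOWLIST

print(egress_allowed("https://api.internal.example.com/v1/ok"))   # True
print(egress_allowed("https://attacker.example/exfil?d=SECRET"))  # False
```

Because the check is deterministic and sits below the model, a successful prompt injection cannot talk its way past it; it can only attempt destinations that are already trusted.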
Relationship to OWASP Frameworks
- LLM01 Prompt Injection is the attack vector; the Lethal Trifecta is the structural condition that makes it lethal.
- ASI01 Agent Goal Hijack and ASI05 Sensitive Data Disclosure are the OWASP labels for the outcome when the trifecta is exploited.
- Least Agency Principle can be read as the Lethal Trifecta’s positive form: strip every leg you can.
Distinguishing It From Adjacent Concepts
- Lethal Trifecta ≠ “agent can do bad things.” The trifecta is specifically the configuration that makes silent exfiltration trivial. Other harms (cost explosion, data destruction, hallucination cascades) are real but mechanistically different.
- Lethal Trifecta is a Confidentiality + Integrity threat model. The trifecta describes silent exfiltration (C) and the Bifecta describes unintended action (I via the write-side). Availability harms — runaway agents, recursive loops, resource exhaustion — sit outside the trifecta entirely and have their own threat surface; see Agent Availability Threats. The MAAIS CIAA augmentation makes the case for treating Availability and Accountability as co-equal axes alongside C + I.
- Lethal Trifecta is structural, not behavioral. A trifecta agent is a problem before any user interaction occurs. Defense begins at design time.
- The trifecta does not assume the model is misaligned or compromised. A perfectly aligned model with all three capabilities is still exposed because the attacker is the source of misalignment, via injected content.
On "unconditionally vulnerable"
A serious skeptic will push back on the unconditional framing. Stripe (Bullen, March 2026) runs trifecta agents in production with platform-level egress containment plus sensitive-action HITL and reports 1.5–6.7% attack success rates across model generations. CaMeL (Google DeepMind) and deterministic-gating research demonstrate further reductions without splitting the trifecta. The honest framing: the trifecta is necessary for natural-language exfiltration at scale, and, given current defense maturity, sufficient to make platform-layer containment mandatory. In production, containment drives ASR very low but not to zero. Bullen's "even 0.1% is too high" is the operative bar: the threshold, not any unconditional vulnerability, is what makes the trifecta a design-time test. See Wiki Novelty and Counter-Arguments §Thesis 3.
See Also
- Indirect Prompt Injection — the dominant attack vector against trifecta agents
- Tool-Abuse Chains — what happens when external communication is via tool calls rather than text rendering
- Prompt Injection Containment for Agentic Systems — runtime containment when the trifecta cannot be broken at design time
- Least Agency Principle — the autonomy-governance principle that complements trifecta-splitting
- Agentic AI Threat Classes — 2026 Expansion — the broader threat model that contains the Lethal Trifecta as one structural test among five threat classes (insider, APT campaign, collusion, model-version regression, jurisdictional adversaries)