Lethal Bifecta
Definition
Coined by Andrew Bullen (Head of AI Security, Stripe) at the [[unprompted-conference-march-2026|[un]prompted Conference, March 2026]] as the write-side analogue of Simon Willison’s Lethal Trifecta. The trifecta describes the conditions under which an agent silently exfiltrates data; the bifecta describes the (simpler) conditions under which an agent takes a damaging action:
- Untrusted content — the agent ingests content the attacker can influence.
- Sensitive action — the agent has the capability to take a write/communication/destructive action with material consequence.
When both hold, an indirect prompt injection can steer the agent into a harmful action without the attacker needing to compromise any system other than the agent’s input surface.
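One practical way to apply the two-leg test is to list every pair of untrusted source and sensitive action on a single agent. The agent is exposed exactly when that list is non-empty, and the list is empty whenever either leg is missing. This is a minimal sketch that assumes a hypothetical inventory format: the source and tool names, and the boolean flags, are illustrative and do not come from Bullen's talk. Treating every pair as reachable is a conservative assumption, because the sketch does not model which tools the agent can actually reach after reading a given source.

```python
from itertools import product

# Hypothetical inventory of one agent's surface (names are illustrative).
INPUTS = {
    "inbound_email": True,   # True = attacker can influence this content
    "internal_wiki": False,
    "web_search": True,
}
ACTIONS = {
    "issue_refund": True,         # True = sensitive (production write, broad comms, destructive)
    "read_order": False,
    "email_all_customers": True,
}


def bifecta_paths(inputs: dict[str, bool], actions: dict[str, bool]) -> list[tuple[str, str]]:
    """Every (untrusted source, sensitive action) pair an injection could connect."""
    untrusted = sorted(name for name, tainted in inputs.items() if tainted)
    sensitive = sorted(name for name, risky in actions.items() if risky)
    return list(product(untrusted, sensitive))


paths = bifecta_paths(INPUTS, ACTIONS)
print(len(paths) > 0)  # True: both legs hold, so this agent is exposed
for src, act in paths:
    print(f"{src} -> {act}")
```

Removing either leg empties the list. For example, if every sensitive action is dropped, or if the agent only reads `internal_wiki`, no pairs remain.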
Why “bifecta” and not part of the trifecta
The Lethal Trifecta has three legs because exfiltration needs two capabilities beyond the untrusted-content entry point: read (access to private data) and send (external communication). The attacker has to pull data through both stages. A damaging write skips the read step: the attacker is not extracting your data, they are using your agent’s privileges to do something to the world. So the threat condenses to two ingredients.
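Writing the two structural tests side by side shows how the read leg drops out. This is a minimal sketch that assumes a hypothetical `Capabilities` record with one flag per leg. The trifecta legs follow Willison's formulation (private data, untrusted content, external communication), but the field names are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Capabilities:
    # Hypothetical per-agent flags, one per leg; not an API from either source.
    reads_private_data: bool
    ingests_untrusted_content: bool
    can_communicate_externally: bool
    has_sensitive_action: bool


def trifecta_exposed(c: Capabilities) -> bool:
    # Exfiltration needs something to steal, a way in, and a way out.
    return (
        c.reads_private_data
        and c.ingests_untrusted_content
        and c.can_communicate_externally
    )


def bifecta_exposed(c: Capabilities) -> bool:
    # A damaging write skips the read step: a way in plus a harmful capability suffices.
    return c.ingests_untrusted_content and c.has_sensitive_action


# An ops agent that reads tickets and can delete rows in its own database,
# but has no private-data access and no outbound network.
ops_agent = Capabilities(
    reads_private_data=False,
    ingests_untrusted_content=True,
    can_communicate_externally=False,
    has_sensitive_action=True,
)
print(trifecta_exposed(ops_agent), bifecta_exposed(ops_agent))  # False True
```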
This separation matters architecturally:
- Trifecta containment is mostly about removing the egress leg (Stripe’s Guardrail 1).
- Bifecta containment is mostly about gating the action leg with human review (Stripe’s Guardrail 2).
The two guardrails don’t overlap operationally — egress controls don’t catch a destructive write to your own production database, and action-review doesn’t catch a quiet POST to attacker.com.
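The non-overlap is easiest to see when each guardrail is modeled as a separate check on a proposed tool call. This is a minimal sketch that assumes a hypothetical `ToolCall` record, an egress allowlist standing in for Guardrail 1, and a set of review-required tools standing in for Guardrail 2. The host and tool names are made up for illustration, and the sketch is not Stripe's implementation.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass(frozen=True)
class ToolCall:
    tool: str
    target: str  # a URL for network tools, a resource name otherwise


# Hypothetical policy data for this sketch.
EGRESS_ALLOWLIST = {"api.internal.example"}
REVIEW_REQUIRED = {"db_delete", "send_broadcast"}


def egress_blocked(call: ToolCall) -> bool:
    # Guardrail 1: only asks where bytes are leaving the boundary.
    if call.tool != "http_post":
        return False
    host = urlparse(call.target).hostname or ""
    return host not in EGRESS_ALLOWLIST


def needs_review(call: ToolCall) -> bool:
    # Guardrail 2: only asks whether the action itself is classified sensitive.
    return call.tool in REVIEW_REQUIRED


calls = [
    ToolCall("db_delete", "prod.customers"),         # destructive write, no egress
    ToolCall("http_post", "https://attacker.com/x"),  # quiet exfil; http_post not review-required here
]
for c in calls:
    print(c.tool, egress_blocked(c), needs_review(c))
# db_delete False True
# http_post True False
```

Each call is caught by exactly one check, which is why the two guardrails are deployed together rather than as substitutes.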
“Sensitive” is load-bearing
Bullen (transcript): “Sensitive is very load-bearing here. Generally, the rule of thumb is that anything that is a production write or a broad communication or sending a message are the big things that we think of as sensitive actions.”
Implication: most agent flows have many tool calls and only some of them are sensitive. The architectural lift is classifying writes, not gating all writes — hence the [[breaking-the-lethal-trifecta-bullen-talk|ToolAnnotations schema]] (production_impacting_write, data_sensitivity, broadcasts_data_internally).
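Here is a sketch of what annotation-driven classification might look like. The three field names are the ones cited from the talk. Their types, the `DataSensitivity` levels, and the `is_sensitive` rule are assumptions made for illustration and are not Stripe's published schema. In this particular rule, `data_sensitivity` is carried on the annotation but not consulted.

```python
from dataclasses import dataclass
from enum import IntEnum


class DataSensitivity(IntEnum):
    # Hypothetical ordinal scale; the talk names the field, not its values.
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    RESTRICTED = 3


@dataclass(frozen=True)
class ToolAnnotations:
    production_impacting_write: bool
    data_sensitivity: DataSensitivity
    broadcasts_data_internally: bool


def is_sensitive(a: ToolAnnotations) -> bool:
    # Illustrative rule following the quoted rule of thumb:
    # production writes and broad communications count as sensitive.
    return a.production_impacting_write or a.broadcasts_data_internally


# Hypothetical tool registry; the annotation is the policy.
TOOLS = {
    "search_docs": ToolAnnotations(False, DataSensitivity.INTERNAL, False),
    "update_customer_record": ToolAnnotations(True, DataSensitivity.RESTRICTED, False),
    "post_to_company_channel": ToolAnnotations(False, DataSensitivity.INTERNAL, True),
}

gated = sorted(name for name, ann in TOOLS.items() if is_sensitive(ann))
print(gated)  # ['post_to_company_channel', 'update_customer_record']
```

Only two of the three tools end up gated. That is the point of classifying writes instead of gating every call.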
Containment patterns (from Stripe’s worked example)
- Annotate every tool / API endpoint with a sensitivity classification. The annotation is the policy.
- Force human review on tools/endpoints whose annotation crosses a threshold. The framework injects the review step automatically.
- Compensate for review fatigue. Without compensating UX, the bifecta defense degrades to rubber-stamping. Patterns: queue + batch confirmations; optimistic writes with reverts; an LLM as second reviewer for fast triage of obviously bad actions.
- Cover the deep-agent case — where the agent writes its own code that bypasses declared tools, the annotation has to live on the API endpoint, not the tool. (This is unsolved in Stripe’s published architecture as of March 2026.)
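The first three patterns can be combined into a wrapper that the framework places around every call. The wrapper looks up the annotation, runs non-sensitive calls immediately, and parks sensitive ones in a queue that a human approves in batches. This is a minimal sketch under stated assumptions. The policy is keyed by HTTP method and path rather than by tool name, so the gate sits at the endpoint, where the fourth pattern says the annotation has to live. The endpoints, the 0 to 3 score, the threshold, and the `approve` callback are all hypothetical. This does not claim to solve the deep-agent case.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical endpoint-level policy: (method, path) -> sensitivity score 0..3.
# Keying on the endpoint means any caller, including agent-written code that
# routes through this gate, resolves to the same annotation.
ENDPOINT_SENSITIVITY = {
    ("GET", "/v1/customers"): 0,
    ("POST", "/v1/refunds"): 3,
    ("DELETE", "/v1/customers"): 3,
}
REVIEW_THRESHOLD = 2  # assumed cut-off: scores at or above it need a human


@dataclass
class PendingAction:
    method: str
    path: str
    run: Callable[[], str]


@dataclass
class ReviewGate:
    queue: list[PendingAction] = field(default_factory=list)

    def submit(self, method: str, path: str, run: Callable[[], str]) -> str:
        # Unannotated endpoints get the highest score, so the gate fails closed.
        score = ENDPOINT_SENSITIVITY.get((method, path), 3)
        if score < REVIEW_THRESHOLD:
            return run()
        self.queue.append(PendingAction(method, path, run))
        return "queued for review"

    def flush(self, approve: Callable[[list[PendingAction]], set[int]]) -> list[str]:
        # One batch confirmation for the whole queue instead of one prompt per call.
        # `approve` returns the queue indices the reviewer accepted.
        accepted = approve(self.queue)
        results = [a.run() for i, a in enumerate(self.queue) if i in accepted]
        self.queue.clear()
        return results


gate = ReviewGate()
print(gate.submit("GET", "/v1/customers", lambda: "listed customers"))
print(gate.submit("POST", "/v1/refunds", lambda: "refund issued"))
print(gate.submit("DELETE", "/v1/customers", lambda: "customer deleted"))

# A reviewer who approves the refund (index 0) but rejects the deletion.
print(gate.flush(lambda pending: {0}))  # ['refund issued']
```

Optimistic writes with reverts and an LLM pre-screen would fit inside `submit` as alternatives to queuing. They are left out here to keep the sketch short.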
Distinguishing it from adjacent concepts
- Lethal Bifecta vs Lethal Trifecta. Same family, different harm. Trifecta = silent exfil. Bifecta = damaging action. An agent can be vulnerable to one and not the other.
- Lethal Bifecta vs Least Agency Principle. Least agency is the broader governance principle (“strip every capability you can”); the bifecta is the specific structural test for the write side, parallel to the trifecta’s structural test for the read side.
- Lethal Bifecta vs Decision Rights for AI Agents. Decision rights are the governance documentation of which writes need approval; the bifecta is the threat-model justification for why those decision rights exist on the action axis specifically.
Relationship to OWASP frameworks
- LLM01 Prompt Injection — attack vector, shared with the trifecta.
- ASI04 Insecure Output Handling and ASI06 Excessive Agency — the agentic taxonomy’s labels for the bifecta’s outcome surface.
Provenance
Coined from a single source: Bullen at [un]prompted (March 4, 2026). The slide was titled “Bad Writes are even simpler…” and its diagram showed Untrusted Content + Sensitive Actions side by side. The name “Lethal Bifecta” appears only in the transcript, where Bullen acknowledges: “there isn’t a term officially for the things you need in order to have prompt injection deal damage by taking a sensitive action, but I guess, lacking something better, I will call this the lethal bifecta.”
Term provenance — single-source
“Lethal Bifecta” is currently a Bullen-only neologism. If it doesn’t catch on in the OWASP / NIST / Willison-aligned vocabulary by Q4 2026, downgrade this page to a redirect-style stub pointing at Lethal Trifecta §“write-side variant.” Track adoption.