Lethal Bifecta

Definition

The term was coined by Andrew Bullen (Head of AI Security, Stripe) at the Unprompted Conference in March 2026 as the write-side analogue of Simon Willison’s Lethal Trifecta. The trifecta describes the conditions under which an agent silently exfiltrates data; the bifecta describes the simpler conditions under which an agent takes a damaging action:

Untrusted content: the agent ingests content the attacker can influence.
Sensitive action: the agent has the capability to take a write/communication/destructive action with material consequence.

When both hold, an indirect prompt injection can steer the agent into a harmful action without the attacker needing to compromise any system other than the agent’s input surface.

Why “bifecta” and not part of the trifecta

The Lethal Trifecta has three legs because exfiltration needs, on top of untrusted content, both a read (access to private data) and a send (external comms): the attacker has to pull data through both stages. A damaging write skips the read step: the attacker is not extracting your data, they are using your agent’s privileges to do something to the world. So the threat condenses to two ingredients: untrusted content and a sensitive action.

This separation matters architecturally:

Trifecta containment is mostly about removing the egress leg (Stripe’s Guardrail 1).
Bifecta containment is mostly about gating the action leg with human review (Stripe’s Guardrail 2).

The two guardrails don’t overlap operationally: egress controls don’t catch a destructive write to your own production database, and action-review doesn’t catch a quiet POST to attacker.com.
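The non-overlap can be illustrated with a minimal sketch. Everything named here (`ProposedCall`, `EGRESS_ALLOWLIST`, the two guardrail functions, the tool names) is hypothetical and is not Stripe's implementation. It also assumes, as the paragraph above does, that the outbound POST was never classified as a sensitive action.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical allowlist of hosts the agent is permitted to reach.
EGRESS_ALLOWLIST = {"api.internal.example"}


@dataclass(frozen=True)
class ProposedCall:
    """A tool call the agent wants to make (illustrative shape only)."""
    tool: str
    destination_host: Optional[str]  # None when the call stays inside the internal network
    sensitive_action: bool           # e.g. production write or broad communication


def egress_guardrail_blocks(call: ProposedCall) -> bool:
    """Guardrail 1 (sketch): block traffic to hosts outside the allowlist."""
    return (
        call.destination_host is not None
        and call.destination_host not in EGRESS_ALLOWLIST
    )


def action_review_holds(call: ProposedCall) -> bool:
    """Guardrail 2 (sketch): hold sensitive actions for human review."""
    return call.sensitive_action


# Bifecta-style harm: a destructive write that never leaves the network.
destructive_write = ProposedCall("db.drop_table", None, sensitive_action=True)
# Trifecta-style harm: a quiet POST that is not annotated as sensitive (assumption).
quiet_exfil = ProposedCall("http.post", "attacker.com", sensitive_action=False)

for call in (destructive_write, quiet_exfil):
    print(
        call.tool,
        "| egress blocks:", egress_guardrail_blocks(call),
        "| review holds:", action_review_holds(call),
    )
```

In this toy model each attack is stopped by exactly one guardrail, which is why neither control can stand in for the other.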

“Sensitive” is load-bearing

Bullen (transcript): “Sensitive is very load-bearing here. Generally, the rule of thumb is that anything that is a production write or a broad communication or sending a message are the big things that we think of as sensitive actions.”

Implication: most agent flows have many tool calls and only some of them are sensitive. The architectural lift is classifying writes, not gating all writes; hence the `ToolAnnotations` schema (production_impacting_write, data_sensitivity, broadcasts_data_internally).
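As a rough sketch of what classifying (rather than blanket-gating) could look like: the three field names below come from the schema mentioned above, but their types, the `DataSensitivity` levels, the `is_sensitive_action` rule, and the tool names are all assumptions made for illustration, not Stripe's published definitions.

```python
from dataclasses import dataclass
from enum import Enum


class DataSensitivity(Enum):
    # Assumed levels: the source names the field but not its values.
    PUBLIC = 0
    INTERNAL = 1
    RESTRICTED = 2


@dataclass(frozen=True)
class ToolAnnotations:
    production_impacting_write: bool
    data_sensitivity: DataSensitivity
    broadcasts_data_internally: bool


def is_sensitive_action(ann: ToolAnnotations) -> bool:
    """Assumed rule of thumb: production writes and broad communication are sensitive.

    data_sensitivity is deliberately not consulted here; this sketch treats it
    as describing what a tool reads rather than what it does.
    """
    return ann.production_impacting_write or ann.broadcasts_data_internally


# Hypothetical tool inventory.
TOOLS = {
    "search_docs": ToolAnnotations(False, DataSensitivity.INTERNAL, False),
    "update_prod_config": ToolAnnotations(True, DataSensitivity.INTERNAL, False),
    "post_to_all_hands_channel": ToolAnnotations(False, DataSensitivity.INTERNAL, True),
}

gated = sorted(name for name, ann in TOOLS.items() if is_sensitive_action(ann))
print(gated)
```

Only two of the three tools end up gated; the read-only search tool runs without review, which is the point of classifying instead of gating everything.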

Containment patterns (from Stripe’s worked example)

Annotate every tool / API endpoint with a sensitivity classification. The annotation is the policy.
Force human review on tools/endpoints whose annotation crosses a threshold. The framework injects the review step automatically.
Compensate for review fatigue. Without compensating UX, the bifecta defense degrades to rubber-stamping. Patterns: queue + batch confirmations; optimistic writes with reverts; LLM-as-second-reviewer for fast obvious-bad-action triage.
Cover the deep-agent case. Where the agent writes its own code that bypasses declared tools, the annotation has to live on the API endpoint, not the tool. (This is unsolved in Stripe’s published architecture as of March 2026.)
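The first three patterns above can be sketched together as a small gate that a framework might wrap around tool calls. This is a minimal illustration under stated assumptions: `ReviewGate`, `PendingAction`, the `triage` callback (a plain function standing in for an LLM second reviewer), and the refund example are all hypothetical. The gate is keyed on tool names, so it does not address the deep-agent case, where the check would have to live at the API endpoint.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Set


@dataclass
class PendingAction:
    tool: str
    kwargs: Dict[str, Any]


@dataclass
class ReviewGate:
    """Hypothetical wrapper: sensitive calls are queued instead of executed."""
    sensitive_tools: Set[str]
    triage: Callable[[PendingAction], bool]  # True means "obviously bad, reject"
    queue: List[PendingAction] = field(default_factory=list)
    rejected: List[PendingAction] = field(default_factory=list)

    def call(self, tool: str, fn: Callable[..., Any], **kwargs: Any) -> Any:
        if tool not in self.sensitive_tools:
            return fn(**kwargs)           # non-sensitive: run immediately
        action = PendingAction(tool, kwargs)
        if self.triage(action):
            self.rejected.append(action)  # fast triage, never reaches a human
        else:
            self.queue.append(action)     # waits for batched human review
        return None

    def approve_batch(self, registry: Dict[str, Callable[..., Any]]) -> List[Any]:
        """One human confirmation releases every queued action."""
        batch, self.queue = self.queue, []
        return [registry[a.tool](**a.kwargs) for a in batch]


def lookup(customer_id: str) -> str:
    return f"record for {customer_id}"


def refund(customer_id: str, amount_cents: int) -> str:
    return f"refunded {amount_cents} to {customer_id}"


def obviously_bad(action: PendingAction) -> bool:
    # Stand-in for an LLM second reviewer: reject refunds above a fixed cap.
    return action.kwargs.get("amount_cents", 0) > 100_000


gate = ReviewGate(sensitive_tools={"refund"}, triage=obviously_bad)
print(gate.call("lookup", lookup, customer_id="cus_1"))
gate.call("refund", refund, customer_id="cus_1", amount_cents=500)
gate.call("refund", refund, customer_id="cus_2", amount_cents=9_000_000)
print(gate.approve_batch({"refund": refund}))
```

Batching and automated triage both reduce how many individual decisions reach a reviewer. The sketch omits the third fatigue pattern, optimistic writes with reverts.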

Distinguishing it from adjacent concepts

Lethal Bifecta vs Lethal Trifecta. Same family, different harm. Trifecta = silent exfil. Bifecta = damaging action. An agent can be vulnerable to one and not the other.
Lethal Bifecta vs Least Agency Principle. Least agency is the broader governance principle (“strip every capability you can”); the bifecta is the specific structural test for the write side, parallel to the trifecta’s structural test for the read side.
Lethal Bifecta vs Decision Rights for AI Agents. Decision rights are the governance documentation of which writes need approval; the bifecta is the threat-model justification for why those decision rights exist on the action axis specifically.
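The first distinction above, that an agent can be vulnerable to one pattern and not the other, can be sketched as two independent predicates over an agent's capabilities. The `AgentCapabilities` shape and both example agents are hypothetical. Treating external communication and sensitive action as separate flags is a simplification, since sending a message can itself count as a sensitive action.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentCapabilities:
    """Illustrative capability summary; field names are assumptions."""
    reads_untrusted_content: bool
    accesses_private_data: bool
    can_communicate_externally: bool
    can_take_sensitive_action: bool


def lethal_trifecta(c: AgentCapabilities) -> bool:
    """Read-side test: all three legs needed for silent exfiltration."""
    return (
        c.reads_untrusted_content
        and c.accesses_private_data
        and c.can_communicate_externally
    )


def lethal_bifecta(c: AgentCapabilities) -> bool:
    """Write-side test: untrusted content plus a sensitive action."""
    return c.reads_untrusted_content and c.can_take_sensitive_action


# Reads the web and private documents and can make outbound requests,
# but has no write tools (hypothetical agent).
research_assistant = AgentCapabilities(True, True, True, False)
# Reads inbound tickets and can write to production, but holds no
# private data and has no external channel (hypothetical agent).
ticket_closer = AgentCapabilities(True, False, False, True)

for name, caps in (("research_assistant", research_assistant),
                   ("ticket_closer", ticket_closer)):
    print(name, "trifecta:", lethal_trifecta(caps), "bifecta:", lethal_bifecta(caps))
```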

Relationship to OWASP frameworks

LLM01:2025 Prompt Injection: attack vector, shared with the trifecta. The LLM01:2025 identifier was verified against the 2025 source by the LLM Top 10 review.
ASI02 Tool Misuse and Exploitation: the agentic taxonomy’s label for the bifecta’s outcome — a legitimate tool or action weaponized into a damaging write.
Threat Modeling for AI: the spine applies the bifecta as the write-side structural test alongside the trifecta; the reconciliation matrix records both as design-time go/no-go checks.

Provenance

Coined by a single source: Bullen at Unprompted (March 4, 2026). The slide title was “Bad Writes are even simpler…”, and the diagram showed Untrusted Content + Sensitive Actions side by side. The “Lethal Bifecta” name appears in the transcript only: Bullen acknowledges “there isn’t a term officially for the things you need in order to have prompt injection deal damage by taking a sensitive action, but I guess, lacking something better, I will call this the lethal bifecta.”

Term provenance — single-source

“Lethal Bifecta” is currently a Bullen-only neologism. If it doesn’t catch on in the OWASP / NIST / Willison-aligned vocabulary by Q4 2026, downgrade this page to a redirect-style stub pointing at Lethal Trifecta §“write-side variant.” Track adoption.

Enterprise Security in the Agentic AI Era
