Stripe

Payments platform. In the AI-security space, Stripe has emerged as one of the most-cited practitioner sources for production agent containment architecture: it ships customer-facing AI agents at scale and has an internal AI Security team led by Andrew Bullen.

Why Stripe shows up in this wiki

Two talks at [[unprompted-conference-march-2026|[un]prompted Conference, March 2026]]:

| Talk | Speaker(s) | Status in this wiki |
| --- | --- | --- |
| Breaking the Lethal Trifecta (Without Ruining Your Agents) | Andrew Bullen | Full summary ingested (slides + transcript). The canonical practitioner worked example for trifecta containment + bifecta coining. |
| Guardrails Beyond Vibes: Shipping Security Agents in Production | Jeffrey Zhang + Siddh Shah | Full summary ingested (slides + transcript). Two production security agents: modular multi-agent sequential pipeline for threat modeling; single focused agent with minimal toolset for security routing. Evaluation via golden-standard test cases + LLM-as-a-Judge semantic scoring. Five concrete learnings including AlphaEvolve failure. |
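The evaluation approach named in the second talk (golden-standard test cases scored by an LLM judge) can be sketched roughly as below. This is a minimal illustration, not Stripe's harness: `GoldenCase`, `evaluate`, the `0.7` threshold, and the example prompts are all hypothetical, and `token_overlap_judge` is a deterministic stand-in for the LLM-as-a-Judge call so that the sketch runs without a model.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class GoldenCase:
    """One golden-standard test case (hypothetical shape)."""
    prompt: str
    expected: str


# A judge maps (expected, actual) to a semantic score in [0, 1].
# In a real harness this would be an LLM call; that part is assumed here.
Judge = Callable[[str, str], float]


def token_overlap_judge(expected: str, actual: str) -> float:
    """Stand-in judge: Jaccard overlap of lowercase tokens, NOT an LLM."""
    e, a = set(expected.lower().split()), set(actual.lower().split())
    if not e and not a:
        return 1.0
    return len(e & a) / len(e | a)


def evaluate(
    agent: Callable[[str], str],
    cases: Sequence[GoldenCase],
    judge: Judge,
    threshold: float = 0.7,  # assumed pass mark, not from the talk
) -> tuple[float, list[float]]:
    """Run the agent on every golden case; return (pass rate, per-case scores)."""
    if not cases:
        raise ValueError("need at least one golden case")
    scores = [judge(c.expected, agent(c.prompt)) for c in cases]
    passed = sum(score >= threshold for score in scores)
    return passed / len(cases), scores


if __name__ == "__main__":
    # Toy routing agent: answers one prompt correctly and one incorrectly.
    canned = {"leaked API key in logs": "route to payments security team"}

    def toy_agent(prompt: str) -> str:
        return canned.get(prompt, "no action")

    cases = [
        GoldenCase("leaked API key in logs", "route to payments security team"),
        GoldenCase("suspicious SSO login", "route to identity team"),
    ]
    rate, scores = evaluate(toy_agent, cases, token_overlap_judge)
    print(rate, scores)
```

Keeping the judge as a plain callable means the golden set and the pass threshold can be exercised with a cheap deterministic scorer before an LLM judge is swapped in.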

Stripe’s published agent containment stack

The components, in the order they appear in Bullen’s talk:

| Component | Role | This wiki |
| --- | --- | --- |
| Smokescreen | Open-source egress proxy (pre-dates AI agents); choke point for the External-Communication leg of the trifecta. | Product page |
| Agent-tagging convention | Any service that talks to Stripe’s foundation-model proxy is tagged-as-agent. | Pattern documented in Breaking the Lethal Trifecta (Without Ruining Your Agents). |
| CI-time egress check | Tagged services can’t change egress configuration without escalated review. | Same as above. |
| Toolshed | Internal central MCP proxy + tool registry; PEP for tool-call policy. | Product page |
| ToolAnnotations schema | Declarative per-tool annotations (production_impacting_write, data_sensitivity, broadcasts_data_internally); PDP for human-review gating. | Documented in talk page |
| Safe Search via OpenAI | Internet-data without true egress (sets external_web_access: false); honest caveat: shifts trust to OpenAI. | In talk page |
| Queued / batched / optimistic confirmations | UX patterns to keep agents moving while preserving human-in-the-loop on sensitive actions. | In talk page |
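To make the PDP/PEP split in the table concrete, here is a minimal sketch of how declarative annotations could drive human-review gating. Only the three field names (production_impacting_write, data_sensitivity, broadcasts_data_internally) come from the talk. The field types, the sensitivity levels, the gating rule in `decide`, the `call_tool` wrapper, and the example tools are assumptions made for illustration, not Toolshed's actual schema or policy.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class Decision(Enum):
    ALLOW = "allow"
    REQUIRE_HUMAN_REVIEW = "require_human_review"


@dataclass(frozen=True)
class ToolAnnotations:
    """Field names are from the talk; types and defaults are assumed."""
    production_impacting_write: bool = False
    data_sensitivity: str = "low"  # assumed levels: "low", "medium", "high"
    broadcasts_data_internally: bool = False


def decide(annotations: ToolAnnotations) -> Decision:
    """Policy decision point (PDP). The rule below is hypothetical."""
    if annotations.production_impacting_write:
        return Decision.REQUIRE_HUMAN_REVIEW
    if annotations.data_sensitivity == "high" and annotations.broadcasts_data_internally:
        return Decision.REQUIRE_HUMAN_REVIEW
    return Decision.ALLOW


def call_tool(
    tool: Callable[[], str],
    annotations: ToolAnnotations,
    human_approved: bool = False,
) -> str:
    """Policy enforcement point (PEP): run the tool only if policy permits."""
    if decide(annotations) is Decision.REQUIRE_HUMAN_REVIEW and not human_approved:
        raise PermissionError("tool call requires human review")
    return tool()


if __name__ == "__main__":
    read_docs = ToolAnnotations()  # hypothetical read-only tool
    issue_refund = ToolAnnotations(production_impacting_write=True)

    print(call_tool(lambda: "docs fetched", read_docs))
    try:
        call_tool(lambda: "refund issued", issue_refund)
    except PermissionError as exc:
        print("blocked:", exc)
    print(call_tool(lambda: "refund issued", issue_refund, human_approved=True))
```

Keeping the decision in one pure function means the proxy that enforces it needs no tool-specific logic: a new tool only has to declare its annotations.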

Notable real-world incident referenced

Direct hit

Bullen cites the 2025-07-16 disclosure “Claude Jailbroken to Mint Unlimited Stripe Coupons” in his threat-baseline slide. The incident was a prompt-injection-driven jailbreak with direct financial impact on Stripe’s surface area. It is a useful data point: Stripe is publishing its containment architecture in the wake of a real consumer-side incident hitting its own brand, not as an a priori thought experiment.

A separate incidents-page write-up of the Claude→Stripe-coupon jailbreak would be a useful follow-up, since the talk references the headline but does not give the full kill chain.

People