Enterprise Security in the Agentic AI Era
A research wiki on the bidirectional intersection of agentic AI and enterprise cybersecurity — frameworks, reference architectures, emerging practices, and maturity modeling. Maintained by Anton Goncharov; ingested from primary sources (papers, talks, vendor research, incident reports) and synthesized into cross-linked pages.
Scope (three axes, six scope-axis values)
The wiki covers three bidirectional axes between agentic AI and cybersecurity. Every page declares its `scope_axis:` frontmatter key; the closed vocabulary is documented in conventions §Scope Axes.
- Security OF AI — frameworks, reference architectures, and maturity models for safely deploying AI agents in production. Anchored by the Agentic AI Security CMM (5 × 9, cumulative, ID-tagged evidence) and the six-plane Agentic AI Security Reference Architecture. Scope axis: `sec-of-ai`.
- AI IN security — agentic systems used by defenders (SOC automation, autonomous triage) and offensive operators (AI-assisted exploitation, autonomous pentest), plus frontier-model-driven vulnerability discovery. Scope axes: `ai-in-sec-defense`, `ai-in-sec-offense`, `redteam-for-ai`, `ai-vuln-discovery`.
- Security AGAINST AI-driven attacks — how SDLC, supply chain, identity, and operational security must evolve when adversaries have frontier AI capability. Scope axis: `sec-against-ai`.
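A page's frontmatter declaration might look like this (the `scope_axis:` key and the six-value vocabulary are from the scope section above; the surrounding fields are a hypothetical sketch):

```yaml
---
title: "Agentic SOC: State of the Field"
# closed vocabulary: sec-of-ai | ai-in-sec-defense | ai-in-sec-offense |
#                    redteam-for-ai | ai-vuln-discovery | sec-against-ai
scope_axis: ai-in-sec-defense
---
```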
Start Here
- State of the Field — the prose tour of where the wiki stands across all three axes.
- Agentic AI Security Reference Architecture (2026) — the six-plane RA. Read this for the structural model on the `sec-of-ai` axis.
- Agentic AI Security CMM 2026 — the 5 × 9 maturity model. Read this to assess or plan a security-of-AI program.
- Per-axis synthesis pages:
  - Agentic SOC: State of the Field — `ai-in-sec-defense`
  - Offensive AI: State of the Field — `ai-in-sec-offense`
  - Red Teaming for AI: Synthesis — `redteam-for-ai`
  - Frontier AI for Vulnerability Discovery — `ai-vuln-discovery`
  - SDLC in the AI-Attacker Era — `sec-against-ai`
  - Secure-SDLC Framework Stack for 2026 — opinionated secure-SDLC framework-stack recommendation
- CMM Standards Crosswalk · CMM Measurement Protocol · CMM Dependency Rules — companion pages: how the CMM maps to NIST/ISO/MITRE/OWASP, how to actually score an org, and the cross-domain dependency caps.
- Hot Cache · Log — what’s been added recently and why.
What’s Load-Bearing
Two anchor artifacts and six scope axes. Everything else — concepts, practices, papers, entities, incidents, comparisons — is cross-linked back to those:
- Reference Architecture — 6-plane RA (Identity / Control / Runtime / Egress / Data / Observability).
- Capability Maturity Model — 5 × 9 cumulative CMM (D1 Governance / D2 IAM / D3 Supply Chain / D4 Guardrails / D5 Secure Architecture / D6 Data, Memory & RAG / D7 Observability / D8 Operational Resilience / D9 Continuous Compliance).
- Adjacent maturity frameworks — PwC Stage-Coverage Tiers for GenAI-in-SDLC adoption breadth (orthogonal axis); Cybersecurity CMM Exemplars for the BSIMM / SAMM / CMMI / CMMC / NIST CSF 2.0 design lessons that shaped the wiki’s CMM.
Overview
- Overview — state of the field synthesis (seed)
Frameworks
See Frameworks Index.
- CLASP — capability-centric evaluation rubric for autonomous security agents
- Red Teaming Capability Framework — five-tier layered model for first-party agentic AI red teaming
- NIST AI RMF — risk-management framework, four-function model (Govern, Map, Measure, Manage)
- NIST AI 600-1 — Generative AI Profile addendum to NIST AI RMF
- NIST SP 800-218A — SSDF AI Profile (stub)
- MITRE ATLAS — adversarial-ML threat taxonomy; jumped 66→84 techniques in Q1 2026
- OWASP LLM Top 10 — model-layer risks; prompt injection #1
- OWASP Agentic AI Top 10 (ASI) — agent-orchestration-layer risk taxonomy
- OWASP AIVSS — vulnerability scoring system extending CVSS 4.0 with agentic amplification
- ISO/IEC 42001 — first certifiable AI management system standard
- Google SAIF — secure-by-design lifecycle framework, donated to CoSAI
- CoSAI Principles — Coalition for Secure AI principles and outputs
- Microsoft RAI — responsible-AI standard with operational tooling
- CSA MAESTRO — 7-layer threat model for cross-layer agentic AI analysis
- EU AI Act — risk-tiered EU regulation, enforcement Aug 2026
- A2A Protocol — Agent-to-Agent v1.0.0 (Mar 2026); Linux Foundation-governed since June 2025; transport security + Agent Card signing in spec; message integrity + replay + cross-agent delegation are vendor / proposal-side
- Cyber Defense Matrix — Sounil Yu’s 5×5 NIST-CSF-vs-asset-class matrix (stub)
- Gartner AI TRiSM — analyst-defined market category; expanded with Guardian Agents in Feb 2026
- AIUC-1 — first independent security/safety/reliability certification standard for AI agents; six pillars; quarterly updates; Schellman first accredited auditor (Feb 2026); UiPath first enterprise-automation cert
- OpenTelemetry gen_ai.* Semantic Conventions — CNCF-graduated standard (v1.37+, experimental status); vendor-neutral agent observability foundation; gen_ai.* spans cover model calls, tool calls, RAG retrievals, agent steps; no license cost; mandatory baseline for CMM D7 L3; SIG contributors include Amazon, Elastic, Google, IBM, Microsoft
- XACML — eXtensible Access Control Markup Language — OASIS standard (3.0, 2013); historical lineage of the four-role architecture (PEP / PDP / PIP / PAP); language layer is dormant (superseded by Cedar / OPA); architectural-role layer is alive (anchored in NIST SP 800-162 + NIST SP 800-207)
- NIST SP 800-162 — Guide to Attribute Based Access Control (ABAC) — NIST publication (2014, reaffirmed 2019); the wiki’s preferred living-standard citation for the four-role vocabulary (PEP / PDP / PIP / PAP). Generalizes XACML’s role split into a policy-language-agnostic NIST reference. Use for any role-vocabulary or role-architecture absence claim per the Standards Validation Methodology
- AWS Agentic AI Security Scoping Matrix — AWS Security blog (Nov 2025); four-scope agency/autonomy ladder (No / Prescribed / Supervised / Full agency) plus six security dimensions; the wiki’s anchor citation for the agency-vs-autonomy terminology distinction. Crosswalks to wiki CMM L1–L5+, CSA ATF Intern → Principal, and OWASP four-tier least-agency
- MAAIS — Multilayer Agentic AI Security Framework — Arora & Hastings, arXiv preprint (Dec 2025); seven-layer defense-in-depth framework + the CIAA augmentation (CIA + Accountability) for agentic AI; tactic-level MITRE ATLAS validation; the wiki’s anchor citation for CIAA as the augmented security triad for autonomous systems
- Microsoft ZT4AI — Zero Trust for AI — Microsoft’s adaptation of Zero Trust principles to AI systems and agentic deployments; ~700-control control set; March 2026 reference-architecture / workshop / assessment-tool refresh; pillars include Identity / Data / Devices / Apps & Workloads / Network / Visibility-Automation-Orchestration / Governance; vendor-locked to the Microsoft Security stack but principles transfer
- Microsoft Secure Development Lifecycle (SDL) — Microsoft’s foundational secure-by-design SDLC framework (2004); influences NIST SSDF lineage; 2026-02-03 AI extension announced six SDL-for-AI focus areas (threat modeling for AI / AI observability / AI memory protections / agent identity & RBAC / AI model publishing / AI shutdown mechanisms) and six operating pillars (research / policy / standards / enablement / cross-functional collaboration / continuous improvement); first major-vendor secure-SDLC framework with explicit AI extension scope; framed as “a way of working, not a checklist”
- NIST SSDF — Secure Software Development Framework (SP 800-218 v1.1) — NIST’s outcomes-oriented secure-SDLC framework (Feb 2022; Souppaya/Scarfone/Dodson); four practice groups (PO/PS/PW/RV), 19 practices, ~50 tasks; explicitly cites Microsoft SDL (`MSSDL`) as a named source for 8+ tasks; federal regulatory anchor under EO 14028 and OMB M-22-18; mandatory self-attestation for vendors selling to US federal government; vendor-neutral common-vocabulary instrument with cross-walks into BSAFSS / BSIMM / IEC 62443 / ISO 27034 / OWASP ASVS/MASVS/SAMM / NIST CSF / SP 800-53/160/161/181
- NIST SP 800-218A — SSDF Community Profile for Generative AI and Dual-Use Foundation Models — NIST’s AI-specific extension of SSDF (July 2024; Booth/Souppaya/Vassilev/Ogata/Stanley (CISA)/Scarfone); authorized by EO 14110 § 4.1.a; adds 1 new practice (PW.3 Confirm Integrity of Training Data) with 3 sub-tasks, 3 net-new tasks (PO.5.3 continuous monitoring, PS.1.2 protect training data, PS.1.3 protect model weights), and AI-specific R/C/N additions with High/Medium/Low priority across ~30 existing tasks; informative references into NIST AI RMF, OWASP LLM Top 10, and NIST AI 100-2e2023 Adversarial ML Taxonomy; federal-anchor citation for CMM D4/D5/D6/D7/D9
- OSFI Guideline B-13 — Technology and Cyber Risk Management — Canada’s federal regulatory expectations document for technology and cyber risk at federally-regulated financial institutions (effective 2022-07-31); three domains (Governance and Risk Management; Technology Operations and Resilience; Cyber Security); 17 high-level expectations; §2.4 System Development Life Cycle is the direct regulatory hook for Canadian-bank secure-SDLC; §3.1.2 explicitly names penetration testing and red teaming; §3.4 covers respond/recover/learn including forensic investigation expectations
- OSFI Guideline E-23 (2027) — Model Risk Management — Canada’s federal regulatory framework for enterprise-wide model risk management (published 2025-09-11; effective 2027-05-01); explicit AI/ML scope including “black box” approaches, autonomous decision-making, and re-parametrization / drift detection; three principle sections (B Enterprise-wide MRM / C Risk-Based Approach / D Lifecycle Management) plus Appendix A inventory schema; six-stage model lifecycle; explicit cross-reference to OSFI B-13 at the Deployment stage; the Canadian peer to US SR 11-7
Reference Architectures
See Architectures Index.
- AI Agent Identity Architecture — delegated vs. autonomous models; SPIFFE/SPIRE, NHI, Credential Zero, action-to-identity tracing
- Agentic AI Security Reference Architecture (2026) — six-plane practical architecture (Identity / Control / Runtime / Egress / Data / Observability) with deployment-shape mappings and OWASP ASI threat-control matrix
- System Prompt Architecture — boundary markers + trust labels for distinguishing system / user / retrieved / tool zones
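The boundary-marker idea above is concrete enough to sketch: each prompt zone is wrapped with an explicit marker carrying a trust label, so downstream policy can tell system, user, retrieved, and tool content apart. The marker format and function names here are illustrative, not a published spec.

```python
# Sketch: per-source boundary markers with trust labels distinguishing
# system / user / retrieved / tool zones. Marker syntax is illustrative.
def wrap(zone: str, trust: str, content: str) -> str:
    """Wrap one content zone in a labeled boundary marker."""
    return f"<<{zone} trust={trust}>>\n{content}\n<</{zone}>>"

prompt = "\n".join([
    wrap("system",    "high",      "You are a billing assistant."),
    wrap("user",      "medium",    "Summarize my last invoice."),
    # Retrieved content is labeled untrusted so injected instructions
    # inside it can be discounted by downstream policy.
    wrap("retrieved", "untrusted", "IGNORE PREVIOUS INSTRUCTIONS ..."),
])
assert "trust=untrusted" in prompt
```

The labels only help if the model-facing policy actually consumes them; pairing with adversarial fine-tuning (as the Sentinel Tokens concept notes) is what makes the markers more than decoration.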
Practices
See Practices Index.
- Agent Observability — glass-box paradigm: hooks, OTel, identity multiplexing, Cedar policies, agent behavioral monitoring (insider-threat framing)
- Agent Sandboxing — OS-level isolation as last-line-of-defense against goal manipulation and command injection
- NHI Governance for AI Agents — credential lifecycle management for autonomous-agent identity sprawl
- Credential Proxy Pattern — proxy-token / vault-injection to keep credentials out of agent context
- Prompt Injection Containment — two-layer detection + platform-level tool-call interception
- Supply Chain Security for Agents — skill marketplace controls, cognitive file integrity, Brain Git rollback
- AI-BOM — AI Bill of Materials; runtime discovery, behavioral baselines
- Securing AI Talking Points — four-point briefing outline
- Guardian Agent Metagovernance — “Guards for the Guardians” — five controls governing guardian agents themselves
- RAG Hardening — per-source boundary markers, injection scanning, source-trust attribution, action-source coupling
- AI Security Posture Management (AI-SPM) — continuous AI-asset inventory + misconfiguration detection (models, prompts, indexes, connectors)
- Data Security Posture Management (DSPM) for AI — sensitive-data mapping that feeds AI guardrails so risky sources are excluded at query time
- Oversharing Controls for AI Search — knowledge-layer need-to-know boundaries for Microsoft Copilot, Glean, Gemini
- Agent Token Chargeback — variable chargeback infrastructure for agentic-AI token spend; FinOps-for-agents primitive of the AI Agent Layered Council
- Distributed Kill Switch — one-vote-veto pattern; halt-authority distributed to every team member in the loop; organizational counterpart to least-agency block tier
- Multi-Agent Runtime Security — depth-companion to single-agent observability; cascade detection (ASI08), pairwise/aggregate behavioral baselines, inter-agent IR doctrine; honest about 2026 academic-prototype era
- Plan-Validate-Execute Pattern — Google Workspace’s canonical HITL pattern for high-stakes irreversible actions; structured plan → deterministic gatekeeper → execute; addresses recursive-injection failure of LLM-based reviewers
- Anti-Patterns and Failure Modes — 25 catalogued ways the RA + CMM go wrong in operation across 9 categories (architecture, CMM scoring, operations, threat-model, standards, identity, multi-agent, procurement, talent); the wiki’s BSIMM-activities-not-undertaken / CMMC-appeals / SAMM-scoring-caveats analog
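Of the practices above, Plan-Validate-Execute is mechanical enough to sketch: the agent emits a structured plan, a deterministic (non-LLM) gatekeeper checks every step against policy, and only a fully approved plan executes. Everything here (tool names, the allow-list, the `human_confirmed` flag) is illustrative, not Google Workspace's implementation.

```python
# Plan-Validate-Execute sketch: structured plan -> deterministic
# gatekeeper -> execute. No LLM in the validation loop, which is what
# avoids the recursive-injection failure of LLM-based reviewers.
from dataclasses import dataclass

@dataclass(frozen=True)
class Step:
    tool: str          # tool the agent wants to call
    target: str        # resource it wants to touch
    irreversible: bool # high-stakes actions need HITL confirmation

# Deterministic policy: allowed (tool, target) pairs. Illustrative only.
ALLOWED = {("search", "docs"), ("send_email", "internal")}

def validate(plan: list[Step], human_confirmed: bool = False) -> bool:
    """Gatekeeper: pure rule evaluation over the whole plan."""
    for step in plan:
        if (step.tool, step.target) not in ALLOWED:
            return False   # block: outside the allow-list
        if step.irreversible and not human_confirmed:
            return False   # block: HITL confirm tier not satisfied
    return True

def execute(plan: list[Step]) -> list[str]:
    return [f"ran {s.tool} on {s.target}" for s in plan]

plan = [Step("search", "docs", False), Step("send_email", "internal", True)]
assert not validate(plan)                    # irreversible step, no confirmation
assert validate(plan, human_confirmed=True)  # approved end-to-end
```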
Maturity Models
Canonical CMM (use this for new work):
- Agentic AI Security CMM 2026 — practical 5-level × 9-domain cumulative CMM with ID-tagged evidence; companion to the agentic-ai-security-reference-architecture
- Agentic AI CMM Standards Crosswalk — domain-by-standard anchor map (NIST AI RMF, ISO 42001, MITRE ATLAS, OWASP ASI/AIVSS, ZT4AI, CSA, EU AI Act incl. Annex IV, AIUC-1)
- Agentic AI CMM Measurement Protocol — three-stage assessor’s handbook (interview script, artifact checklist, scoring rubric)
- Agentic AI CMM Dependency Rules — effective-score scaffolding that replaces the single cumulative floor; v1 = 3 active cross-domain caps (D2→D5, D2→D7, D3→D4) + 6 candidate rules + promotion / revision protocol
Adjacent maturity frameworks:
- PwC Stage-Coverage Tiers — 4-archetype maturity model (Observer/Experimenter/Integrator/Pioneer) for GenAI-in-SDLC adoption breadth (not security maturity); orthogonal-axis to the Agentic AI Security CMM
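The effective-score mechanism behind the Dependency Rules can be sketched as follows. The three active cross-domain caps (D2→D5, D2→D7, D3→D4) are from the page above; the specific cap function shown (a dependent domain scores at most one level above its prerequisite) is an assumed semantics for illustration, not the wiki's actual rule text.

```python
# Effective-score sketch for the CMM dependency caps. The three active
# rules come from the Dependency Rules page; the "+1 above prerequisite"
# slack is an assumption made for this sketch.
ACTIVE_CAPS = [("D2", "D5"), ("D2", "D7"), ("D3", "D4")]

def effective_scores(raw: dict[str, int], slack: int = 1) -> dict[str, int]:
    eff = dict(raw)
    for prereq, dependent in ACTIVE_CAPS:
        # A dependent domain cannot score more than `slack` levels above
        # its prerequisite, whatever its raw evidence says.
        eff[dependent] = min(eff[dependent], raw[prereq] + slack)
    return eff

raw = {"D2": 2, "D3": 1, "D4": 4, "D5": 5, "D7": 3}
# D5 capped at D2+1 = 3; D4 capped at D3+1 = 2; D7 already within cap.
assert effective_scores(raw) == {"D2": 2, "D3": 1, "D4": 2, "D5": 3, "D7": 3}
```

The point of the scaffolding is visible in the example: strong guardrail evidence (D4 raw 4) is worth little while supply-chain maturity (D3) lags, which is exactly what a single cumulative floor fails to express.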
Papers
See Papers Index.
- AI Security Standards in Q1 2026 — framework gap analysis: agentic threats outpace standards bodies; OSS controls (LlamaFirewall, AgentGateway) ahead of frameworks
- Securing the Autonomous Future — Insight Partners (2025); five-category agentic-AI security market map; agent identity architecture; NHI, MCP security, “UEBA for Agents” coining (origin of the colloquial term)
- Emerging Cybersecurity Practices for Agentic AI Applications — Goncharov (2026); OpenClaw-ecosystem analysis; seven security domains mapped to OWASP ASI Top 10
- Gartner Market Guide for Guardian Agents (Feb 2026) — Litan, Plummer et al.; defines guardian agent category; Sentinels/Operatives split; metagovernance; vendor segmentation
- Securing Your Agents — McIntyre (2026); 40-slide layered-defense playbook covering inputs, prompts, outputs, infra, and red-teaming
- AI Coding Agent Governance — Knostic (2026); governance ≠ security; four components (identity / scoping / approval / audit) and three-phase rollout; introduces shadow automation framing
- What Are Non-Human Identities? — Oasis Security (2026); NHI taxonomy (11 types), HR-vs-code-pace lifecycle mismatch, identity-credential coupling, six-pillar securing-NHI strategy
- AI Data Security — Knostic (2026); inference vs retrieval exposure, AI-UC, AI-SPM/DSPM stack, eight enterprise-AI risks, knowledge-layer governance
- Scaling Agentic AI: A Leadership Guide for CIOs — Gartner webinar (Gummer + Gulzar, May 2026); operating-model counterpart to the Guardian Agents Market Guide; introduces the AI Agent Layered Council and the human-parity-line crossing
- [[unprompted-conference-march-2026|[un]prompted Conference — AI Security Practitioner Conference (March 3–4, 2026)]] — full talks catalog with presenters/orgs/notable data points across ~52 talks; companion ranking page in Comparisons. (Source page header says “[un]prompted II” but body announces September 2026 as II — slug uses date to dodge the ambiguity.)
- Breaking the Lethal Trifecta (Without Ruining Your Agents) — Bullen, Stripe — first individual-talk ingest from [un]prompted: slides + transcript combined; introduces the lethal-bifecta, names smokescreen + toolshed; canonical practitioner worked example for lethal-trifecta containment.
- Capability-Based Authorization for AI Agents — Niyikiza, Tenuo — [un]prompted March 2026; paired slides + transcript; warrant primitive with 6 properties; monotonic attenuation (W₂⊆W₁⊆W₀); 4 deployment modes; 90%→0% multi-agent ASR on custom harness; partial closure of the PDP/PEP gap for delegation-aware actions.
- Securing Workspace GenAI at Google — Lidzborski — [un]prompted March 2026; three-year Google Workspace retrospective; introduces prompt-as-code structural framing, agency-gap / orchestration-hijacking / recursive-prompt-injection threat sub-classes, sentinel-tokens for prompt delimitation, “Architecting the Fortress” 4-layer blueprint, and Plan-Validate-Execute as canonical HITL pattern; architectural sibling to Bullen’s Stripe talk (Bullen: egress/tool-policy; Lidzborski: input/orchestration/output).
- Glass-Box Security — Hurd, Starseer — [un]prompted March 2026; mechanistic interpretability hooks (cosine similarity to known-concept directions, scalar projection for strength) added as a second detection layer below plaintext eBPF/regex; YARA-style semantic tripwires at the residual-stream level; pairs with mechanistic-interpretability-for-defense and glass-box-security concept pages.
- Guardrails Beyond Vibes — Zhang & Shah, Stripe — [un]prompted March 2026; second Stripe talk; offline-eval pipeline + LLM-as-Judge + golden-standard test cases; architecture-matched-to-task (sequential multi-agent for threat modeling, single-minimal-toolset for triage); AlphaEvolve evolutionary prompt search failed at Stripe’s cost frontier; pairs with Bullen (containment vs. quality — complete inner loop).
- Hooking Coding Agents with Cedar — Maisel, Sondera — [un]prompted March 2026; deterministic hook-based reference monitor for coding-agent lifecycle events (action / observation / control / state) routed through Cedar; YARA + IFC labels + safety model; orthogonal to Niyikiza (Niyikiza: delegation-time auth via warrants; Maisel: per-action enforcement for standalone agents).
- Building Secure Agentic Systems — McMillin, Dropbox — [un]prompted March 2026; 19-agent / 73-tool home-lab fleet uncovers two novel failure modes: cross-agent memory contamination via shared namespace (fix: class-name-keyed isolation), and the N-token attack-hiding window during context trimming (fix: tagged + pinned security events); first practitioner-derived concepts at the agent-fleet level.
- “Your Agent Works for Me Now” — Rehberger — [un]prompted March 2026; reframes prompt injection as “promptware” (multi-stage NL malware with persistence, C2, exfiltration, lateral movement); discloses delayed-tool-invocation bypass technique that defeats platform-level tool-confirmation by deferring activation to a later turn.
- 1.8M Prompts, 30 Alerts — Rittinghouse, Salesforce — [un]prompted March 2026; production-scale telemetry from Salesforce Agentforce: 12,000+ daily-active agents across 55,000 tenants reduced via three-level ensemble behavioral anomaly detection to ≤30 actionable daily alerts; introduces the prompt-volume-to-alert-ratio metric.
- Beyond the Chatbot — Smith & Sharma, Salesforce — [un]prompted March 2026 (stub-summary, abstract-only); companion talk to Rittinghouse: the response half of the Agentforce SOC story; argues for a Polyphonic (Supervisor-Worker) architecture against monolithic black-box copilots; promote when slides / transcript captured.
- AI Agents Are Here. So Are the Threats. — Chen & Lu, Unit 42 (May 2025) — first systematic empirical study of framework-agnostic agentic-AI vulnerabilities; 9 attack scenarios on functionally identical apps built on CrewAI + AutoGen (open-source reference impl on GitHub); 5 mitigation strategies (prompt hardening, content filtering, tool input sanitization, tool vulnerability scanning, code executor sandboxing); sister to unit-42-prompt-injection-observations production-telemetry piece.
- AWS Agentic AI Security Scoping Matrix — Brown & Saner, AWS Security Blog (Nov 2025) — source summary for the AWS framework; four-scope agency/autonomy ladder + six security dimensions; load-bearing contribution is the explicit definitional split between agency (scope of permitted actions) and autonomy (degree of independent decision-making). See the framework page for the full structural analysis and crosswalk to wiki ladders.
- Securing Agentic AI Systems — A Multilayer Security Framework — Arora & Hastings, arXiv (Dec 2025) — source summary for the MAAIS preprint; seven-layer architecture + CIAA augmentation + Design Science Research methodology + tactic-level MITRE ATLAS validation. Anchor citation for the wiki’s CIAA reference on CMM D1. Limitations: tactic-level (not technique-level) validation; no Lethal Trifecta / MCP / A2A / promptware treatment.
- Secure Agentic AI End-to-End — Vasu Jakkal, Microsoft Security Blog (March 2026) — pre-RSAC 2026 announcement of Microsoft’s full agentic-AI security portfolio across Entra / Defender / Purview / Sentinel / Security Copilot. Three-pillar framing (secure agents / secure foundations / defend with agents). Load-bearing announcements: Agent 365 GA May 1; Microsoft 365 E7 Frontier Suite SKU; Entra Internet Access Prompt Injection Protection (first major-vendor network-layer PI containment); Defender Predictive Shielding; updated ZT4AI reference architecture.
- Agentic SDLC in Practice — The Rise of Autonomous Software Delivery (PwC Middle East, 2026) — 82-page advisory report; 377-respondent survey across GCC+Jordan+Egypt; introduces the PwC Stage-Coverage Tiers maturity model (Observer/Experimenter/Integrator/Pioneer) and the forward Agentic SDLC operating model; cites METR 2025 RCT (16 experienced devs 19% slower with AI) as counter-evidence to vendor productivity claims; security is #1 barrier (37.7%); 84% report moderate-to-significant productivity and quality gains; 38% are Pioneers augmenting ≥6 of 7 SDLC stages; 75% plan to raise GenAI spend within 24 months; forecasts 2027 majority-Pioneer adoption.
- 2026 Agentic Coding Trends Report (Anthropic, early 2026) — 17-page vendor strategic forecast; eight trends across Foundation/Capability/Impact buckets; Trend 8 (“agentic coding improves security defenses — but also offensive uses”); Trend 4 establishes the collaboration paradox (60% AI usage / 0-20% fully delegated); Priority 4 calls for “embedding security architecture as a part of agentic system design from the earliest stages”; customer examples (Augment Code, Fountain, Rakuten, CRED, Legora, Cowork, TELUS, Zapier) anchor concrete adoption metrics.
- Introducing CodeMender — AI agent for code security (Google DeepMind, Oct 2025) — Google’s patching-side agent paired with Big Sleep; Gemini Deep Think reasoner + program-analysis toolbox (static + dynamic + diff testing + fuzzing + SMT solvers) + multi-agent critique/judge validation; 72 patches upstreamed to OSS in 6 months; libwebp `-fbounds-safety` annotation example renders entire vulnerability classes “unexploitable forever”; all patches human-reviewed.
- From Naptime to Big Sleep (Google Project Zero, Oct 2024) — foundational paper for Google’s variant-analysis AI vulnerability-discovery agent; first AI-discovered real-world exploitable memory-safety bug (SQLite stack buffer underflow); Project Zero + DeepMind collaboration; predecessor to Big Sleep was Naptime, which achieved SOTA on Meta’s CyberSecEval2.
- Project Glasswing — Securing Critical Software for the AI Era (Anthropic, May 2026) — coalition-organizing announcement; 12 named partners (AWS, Anthropic, Apple, Broadcom, Cisco, CrowdStrike, Google, JPMorganChase, Linux Foundation, Microsoft, NVIDIA, Palo Alto Networks) plus 40+ extended organizations; 4M OSS donations; Mythos NOT planned for GA (125 per M tokens for Glasswing participants only — contradicts XBOW’s “5× Opus at GA” claim); 27-year-old OpenBSD vuln, 16-year-old FFmpeg vuln (5M fuzzer hits without detection), autonomous Linux kernel privesc chain; Mythos raw CyberGym 83.1% — identifies the unnamed #2 on MDASH’s leaderboard.
- Defense at AI Speed — Microsoft’s MDASH (Microsoft Security Blog, May 2026) — defender-side companion to the XBOW/Mythos ingest; multi-model agentic scanning harness orchestrating 100+ specialized agents (Prepare/Scan/Validate/Dedup/Prove pipeline); 16 new CVEs in May 2026 Patch Tuesday; 88.45% on CyberGym public leaderboard (top score, ~5 points above #2); convergent architectural argument with XBOW: “the harness does the work, the model is one input.”
- Mythos for Offensive Security — XBOW’s Evaluation (XBOW Blog, May 2026) — first sourced anchor on the `ai-vuln-discovery` axis; third-party evaluation of Anthropic’s preview-stage Mythos frontier model; 42-55% false-negative reduction vs Opus 4.6 on XBOW’s web exploit benchmark; canonical source for XBOW and Mythos entity pages.
- Secure AI Framework Approach — Implementation Guide (Google, 2024) — practitioner companion to SAIF; four-step adoption methodology (Understand the use → Assemble the team → AI primer → Apply six elements) and the six core elements (“expand foundations / extend detection / automate / harmonize / adapt + feedback / contextualize risks”). Surfaces the AI shared-responsibility model (developer / deployer / user splits), the six decision domains for data governance (quality / security / architecture / metadata / lifecycle / storage), and the cross-functional SAIF stakeholder list at the implementation-team level (vs the Layered Council at the governance level).
- Microsoft SDL: Evolving Security Practices for an AI-Powered World — Zunger, Microsoft Security Blog (Feb 2026) — announces the explicit AI extension of Microsoft SDL; six SDL-for-AI focus areas mapping cleanly to six wiki CMM domains; six operating pillars including Cross-functional collaboration (Business Process + Application UX explicitly in scope); strategic preamble with substantive per-area technical guidance promised in follow-up posts; concrete vendor instance of the anchor + AI-overlay pattern recommended by the 2026 framework-stack thesis; “SDL is a way of working, not a checklist.”
Playbooks
See Playbooks Index.
- Assessor’s Quick Scorecard — Secure-SDLC and AI Practices for a Large Canadian Bank — ~10-page 2nd-party advisor instrument with ~65 questions across six sections (Secure-SDLC foundation / AI governance and model risk / Frontier-AI in CI/CD optional / Continuous pentesting and AI red teaming / Identity, least-agency, and supply chain / Observability and AI IR); regulatory anchors to OSFI B-13 (Tech & Cyber Risk Mgmt), OSFI E-23 (Model Risk 2027), OSFI B-10 (Third-Party), PIPEDA §10.1, and Canada’s Voluntary AI Code; scoring rubric Yes/Partial/No/N/A with L1-L5 section maturity tiers; engagement-tier = minimum-of-sections; findings priority pinned to regulatory and AI-safety-critical exposure
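The scorecard's roll-up rule is simple enough to show directly: each section gets an L1-L5 maturity tier, and the engagement tier is the minimum across sections, per the "engagement-tier = minimum-of-sections" rule above. Section names and values here are illustrative.

```python
# Roll-up sketch for the assessor scorecard: engagement tier is the
# minimum of the per-section L1-L5 tiers. Values are illustrative.
sections = {
    "secure-sdlc-foundation":   3,  # L3
    "ai-governance-model-risk": 2,  # L2  <- this floor drives the result
    "identity-least-agency":    4,  # L4
    "observability-ai-ir":      3,  # L3
}
engagement_tier = min(sections.values())
print(f"engagement tier: L{engagement_tier}")  # prints "engagement tier: L2"
```

The minimum-of-sections choice mirrors the CMM's cumulative philosophy: one weak section bounds the whole engagement, so the scorecard cannot be gamed by excelling in unrelated areas.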
Concepts
See Concepts Index.
- Jason’s Mental Model — Security Architecture functional breakdown
- LLM-as-a-Judge — stub
- Evidence Centered Benchmark Design — stub
- CyberGym Benchmark — public benchmark of 1,507 real-world vulnerability reproduction tasks across 188 OSS-Fuzz projects; load-bearing third-party leaderboard for AI-driven vuln discovery; MDASH leads at 88.45%
- Collaboration Paradox (60% / 0-20%) — Anthropic Societal Impacts finding: developers use AI in ~60% of work but can “fully delegate” only 0-20% of tasks; the implied 40-60% active-collaboration band is the largest single category of agentic AI usage; load-bearing argument for HITL as default operating mode, not just for irreversible actions
- Vibe Coding — Karpathy-coined (Feb 2025) term for generating/modifying code via natural-language intent rather than exact specifications; now a formal advisory category (PwC names “Vibe-coder” as an emerging role); operates in tension with the collaboration paradox — most useful for throw-away prototypes, bounded by Plan-Validate-Execute for production work
- METR 2025 RCT — load-bearing counter-evidence anchor for AI productivity claims: randomized controlled trial with 16 experienced OSS maintainers on their own repositories; enabling early-2025 AI tools made them ~19% slower; bounds vendor productivity claims at the experimental level; cited as Indicator 14 in PwC’s 2026 Agentic SDLC report
- Non-Human Identity (NHI) — machine credentials for AI agents; lifecycle, DSPM analogy, Credential Zero
- MCP Security — securing Model Context Protocol servers, proxies, AATO attack class
- Human-in-the-Loop (HITL) for Agentic AI — platform-enforced confirmation gate before high-impact agent actions; OWASP four-tier model (auto/notify/confirm/block); CSA ATF gates; Stripe’s residual ASR data
- Tool Poisoning and Rug-Pull Attacks — manipulation of tool definitions or post-trust replacement of tools; MCP CVE-prone surface; defended via fingerprinting, supply-chain scanning, confirmation gates
- Memory Poisoning (Agentic AI) — adversarial content injection into RAG, episodic memory, scratchpad, or state checkpoints; persistent attack surface (vs single-step prompt injection); defended via provenance, retrieval filtering, integrity monitoring, state rollback
- SPIRE — workload-identity standard (stub)
- Least Agency Principle — OWASP-sourced; autonomy as a security dimension alongside access
- Lethal Trifecta — Willison’s structural test: private-data + untrusted-content + external-comms in one agent
- Lethal Bifecta — Bullen-coined write-side analogue (untrusted-content + sensitive-action); the threat model behind tool-annotation review schemas
- CaMeL Pattern (Compartmentalized Machine Learning) — Google DeepMind (March 2025); privileged + quarantined LLM split; structured output channel prevents injected content from reaching the high-trust model; research-stage (no shipped production implementation as of May 2026); CMM D3/D4 L5 credit
- Indirect Prompt Injection — payload arrives via agent’s own retrieval; user never sees it
- Tool-Abuse Chains — single injection cascades into multi-tool exploitation
- Canary Tokens for LLMs — system-prompt leak detection trip-wire
- Three Retrieval Paths — vector / full-text / metadata; paths 2 and 3 are the practical risk
- Shadow Automation — agent-era equivalent of shadow IT; ungoverned agents accessing repos / prod / credentials at developer pace
- Decision Rights for AI Agents — governance counterpart to least privilege; documented authority + approver + justification + time bound per action class
- Identity-Credential Coupling — NHIs where the credential string IS the identity (SAS tokens, storage keys, PATs); rotation = identity rotation; credential proxy can’t separate what’s structurally inseparable
- Inference Exposure (and Retrieval Exposure) — paired AI-specific failure modes that bypass file/network access controls
- AI Usage Control (AI-UC) — UCON Authorizations + Obligations + Conditions evaluated at answer time; the layer beyond access control
- Shadow AI — unauthorized AI tools at work; 75% knowledge-worker GenAI usage, 78% BYOAI; Samsung incident is canonical
- Oversight Layer (PDP + PEP for Agentic AI) — architectural primary for the AI security supervision role; PDP / PEP / PIP / PAP zero-trust primitives; cross-walk to all alternative terms
- Guardian Agent — procurement-language synonym for the oversight layer (Gartner, Feb 2026)
- Sentinels and Operatives — runtime architectural split: Sentinels provide context (PIPs); Operatives intervene (PDP+PEP)
- AI Agent Catalog — mandatory inventory primitive (PIP); subject inventory for the oversight layer
- AI Agent Management Platform (AMP) — Gartner-defined vendor category for unified agent management
- AI Agent Layered Council — Gartner-coined cross-functional governance body for agentic AI: CIO co-leads with CFO/COO/GC/Procurement/CHRO, each with a named play
- Human Parity Line — Gartner’s measurement: AI’s blind-preference parity with industry professionals across 1,320 tasks / 42 roles; crossed in Dec 2025
- Agentic AI Threat Classes — 2026 Expansion — five threat classes the wiki’s existing taxonomy underdevelops (insider-with-AI-access, long-running APT, agent collusion, model-version regression, jurisdictional adversaries); closes peer-review-readiness §5
- Capability-Based Authorization — 60-year lineage (Dennis & Van Horn 1966 → Macaroons → UCAN/Biscuits → CaMeL → Tenuo Warrant); artifact-carries-policy model; structural answer to the AI Confused Deputy problem
- Tenuo Warrant — six-property signed capability artifact; W₁⊆W₀ monotonic attenuation; constraint language (basic logic / regex / glob / CEL); Map vs Territory lesson (4 CVEs); ≈55μs E2E auth; 4 deployment modes
- Monotonic Attenuation — delegation invariant: child capability ⊆ parent capability; contrapositive is the key safety property; prior art in Macaroons, UCAN, Biscuits, Tenuo
- Ambient vs Derived Authority — structural distinction: identity-based auth (full credentials at deploy-time, ambient) vs capability-based auth (task-scoped artifact, derived); AI Confused Deputy as the failure mode of ambient authority with agent LLMs
- Prompt as Code — Lidzborski’s structural framing: every input token is a potential instruction; LLMs lack an NX-bit equivalent for the prompt window; explains why filtering loses
- Agency Gap — non-deterministic disconnect between user intent and autonomous AI execution; “wrong John” failure mode; structural reason for HITL confirm tier
- Orchestration Hijacking — compromised LLM-as-planner manipulated by indirect injection; supports time-delayed and dormant triggers
- Recursive Prompt Injection — LLM-as-a-judge subject to same vulnerability as primary; semantic gaslighting attack
- Sentinel Tokens — prompt delimitation primitive; partial structural mitigation paired with adversarial fine-tuning
- Inline Gateway vs Runtime Instrumentation — architectural fork in the agentic-AI-security seed cohort (gateway camp vs runtime-instrumentation camp); mirrors the API-gateway-vs-APM split from the cloud-native era
- Glass-Box Security — Hurd-coined; defending AI agents from inside the model’s forward pass via mechanistic interpretability instead of black-box input/output filtering
- Mechanistic Interpretability for Defense — applied research direction: cosine similarity against known-concept directions + scalar projection for strength; YARA-style semantic tripwires at the residual-stream level
- Agent Memory Isolation — McMillin-derived; per-agent memory namespace keyed at the MCP server layer (outside LLM influence) to prevent cross-agent contamination through shared retrieval pools
- Context-Aware Trimming — McMillin-derived; tagging + pinning security-relevant events as exempt from context-window eviction so an attacker can’t burn through tokens to drop the audit trail
- Promptware — Rehberger-coined; structured multi-stage NL malware with persistence, C2, exfiltration, lateral movement — prompt injection is no longer a single-turn attack class
- Delayed Tool Invocation — Rehberger-disclosed bypass; defers tool activation to a later conversation turn to evade platform-level confirmation gates
- Behavioral Anomaly Detection for Agents — three-level ensemble (per-agent / per-tenant / global) profiling pattern; the structural alternative to LLM-as-judge for production-scale alert reduction
- Prompt-Volume-to-Alert Ratio — Rittinghouse-derived; production metric for AI-SOC tuning (Agentforce: 1.8M prompts → ≤30 alerts/day at 55K tenants); the AI-era counterpart to the SIEM signal-to-noise ratio
- Agent Commander Prompt (C2) — Rittinghouse-named; attacker-controlled prompt that issues commands to a compromised agent across sessions; AI-era analog of botnet C2
- Differential Privacy — ε-DP and DP-SGD as defensive primitives against model inversion and membership inference; canonical mathematical privacy guarantee for ML training and federated agent telemetry; CMM D6 L4/L5 control surface
- Model-Layer Attacks (Extraction, Inversion, Membership Inference) — three named attack classes targeting the deployed model rather than orchestration; MITRE ATLAS techniques AML.T0024/T0044/T0048; defenses span DP, rate limiting, output randomization, query-pattern monitoring; the threat surface the wiki had under-treated relative to prompt-injection-class threats
- Agent Availability Threats (Runaway, Recursion, Resource Exhaustion) — the Availability axis surfaced by the MAAIS CIAA augmentation; runtime budgets, recursion-depth limits, distributed cycle detection; the Lethal Trifecta is C+I, this is the parallel A treatment
- Operational XAI for Action Gating — distinct from mechanistic interpretability; runtime-emitted justifications gating high-impact actions before they execute; LLM-as-judge / Plan-Validate-Execute / ToolAnnotations justifications as concrete patterns
- Network-Layer Prompt Injection Containment — third architectural layer in the prompt-injection-containment stack (Layer 0: Network) below input-detection (Layer 1) and execution-containment (Layer 2); first shipped at major-vendor scale by Microsoft Entra Internet Access PI Protection (Mar 2026); operates outside the agent’s process boundary so applies even to compromised / shadow agents
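Several entries above (Monotonic Attenuation, Capability-Based Authorization, Tenuo Warrant) describe the same checkable invariant: a delegated capability may only narrow its parent, never widen it. A minimal sketch of that invariant, using hypothetical names (`Capability`, `attenuate`) rather than Tenuo's or any other runtime's actual API:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Capability:
    """A capability modeled as a set of permitted (tool, action) pairs."""
    grants: frozenset

    def attenuate(self, keep) -> "Capability":
        """Derive a child capability; refuse any widening of scope."""
        child = frozenset(keep)
        if not child <= self.grants:  # the W1 ⊆ W0 delegation invariant
            raise PermissionError("delegation may only narrow, never widen")
        return Capability(child)

    def allows(self, grant) -> bool:
        return grant in self.grants

# A root warrant for an agent, then a narrowed child for a sub-agent.
root = Capability(frozenset({("crm", "read"), ("crm", "write"), ("mail", "send")}))
worker = root.attenuate({("crm", "read")})     # narrowing: permitted

assert worker.allows(("crm", "read"))
assert not worker.allows(("mail", "send"))     # contrapositive: what the parent
                                               # cannot do, no descendant can
try:
    worker.attenuate({("mail", "send")})       # widening attempt is rejected
except PermissionError:
    pass
```

The contrapositive is the safety property the concept pages lean on: once an authority is absent from a warrant, no depth of delegation can reintroduce it.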
Entities
See Entities Index.
Organizations
- Adobe — stub
- Anthropic — AI lab; Claude; CoSAI Premier Sponsor; Glasswing publisher (stub)
- CoSAI (org) — Coalition for Secure AI consortium
- CyberArk — Identity Security Platform vendor (PAM, Conjur secrets, Secure AI Agents); acquired by Palo Alto Networks in 2026
- Cloud Security Alliance — publisher of MAESTRO and Agentic AI Red Teaming Guide
- Gartner — analyst firm; defined AI TRiSM and Guardian Agents categories
- Glasswing — Anthropic vulnerability research / disclosure program
- Google — hyperscaler; SAIF, A2A, Google ADK; Anton Chuvakin
- Insight Partners — VC firm; published three-part agentic-AI security market map series
- ISO — standards body publishing ISO/IEC 42001
- Meta — stub
- Microsoft — hyperscaler; Entra Agent ID, Agent 365, Defender, Purview, Prompt Shields, RAI, ZT4AI, FIDES, PyRIT (stub)
- OSFI — Office of the Superintendent of Financial Institutions (Canada) — Canadian federal regulator and supervisor of federally-regulated financial institutions; issuer of Guidelines B-13 (Technology and Cyber Risk Management) and E-23 (2027) (Model Risk Management); stub
- AWS (Amazon Web Services) — hyperscaler; Cedar (PDP), Firecracker (sandbox), Nitro Enclaves (TEE), Bedrock, Q (stub)
- NVIDIA — AI infrastructure vendor; Garak, NeMo Guardrails, NeMo Jailbreak NIM (stub)
- CrowdStrike — endpoint detection / SIEM; Falcon AIDR cited at D7 L5 (stub)
- Datadog — observability / APM; AI Monitoring + LLM Observability cited at D9 (stub)
- Zenity — agent-governance for M365 / Copilot Studio / Power Platform (stub)
- MITRE — research org publishing ATLAS adversarial-ML taxonomy
- NIST — U.S. standards body publishing AI RMF and the 600-series profiles
- OpenAI — AI lab; CoSAI member (stub)
- OWASP — open-source security community; publisher of LLM Top 10, Agentic AI Top 10, AIVSS
- Palo Alto Networks — stub
- Stripe — Lethal Trifecta containment architecture; two [un]prompted talks ingested (Bullen containment, Zhang & Shah quality)
- Wiz — stub
- Dropbox — Brooks McMillin’s home-lab fleet talk source ([un]prompted March 2026); stub
- Salesforce — Agentforce platform; 55K-tenant scale telemetry from the Rittinghouse [un]prompted talk
- Sondera — coding-agent ABAC harness producer; Cedar policy reference monitor for coding agents; Maisel’s [un]prompted March 2026 talk
- Starseer — mechanistic-interpretability-for-defense research org (Carl Hurd’s affiliation at [un]prompted March 2026)
- Knostic — AI security vendor; knowledge-layer governance + coding-agent governance; producer of kirin
- Oasis Security — identity security vendor specializing in NHI management at enterprise scale
- Onyx Security — AI Control Plane vendor; producer of the Onyx Platform; Guardian Agent vendor category
- Glean — enterprise AI search vendor; canonical example of oversharing risk in M365/SaaS-connector retrieval (stub)
- Tenuo — Rust capability-warrant runtime (OSS); founded by Niki Aimable Niyikiza; 4 deployment modes (in-process / sidecar / gateway / MCP-proxy); 90%→0% multi-agent ASR on custom harness; 53/53 violations on 5,700 fuzz probes
- Snap — stub; tracked because Niki Aimable Niyikiza is Security Engineer there (Tenuo’s warrant is Tenuo’s primitive, not Snap’s)
- Apollo Research — UK AI-safety eval org; primary source for agent-agent steganographic collusion threat modeling
- UK AI Safety Institute (AISI) — UK government pre-deployment evaluator; cyber-task autonomy benchmarks; Frontier AI Trends Report
- CSET (Center for Security and Emerging Technology, Georgetown) — primary source for AI export controls, jurisdictional adversaries, regulatory leverage
- Stanford HAI — academic-grade AI Index Report (annual, methodology-disclosed, cross-year comparable); primary citation for AI adoption-rate triangulation
- METR — independent eval org; methodological foundation for long-task autonomy claims (the 7-month doubling underneath UK AISI’s 8-month cyber figure)
- World Economic Forum — Global Cybersecurity Outlook annual; senior-leader survey-grade data on AI risk
- ENISA — EU government cybersecurity body; Threat Landscape annual; non-US triangulation for breach-cost claims
Seed-stage agentic AI security startups (2025–2026 funding wave; see funding synthesis):
- Project Glasswing — coalition initiative led by Anthropic (May 2026); 12 named launch partners (AWS, Anthropic, Apple, Broadcom, Cisco, CrowdStrike, Google, JPMorganChase, Linux Foundation, Microsoft, NVIDIA, Palo Alto Networks) plus 40+ extended organizations; 4M OSS donations; 90-day public reporting commitment; the organizing anchor for the ai-vuln-discovery and ai-in-sec-defense axes
- Lumia Security — $18M seed Dec 2025 (Team8); Guardian-Agent class; ex-PerimeterX + Unit 8200; Adm. Mike Rogers advisory
- Trent AI — $13M seed Apr 2026 (LocalGlobe + CIC); London-based; multi-coordinated-agents lifecycle platform; ex-AWS team
- Runlayer — $11M seed Nov 2025 (Khosla + Felicis); MCP gateway; David Soria Parra (MCP creator) as advisor; 8 unicorn customers in 4 months
- General Analysis — $10M seed Apr 2026 (Altos + Menlo + YC); adversarial-testing/CART for agentic AI; same Havaei/Liu/Li team as the Claude→Stripe coupons exploit research
- Helmet Security — $9M seed Dec 2025 (SYN + WhiteRabbit); MCP discovery + monitoring + control; Fred Kneip (CyberGRX founder)
- Keycard — $30M Series A Oct 2025 (a16z + boldstart, then Acrew); identity for AI agents; ex-Manifold/Snyk + Auth0/Passport.js
- Capsule Security — $7M seed Apr 2026 (Lama + Forgepoint); runtime trust layer with explicit no proxy / no SDK positioning; ClawGuard OSS
- SplxAI — $7M seed Mar 2025 (LAUNCHub); CART-style red-teaming; acquired by Zscaler — first exit in the cohort
- XBOW — autonomous offensive-security platform; multi-model orchestration (Opus 4.7, Sonnet 4.6, Haiku 4.5, GPT 5.5, preview-stage Mythos); live-site web exploitation harness; canonical entry on the ai-in-sec-offense axis (May 2026)
Products
- AgentCordon — self-hostable open-source Agentic IDP and credential broker (Rust, GPL-3.0); three-tier CLI/broker/server split; Cedar PDP + AES-256-GCM vault + MCP gateway + OAuth2 AS in one binary; OSS alternative to Conjur for self-hosted deployments
- AgentGateway — open-source agent / MCP gateway (stub)
- AutoGen — Microsoft OSS Python multi-agent framework (MIT); Swarm/AgentChat/Magentic patterns; one of two frameworks Unit 42 used to demonstrate framework-agnostic agentic vulnerabilities (May 2025)
- CrewAI — OSS Python multi-agent framework (MIT); role/goal/task model with hierarchical delegation; one of two frameworks Unit 42 used to demonstrate framework-agnostic agentic vulnerabilities (May 2025)
- Cedar — AWS OSS policy language (Apache 2.0, 2023); formal semantics + deny-by-default; Rust engine (sub-ms evaluation); primary reference implementation for the RA Control Plane PDP alongside OPA; AI governance tooling release March 2026
- Firecracker — AWS OSS MicroVM (Apache 2.0); KVM-backed hardware-level isolation; 125ms boot; primary reference for per-task agent sandboxing in the RA Runtime Plane; production-proven in AWS Lambda + Fargate
- Garak — NVIDIA’s open-source LLM vulnerability scanner; ~18+ probe categories; CMM D7 L4 probe library slot
- gVisor — Google open-source container sandbox (Apache 2.0); user-space kernel interposer via Sentry; OCI-native runsc runtime; Runtime plane sandbox alternative to Firecracker for Kubernetes deployments
- Kirin — Knostic’s coding-agent runtime security / governance enforcement product (Cursor, Copilot, IDE extensions)
- Lakera Guard — commercial AI security API; real-time prompt injection + jailbreak + PII detection; continuously updated from Gandalf attack intelligence; Runtime plane content safety
- LlamaFirewall — open-source AI guardrail framework (stub)
- MDASH — Microsoft Multi-Model Agentic Scanning Harness — defender-side agentic vulnerability discovery and remediation system; 100+ specialized agents (auditors / debaters / dedup / provers) in a five-stage pipeline (Prepare → Scan → Validate → Dedup → Prove); ensemble of frontier and distilled models with second-SOTA-counterpoint design; top of CyberGym leaderboard at 88.45%; limited private preview (May 2026)
- Microsoft Security Copilot — AI-augmented SOC product; M365 E5/E7; five Microsoft-built role-specialized agents (Security Analyst, Alert Triage, Conditional Access Optimization, Data Security Posture, Data Security Triage) + 15 Security Store partner agents; canonical defender-AI entry on the ai-in-sec-defense axis
- Microsoft Entra Agent ID + Agent 365 Registry — Microsoft per-agent identity and lifecycle management; GA May 1, 2026; primary enterprise COTS for M365/Azure organizations; Identity plane agent-lifecycle + action-to-identity tracing
- Miggo Security — ADR vendor extending to agentic AI; DeepTracing (runtime AI-BOM), behavioral drift detection, proof-of-guardrail attestation with AWS Nitro Enclaves
- Mindgard CART — commercial 24/7 Continuous Automated Red Teaming SaaS; CMM D7 L4 continuous CART slot; UK-spinout backed by .406 Ventures
- Big Sleep — Google Project Zero + DeepMind agent for variant-analysis vulnerability discovery; Project Naptime → Big Sleep lineage (June 2024 → Oct 2024); first AI to discover an exploitable memory-safety bug in real-world widely-used software (SQLite); July 2025 CVE-2025-6965 cited as first AI-foiled in-the-wild exploit; named in Glasswing as Google’s parallel AI-cyber tool
- CodeMender — Google DeepMind AI agent for code-security patching (counterpart to Big Sleep’s discovery); Gemini Deep Think reasoner + program-analysis toolbox + multi-agent critique/judge validation; 72 OSS security patches upstreamed in 6 months; libwebp -fbounds-safety annotations as proactive-rewrite example; all patches human-reviewed
- Mythos — Anthropic preview-stage frontier model (~5× Opus pricing at GA); strongest sourced advance on source-code-driven vulnerability discovery (42–55% FN reduction vs Opus 4.6 per XBOW’s eval); deployed via direct partnership (XBOW offensive, Project Glasswing defensive)
- Okta for AI Agents — Okta’s purpose-built agent identity and lifecycle management platform; GA April 30, 2026; enterprise primary for Identity plane (agent-lifecycle + NHI governance)
- Onyx Platform — five-surface unified AI control plane (Observability / Security / Governance / Orchestration / ROI); Onyx Guardian Agent; positioned as single-pane-of-glass alternative to Wiz + Prisma + AgentGateway
- Rego (Open Policy Agent) — CNCF-graduated OSS policy engine (Apache 2.0); Datalog-inspired Rego language; Kubernetes-native; primary alternative to Cedar for orgs with existing OPA infrastructure (Conftest, Gatekeeper, Styra DAS ecosystem)
- CyberArk Conjur (+ Agent Guard) — incumbent enterprise vault for credential proxying; Secure AI Agents initiative + Agent Guard for STDIO-based MCP flows; CyberArk acquired by Palo Alto Networks in 2026
- Wiz AI-SPM — Wiz CNAPP’s AI Security Posture Management module; first-CNAPP-AI-SPM (2024); graph-based attack-path correlation; runtime AI agent monitoring (added 2025); part of Google Cloud Security
- Palo Alto Prisma AIRS — Palo Alto’s end-to-end AI security platform (runtime + posture + model security + red teaming); GA April 2025; 2.0 GA October 2025 with Protect AI integration
- Promptfoo — open-source LLM evaluation + red-teaming framework (now part of OpenAI); CI-gated regression suite; CMM D7 L4 regression slot; primary empirical source for model-version-degradation finding
- PyRIT — Microsoft AI Red Team’s open-source orchestration framework; multi-turn attack strategies (Crescendo, TAP, Skeleton Key); CMM D7 L4 orchestration slot
- Smokescreen — Stripe’s open-source SSRF / egress proxy; the network-side control point for Lethal Trifecta containment
- Toolshed — Stripe’s internal central MCP proxy / tool registry; PEP for tool-call policy via ToolAnnotations
- AgentDojo — peer-reviewed (NeurIPS) independent prompt-injection benchmark; Meta uses it to evaluate LlamaFirewall PromptGuard 2; the third-party comparator for vendor self-eval claims
- Salesforce Agentforce — Salesforce’s enterprise agent platform; production source for the Rittinghouse 1.8M-prompts / 30-alerts telemetry case study
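Several of the products above (Cedar, Rego/OPA, Toolshed) share one enforcement shape: a deny-by-default PDP that a PEP consults before forwarding each agent tool call. A minimal sketch of that shape; the `Permit` record and `decide` function are illustrative assumptions, not Cedar's or OPA's actual API, and a real deployment would evaluate a policy engine rather than hand-roll this:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Permit:
    principal: str   # agent identity, e.g. an entry from an agent catalog
    action: str      # requested tool action
    resource: str    # exact-match resource (no globs or conditions, for brevity)

def decide(policies, principal, action, resource) -> bool:
    """PDP: deny by default; allow only when an explicit permit matches."""
    return any(p == Permit(principal, action, resource) for p in policies)

# PEP side: the gateway calls decide() before letting a tool call through.
policies = [Permit("agent:triage-bot", "read", "ticket-queue")]

assert decide(policies, "agent:triage-bot", "read", "ticket-queue")        # permitted
assert not decide(policies, "agent:triage-bot", "delete", "ticket-queue")  # no permit, so denied
```

The design point carried over from Cedar and OPA is the default: absence of a matching permit is a denial, so an unregistered agent or an unanticipated action fails closed.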
People
- Aaron Brown — co-author (with Matt Saner) of the AWS Agentic AI Security Scoping Matrix (Nov 2025); stub
- Andrew Bullen — Head of AI Security at Stripe; coined the “Lethal Bifecta”; presented Stripe’s containment architecture at [un]prompted March 2026
- Avivah Litan — Distinguished VP Analyst at Gartner; primary author of the Guardian Agents Market Guide (Feb 2026)
- Ben Nassi — security researcher; lead author on “Invitation Is All You Need” Google Calendar / Gemini indirect-injection demonstration (cited in Lidzborski’s Workspace talk)
- Bill McIntyre — author of Securing Your Agents deck (AIE / RMAIIG, 2026)
- Bob Rudis — VP Data Science, GreyNoise Labs
- Brandon Gummer — Gartner VP Analyst; co-presenter of Scaling Agentic AI: A Leadership Guide for CIOs (May 2026); name spelling unverified
- Brooks McMillin — Dropbox engineer; presented home-lab 19-agent / 73-tool fleet study at [un]prompted March 2026; introduced agent-memory-isolation and context-aware-trimming patterns
- Carl Hurd — Starseer researcher; presented Glass-Box Security (mechanistic interpretability for defense) at [un]prompted March 2026
- Daniel Miessler — stub
- Daryl Plummer — Distinguished VP Analyst at Gartner; co-author of the Guardian Agents Market Guide
- Remy Gulzar — Gartner VP Analyst; co-presenter of Scaling Agentic AI: A Leadership Guide for CIOs (May 2026); name spelling unverified
- Dongdong Sun — Senior Staff ML Engineer, Palo Alto Networks
- Jeffrey Zhang — Stripe AI security engineer; co-presenter of Guardrails Beyond Vibes at [un]prompted March 2026 (with Sid Shah)
- Johann Rehberger — independent red-team researcher; “Month of AI Bugs” (Aug 2025); Embrace The Red; presented “Your Agent Works for Me Now” at [un]prompted March 2026 (introduced promptware and delayed-tool-invocation)
- John Hastings — co-author (with Sunil Arora) of the MAAIS arXiv preprint (Dec 2025); affiliations unverified; stub
- Matt Maisel — Sondera engineer; presented “Hooking Coding Agents with Cedar” at [un]prompted March 2026 (deterministic reference monitor for coding agents)
- Matt Rittinghouse — Salesforce engineer; co-presenter of 1.8M Prompts, 30 Alerts at [un]prompted March 2026
- Matt Saner — co-author (with Aaron Brown) of the AWS Agentic AI Security Scoping Matrix (Nov 2025); stub
- Millie Rittinghouse — Salesforce engineer; co-presenter of 1.8M Prompts, 30 Alerts at [un]prompted March 2026
- Mohamed Nabeel — Sr Principal Researcher, Palo Alto Networks
- Nicolas Lidzborski — Principal Software Engineer, Google Workspace security; ~3 years on GenAI security; speaker at [un]prompted March 2026
- Niki Aimable Niyikiza — Founder @ Tenuo; Security Engineer @ Snap; ~10 yrs infra security (Google, Datadog, Snap); presenter at [un]prompted March 2026; coined the valet-key / Map vs Territory framings for capability-based authorization
- Peter Smith — Director, Agentic SOC Product Management at Salesforce; co-presenter of Beyond the Chatbot at [un]prompted March 2026 (with Ravi Kiran Sharma)
- Ravi Kiran Sharma (RK) — Lead Security Engineer at Salesforce; co-presenter of Beyond the Chatbot at [un]prompted March 2026 (with Peter Smith)
- Sid Shah — Stripe AI security engineer; co-presenter of Guardrails Beyond Vibes at [un]prompted March 2026 (with Jeffrey Zhang)
- Simon Willison — independent researcher; coined the Lethal Trifecta (Jun 2025); simonwillison.net
- Sounil Yu — author of The Cyber Defense Matrix (Wiley, 2022); collaborator with Knostic on AI-era extensions of CDM (stub)
- Sunil Arora — co-author (with John Hastings) of the MAAIS arXiv preprint (Dec 2025); affiliations unverified; stub
- Taesoo Kim — lead author of Microsoft’s MDASH announcement (May 2026); represents/leads Microsoft’s Autonomous Code Security (ACS) team; prior systems-security and DARPA AI Cyber Challenge background (unverified)
- Vasu Jakkal — Corporate Vice President, Microsoft Security; lead byline on Microsoft Security Blog announcement posts (incl. Secure Agentic AI End-to-End, March 2026)
- Yonatan Zunger — Microsoft Security leader; bylined author of the 2026-02-03 Microsoft SDL AI-extension announcement; stub (bio pending primary-source confirmation)
- Apostol Vassilev — NIST Computer Security Division researcher; co-author of SP 800-218A and lead author of NIST AI 100-2e2023 Adversarial Machine Learning: A Taxonomy and Terminology (the federal-anchor adversarial-ML reference cited throughout the wiki); stub
Thesis Pages
See Thesis Index.
- Security Controls for AI Stacks — six-layer inventory mapping existing controls to layers; flags gaps (egress, AI-BOM, model-layer)
- Secure-SDLC Framework Stack for 2026 — Is NIST SSDF + OWASP SAMM Enough? — evaluates the “anchor policy to SSDF, assess maturity via SAMM” claim; structurally correct foundation, materially incomplete for AI-augmented threat model and AI-component governance; recommended 6-layer stack adds AI overlay (NIST AI RMF / Agentic AI Security CMM / ISO 42001), supply-chain layer (SLSA + CycloneDX), benchmark (BSIMM), and operational alignment (NIST CSF 2.0)
Incidents
See Incidents Index.
- ClawHavoc — Q1 2026 supply-chain attack on agentic skill marketplace
- Claude → Stripe coupons via iMessage metadata spoofing — Jul 2025; multi-MCP context-pollution exploit (Havaei/Liu/Li, General Analysis)
- Clinejection — AI-attacks-AI via GitHub issue title prompt injection
- Cursor npm credential stealer — May 2025; 3,200+ users; sw-cur / sw-cur1 / aiide-cur; macOS-targeted Socket disclosure
- VS Code AI output validation bypass — Nov 2025; IDE-level prompt-injection class (3 sub-vulns: fetch_webpage trust check, Mermaid HTML img, edit_notebook silent overwrite)
- GTG-1002 — first publicly disclosed AI-orchestrated cyber espionage campaign (PRC-nexus, Sep 2025; ~30 targets; Anthropic Nov 2025 disclosure)
- Jules AI kill chain — Aug 2025; full 5-stage compromise of Google’s coding agent (Rehberger)
- LiteLLM supply chain compromise — Google ADK dependency
- MCP CVEs Q1 2026 — 30+ CVEs; 82% path-traversal
- Meta Sev 1 agent breach — autonomous agent flawed advice → proprietary code exposure
- Month of AI Bugs — Aug 2025; coordinated disclosure series across every major frontier model and agentic dev kit
- SANDWORM_MODE npm worm — toolchain poisoning via MCP injection
- Slack AI private-channel exfiltration — Aug 2024; canonical Lethal Trifecta case via indirect prompt injection (PromptArmor)
- Unit 42 in-the-wild prompt injection observations — first production-telemetry confirmation
Gaps and Open Questions
See Gaps Index.
- Standards Validation Methodology — Sourcing, Falsifiability, and the Audit Backlog — protocol for primary-source-anchored framework pages, clause-level coverage matrices, falsifiable absence claims, adversarial second pass; lists 11-standard P1/P2/P3 audit backlog (~47 hours of work)
- Validation: Agentic AI CMM vs Widely Adopted Standards — first-pass synthesis (2026-04-30); superseded by the methodology above for absence-claim load-bearing evidence; stays as historical record
- Agentic AI Security RA — Open Implementation Questions — six tradeoffs the RA deliberately leaves open
- PEP for Non-Tool-Mediated Agent Actions — coverage hole for Claude-Code-style deep agents; tool-call mediation isn’t the right primitive when the agent writes its own code
- Source Triangulation Audit 2026-05-02 — per-claim corroboration of 8 load-bearing wiki claims against academic + government-survey sources; closes peer-review-readiness honorable-mention #1 (“statistics drawn from a narrow source set”)
- CMM Calibration Stress Test 2026 — L4→L5 jump and cumulative-floor rule analyzed against 5 realistic org archetypes; recommends L5/L5+ split (stable Optimizing vs research-stage Leading Edge) and per-domain matrix as primary report view; closes peer-review-readiness §3
- Stub Backlog 2026-05-02 — operational tracker for status:stub / status:seed pages; 14 stubs + 25 non-meta seeds triaged into P1 (5 load-bearing) / P2 (10 active citations) / P3 (rest, intentionally light)
Comparisons
See Comparisons Index.
- Cybersecurity CMM Exemplars — design lessons from CMMI / BSIMM / OWASP SAMM / CMMC 2.0 / NIST CSF 2.0 for any new AI-security CMM
- [[unprompted-march-2026-talks-vs-ra-cmm|[un]prompted March 2026 Talks vs RA + CMM]] — Tier 1/2/3 relevance ranking of [un]prompted (March 3–4, 2026) conference talks against the RA planes and CMM domains
- Wiki Novelty and Counter-Arguments — 2026 — what the wiki actually contributes vs OWASP/NIST/Gartner/CSA + per-thesis competing-view callouts (platform-vs-prompt, Gartner 50% elimination, Lethal Trifecta unconditional, floor rule, eval-harness absorber, UEBA-for-agents); 10 unresolved contests logged; closes peer-review-readiness §7
- Agentic AI Security Seed Funding — May 2025 to May 2026 — top 8 seed rounds (~$85M total) ranked, mapped to RA planes + CMM domains; gateway-vs-runtime-instrumentation architectural fault line; D6 absence and CART-consolidation pattern; commercial reference-implementation deltas filed
- ASL, and OWASP Don’t Share Axes — scoping analysis for the parked PwC/MS/Anthropic/OWASP comparison candidate; documents the three-categorical-groups finding (org-program maturity vs model-capability tier vs threat-coverage catalog) and why a single-axis comparison would be misleading
Folds
- Fold k4 — 2026-05-04 to 2026-05-07 — n16 — extractive rollup of 16 log entries; dominant themes: standards-anchoring discipline, CMM consolidation toward a single canonical home, lint-driven backfill paired with tooling tightening.
Recent Activity
- 2026-05-07: lint pass + 28-item full cleanup. 15 new entity stubs (Airbnb, Block, Elastic, GreyNoise, Intel, Netflix, Perplexity, Snowflake, Sysdig for Tier 1; Alpitronic, SANS Institute, HiddenLayer, Protect AI, Team8 for Tier 2; Cursor product). Plain-text rewrites for two stale wikilinks. Three loose pages cross-linked. Source-provenance cleared on starseer + toolshed after extending scripts/lint-sources.py to accept no_public_url: for entities. One dead link remaining is the intentional standards-review-nist-ai-rmf-2026-Q3 worked-example placeholder. See lint-report-2026-05-07. Vault: 272 → 288 pages.
- 2026-05-07: manifest cleanup + Beyond the Chatbot stub — registered three previously-uningested raw talks (beyond-the-chatbot, securing-your-agents-2026-04-30, unprompted-conference-talks-mar-2026). Two of the three already had comprehensive wiki pages from earlier passes; only the Salesforce Beyond the Chatbot talk was unrepresented. Created talk page (stub-summary — abstract-only; promote when slides/transcript captured), Peter Smith entity, Ravi Kiran Sharma entity. Updated Salesforce org (wikilinked the new talk page + speakers); updated conference catalog (wikilinked the Beyond-the-Chatbot row + added the talks-list raw source to its sources: list). Polyphonic (Supervisor-Worker) architecture flagged as the talk’s load-bearing claim, paired with Rittinghouse as the response/detection halves of the same Agentforce SOC story. See log.
- 2026-05-03: batch-ingested 6 [un]prompted talks (parallel agents). Talks: Hurd, Zhang & Shah, Maisel, McMillin, Rehberger, Rittinghouse. 27 new pages: 6 talk summaries, 7 person entities (Hurd, Maisel, McMillin, Zhang, Shah, Matt Rittinghouse, Millie Rittinghouse), 4 organizations (Sondera, Dropbox, Salesforce, Starseer), 1 product (Salesforce Agentforce), and 9 concepts (glass-box-security, mechanistic-interpretability-for-defense, agent-memory-isolation, context-aware-trimming, promptware, delayed-tool-invocation, behavioral-anomaly-detection-for-agents, prompt-volume-to-alert-ratio, agent-commander-prompt-c2). Updated Stripe org (second talk landed), Rehberger entity, LLM-as-a-Judge (stub→developing from Stripe production case), Cedar (stub→developing from Sondera harness), Ben Nassi, conference catalog, CMM ranking, Bullen talk (companion link), PEP gap, agent-observability practice, inline-gateway-vs-runtime-instrumentation. Hot-cache “Sondera Maisel” name conflation corrected — actual org is Sondera, presenter is Matt Maisel. Closes Tier-1 talk pursuit list. See log.
- 2026-05-03: autoresearch on agentic-AI-security seed funding (May 2025–May 2026). 10 new pages: synthesis, 8 organization entities (Lumia, Trent AI, Runlayer, General Analysis, Helmet, Keycard, Capsule, SplxAI), and 1 architectural concept (Inline Gateway vs Runtime Instrumentation). Top 8 seeds total ~$85M. The 6-plane RA holds — every funded startup maps cleanly. **D6 (Data, Memory & RAG) has zero seed-stage agentic-specific entrants** — open question filed. Architectural fault line: ~$20M seed in the inline-gateway camp (Runlayer / Helmet Security / AgentGateway) vs ~$7M in runtime instrumentation (Capsule Security’s explicit anti-proxy positioning); mirrors the API-gateway-vs-APM split. Guardian-Agent / cross-plane category overcrowded (Onyx + Lumia + Trent AI). CART consolidates fast (SplxAI → Zscaler; Promptfoo → OpenAI). See log.
- 2026-05-03: ingested Nicolas Lidzborski — “Securing Workspace GenAI at Google” ([un]prompted March 4 2026). 9 new pages: talk summary, speaker, Ben Nassi stub (calendar-invite attack author), 5 concepts (prompt-as-code, agency-gap, orchestration-hijacking, recursive-prompt-injection, sentinel-tokens) and 1 practice (plan-validate-execute). Architectural sibling to Bullen’s Stripe talk — covers input/orchestration/output side; Bullen covers egress/tool-policy. Updated lethal-trifecta (added 7th containment lever — layered structural defenses), indirect-prompt-injection (added Nassi attack + prompt-as-code framing), hitl (added Plan-Validate-Execute as canonical Google pattern), google org page. Slides PDF inaccessible in this session — flagged for re-ingest when PDF tooling available; one transcription gap flagged for clarification (“Consecar” — likely Workspace-internal framework name misheard). See log.
- 2026-05-03: Commercial AI security vendor pages — created CyberArk Conjur (with CyberArk org page), Wiz AI-SPM (Wiz org promoted from stub), Palo Alto Prisma AIRS (Palo Alto org promoted from stub), Onyx Platform (with Onyx Security org page). Closed plain-text vendor references in RA Identity, Runtime, and Observability planes. Source: Onyx product page clipped to `.raw/articles/onyx-platform-secure-ai-control-plane-2026-05-03.md` (vendor marketing — capability claims caveated).
- 2026-05-03: RA capability coverage audit — every plane-table capability now has either a wikilinked concept or product page. Created 8 new pages: 3 capability concepts (HITL, Tool Poisoning, Memory Poisoning) and 5 reference-implementation products (Okta for AI Agents, Microsoft Entra Agent ID, Miggo Security, gVisor, Lakera Guard). RA plane tables updated with corresponding wikilinks. Closes the lint-flagged P1 stubs (Okta, Entra) and the `[[hitl]]`/`[[tool-poisoning]]`/`[[memory-poisoning]]` capability gaps.
- 2026-05-03: RA structural improvements — Mermaid block-beta diagram updated to capability names (not product names); OpenClaw-era tools (SecureClaw, Brain Git, Aguara Watch, RAGShield, TrustRAG) reclassified to a new `Exploratory` type; new §Prior Work section comparing the RA against Microsoft ZT4AI, Azure OpenAI Reference Architecture, Microsoft MCRA, AWS Well-Architected GenAI Lens, Google Cloud AI Security Foundations, and CSA MAESTRO.
- 2026-05-03: RA/CMM reference implementation audit — added a `Type` column (OSS / COTS / Std / Research / Infra / Concept) to all six RA plane tables; updated the CMM tooling map to a 4-column format separating Standards/Specs, OSS tools, and COTS/SaaS; added two recommended stacks to RA §Mapping to deployment shapes (Enterprise stack + small-team stack); created 5 priority stub pages: Cedar, Rego, OpenTelemetry gen_ai.*, Firecracker, CaMeL pattern. See log.
- 2026-05-03: ingested Niki Aimable Niyikiza — “Capability-Based Authorization for AI Agents” ([un]prompted March 2026) as paired slides + transcript. New pages: Niyikiza talk summary, Capability-Based Authorization (60-year lineage, artifact-carries-policy model), Tenuo Warrant (6 properties, W₁⊆W₀, Map vs Territory + 4 CVEs), Monotonic Attenuation, Ambient vs Derived Authority, Niki Aimable Niyikiza, Tenuo, Snap (stub). Updated: Lethal Trifecta (warrant-based containment added as lever #6), PEP gap (delegation-aware angle partially closed), mapping/conference pages. Speaker attribution corrected: Niyikiza is Founder @ Tenuo (not “Snap UCAN warrants”). See log.
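The monotonic-attenuation property noted above (a derived warrant W₁ must satisfy W₁ ⊆ W₀) can be illustrated in a few lines: delegation may only drop capabilities, never add them, so a compromised delegate cannot escalate. This is an illustrative sketch under assumed names (`Warrant`, `attenuate`, `permits`), not Tenuo's implementation.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Warrant:
    # Capabilities as (action, resource) pairs, e.g. ("read", "crm").
    capabilities: frozenset

    def attenuate(self, keep):
        """Derive W1 from W0, enforcing the monotonic rule W1 <= W0."""
        kept = frozenset(keep)
        if not kept <= self.capabilities:
            raise ValueError("attenuation may narrow a warrant, never widen it")
        return Warrant(kept)

    def permits(self, action, resource):
        return (action, resource) in self.capabilities
```

Because the subset check runs at derivation time, the policy travels with the artifact (the artifact-carries-policy model) rather than living in ambient server-side state.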
- 2026-05-03: ingested four primary-source incidents cited in Bullen’s slide 4 — closed the tracking gap flagged in the prior hot cache. New pages: VS Code, Claude → Stripe coupons via iMessage, Cursor npm credential stealer, Slack AI private-channel exfiltration. Sources: NVD CVE record + GitHub Security Lab blog (supplementary), General Analysis exploit research, Socket Threat Research, original PromptArmor disclosure. Date corrections: Cursor filed under the primary 2025-05-07 date (not the 2025-05-11 Medium repost); Slack AI filed under the canonical 2024-08-20 date (not the 2025 visual grouping in Bullen’s deck). The NVD record for the CVE is one sentence and offers no CVSS at fetch time; the deeper attack mechanism is inferred from the GitHub Security Lab blog. See log.
- 2026-05-02: ingested Andrew Bullen — “Breaking the Lethal Trifecta” ([un]prompted, March 4, 2026) as paired slides + transcript (first dual-source talk ingest in this wiki). New pages: breaking-the-lethal-trifecta-bullen-talk (full summary), lethal-bifecta (Bullen-coined write-side analogue), andrew-bullen (people stub), toolshed + smokescreen (product stubs). Updated lethal-trifecta (added Stripe worked example), stripe (promoted from stub). Slide-only contributions: ASR numbers (1.5–6.7%), 4 incident headlines, ToolAnnotations API. Transcript-only contributions: Toolshed/Smokescreen names, “Lethal Bifecta” coinage, deep-agent WIP caveat. See log.
- 2026-05-02: ingested [un]prompted Conference (March 3–4, 2026) agenda — full catalog of ~52 talks with presenters/orgs/notable data points; companion Tier 1/2/3 relevance ranking against the RA planes and CMM domains. Headline data points captured: AISLE 12 OpenSSL 0-days (3 hidden 20+ years), FENRIR 100+ vulns/21 CVEs, Sysdig 8-min AWS escalation + EtherRAT, Salesforce 1.8M→<30 alerts, Parseltongue 17,000 prompts × 100+ obfuscations × 9 models, Anthropic GTG-1002 (80–90% autonomous adversaries). New pages: unprompted-conference-march-2026, unprompted-march-2026-talks-vs-ra-cmm. (Source page header said “[un]prompted II” but body announces Sept 2026 as II — date-based slug used.) See log.
- 2026-05-01: ingested Knostic’s AI Data Security blog — added inference/retrieval exposure framing, AI-UC (UCON for AI), AI-SPM and DSPM as paired posture practices, oversharing-controls practice, Shadow AI concept; new framework pages for Cyber Defense Matrix and Gartner AI TRiSM (stubs); new entities Sounil Yu, Glean. See log.
- 2026-04-30: ingested Oasis Security’s What Are Non-Human Identities? — 5 sharpenings applied to identity/NHI plane (identity-credential coupling concept; D2 L3 code-pace lifecycle; D2 L4 coupled-credential migration plan; D9 L4 dependency mapping; NHI taxonomy + scale evidence triangulation). New pages oasis-what-are-non-human-identities, oasis-security, identity-credential-coupling. See log.
- 2026-04-30: ingested Knostic’s AI Coding Agent Governance — 5 sharpenings applied to the canonical CMM and architecture (governance≠security framing, decision-rights matrix at D1 L3, sample audit-log schema at D7 L3, time-bounded elevation at D3 L4, coding-agent archetype evidence rubric). New concepts shadow-automation and decision-rights; new entities knostic and kirin. See log.
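A record of the kind the D7 L3 sample audit-log schema calls for might look like the following. The field names here are illustrative assumptions, not the schema on the CMM page; the ID-tagged `evidence_id` field mirrors the wiki's evidence convention.

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class AgentAuditRecord:
    """One agent action, serialized as an append-only JSON line."""
    ts: str            # RFC 3339 timestamp of the action
    agent_id: str      # non-human identity of the acting agent
    on_behalf_of: str  # human principal the agent is delegated from
    tool: str          # tool/capability invoked
    args_digest: str   # hash of arguments (avoid logging raw sensitive args)
    decision: str      # "allow" | "deny" | "escalate"
    evidence_id: str   # ID-tagged evidence reference, e.g. an ATLAS technique

def to_log_line(rec: AgentAuditRecord) -> str:
    # sort_keys gives stable output, which simplifies diffing and hashing.
    return json.dumps(asdict(rec), sort_keys=True)
```

Logging an argument digest rather than raw arguments keeps the trail reviewable without turning the audit log itself into a sensitive-data store.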
- 2026-04-30: applied all 5 validation recommendations — softened L5 product dependencies, added D9 Operations & Human Factors as 9th domain, made ID-tagged evidence (ASI/AIVSS/ATLAS) a global L3+ rule, built agentic-ai-security-cmm-crosswalk standards-anchor map, built agentic-ai-security-cmm-measurement-protocol three-stage assessor handbook. Validation page closed. See log.
- 2026-04-30: autoresearch session — produced agentic-ai-security-reference-architecture (six-plane practical RA), agentic-ai-security-cmm-2026 (initially 5×8 cumulative CMM, now 5×9 post-validation), cybersecurity-cmms-exemplars (design lessons), and agentic-cmm-vs-standards-validation (independent reviewer agent vs 11 standards). See log.
- 2026-04-30: ingested Securing Your Agents talk (Bill McIntyre, 2026) — added Lethal Trifecta, indirect injection, three retrieval paths, RAG hardening, system prompt architecture, Jules kill chain, Month of AI Bugs. See log.
- 2026-04-30: ingested 3 anchor papers in parallel — When Marketing Fails: AI SOC, AI Security Standards in Q1 2026, Securing the Autonomous Future. See log.
- 2026-04-30: legacy migration completed.
- 2026-04-30: vault scaffolded (Mode E + framework focus).