Agentic AI Security Capability Maturity Model — A 2026 Practical Proposal
A practical, evidence-based Capability Maturity Model for agentic AI security, designed in May 2026 to apply the design lessons distilled from CMMI, BSIMM, OWASP SAMM, CMMC 2.0, and NIST CSF 2.0 (see Cybersecurity Capability Maturity Models — Exemplars and Design Lessons for the per-exemplar treatment) to the threat surface and control stack documented in the Agentic AI Security Reference Architecture (2026).
The model is descriptive at Levels 1–3 (controls observed in production at well-run organizations), prescriptive at Level 4 (controls a mature program operates), and achievable-today at Level 5 (capabilities shipping as products or specifications as of May 2026 — Microsoft Agent 365, Okta for AI Agents, AgentGateway-LF, LlamaFirewall, AIUC-1 certification, Miggo DeepTracing, Tenuo Warrants — though integration across all nine domains remains rare). Research-stage and unshipped capabilities (TEE-backed guardrail attestation, multi-agent cascade-detection rule libraries, CaMeL privileged / quarantined LLM split, cross-vendor AI-BOM federation, named standards contribution) are sequestered into a separate L5+ Leading Edge tier — explicitly aspirational and not required for L5. The L5 / L5+ split was filed in CMM Calibration Stress Test (2026-05-02) and adopted in this revision so that L5 is achievable today using shipping products.
Foundational distinction: governance ≠ security
The CMM measures both security (preventing harm) and governance (defining authority and accountability); the two are not interchangeable. Security controls (firewalls, EDR, prompt filters, sandboxes, credential proxies) prevent or contain harm. Governance defines who has the authority to act, under what justification, with what oversight, and with what record. An organization can be at L4 in security controls (D2 / D4 / D5 / D8) and still be at L1 in governance (D1 / D3 / D9) — and vice versa. Both must climb together. This distinction is sharpened in AI Coding Agent Governance (Knostic, 2025–2026) and operationalized via the Decision Rights for AI Agents concept; it is what makes “shadow automation” (Shadow Automation) a structurally different risk from “shadow IT.”
What this CMM is and is not
Is:
- A self-assessment instrument for CISOs, AI platform leads, and internal auditors.
- A cumulative maturity ladder across 9 domains, with dependency-resolved effective-score aggregation.
- An overlay on existing standards (NIST AI RMF, ISO 42001, OWASP ASI, MITRE ATLAS, CoSAI Principles, Microsoft ZT4AI, CSA Agentic Trust Framework, AIUC-1, EU AI Act).
Is not:
- A certification program. Certification belongs to ISO 42001 and AIUC-1; this is a measurement scaffold.
- A replacement for risk assessment. The CMM measures capability; risk assessment measures exposure.
- A vendor-neutral promise. Vendors and OSS projects are named where load-bearing at a given level. The intent is concreteness, not endorsement.
Five levels + a leading-edge tier (cumulative)
block-beta columns 1 L5p["L5+ Leading Edge<br/>research-stage / standards contribution<br/>(aspirational, not required)"] L5["L5 Optimizing<br/>platform-level enforcement (achievable today)"] L4["L4 Managed<br/>quantitative, continuous"] L3["L3 Defined<br/>org-wide standardization"] L2["L2 Developing<br/>policy + inventory"] L1["L1 Initial<br/>ad hoc"] classDef l1 fill:#ffe8e8,stroke:#dc3545,color:#000 classDef l2 fill:#fff4d6,stroke:#fd7e14,color:#000 classDef l3 fill:#fff9b8,stroke:#cc9900,color:#000 classDef l4 fill:#d4f5d4,stroke:#198754,color:#000 classDef l5 fill:#b8e4ff,stroke:#0d6efd,color:#000 classDef l5p fill:#e2d5f3,stroke:#6f42c1,color:#000 class L1 l1 class L2 l2 class L3 l3 class L4 l4 class L5 l5 class L5p l5p
Cumulative semantics (CMMC lesson, modified): Level N requires every Level N–1 control plus the new criteria at Level N. An organization’s overall rating is reported as a per-domain matrix; aggregation uses dependency-resolved effective scores rather than a single floor. A domain’s effective score = min(its raw score, the raw scores of its upstream-dependency domains under the active rule set). The active rule set is small and conservative — see Effective-Score Dependency Rules (v1 = 3 rules: D2→D5, D2→D7, D3→D4, all anchored to lethal-trifecta and Sondera/AgentCordon practitioner evidence). The headline becomes a three-number summary: typical (median effective) / weakest (min effective, with cap source labeled) / strongest (max raw, labeled).
This replaces the prior single-floor rule (imported from CMMC 2.0). The floor was misreporting 3 of 5 realistic archetypes per the stress test (Stripe-style architectural containment, Microsoft Agent 365-driven enterprises, resource-constrained startups). Effective-score scoring captures cross-domain attack-path failures substantively (e.g. weak D2 caps D5 because per-agent egress can’t be enforced without per-agent identity) rather than punitively (e.g. weak D9 ops lag dragging D2 identity controls down). Cherry-picking is now prevented by mandatory matrix disclosure + strategic-rationale field, not by mathematical aggregation. The dependency-rule registry is scaffolding — designed to grow with new attack-path evidence and practitioner architectures via the documented promotion protocol.
L5 vs L5+ semantics. L5 is a maturity tier: every L5 criterion in this CMM points to a shipping product, an open-source project at v1.0+, or a documented capability deployable with currently available components. L5+ is a leading-edge tier: it requires L5 across all 9 domains plus research-stage capabilities and active named contribution to one or more standards bodies. L5 is the bar a sufficiently resourced 2026 program can clear; L5+ is the bar a frontier-lab or research-shop program clears. Per-domain matrix view (in the measurement protocol) reports both.
Global evidence rule (applies at L3 and above). All findings, gaps, eval results, and incident artifacts MUST be tagged with the standards-anchor IDs they relate to:
- OWASP Agentic AI Top 10 —
ASI01–ASI10(the agentic risk taxonomy) - OWASP LLM Top 10 (2025) —
LLM01:2025–LLM10:2025(still apply to non-agentic and agent-as-LLM surfaces) - OWASP AIVSS v0.8 — full vector with amplification factors (Autonomy Level, Tool Use Scope, Multi-Agent Interactions, Non-Determinism, Self-Modification)
- MITRE ATLAS v5.4.0+ —
AML.T####techniques andAML.M####mitigations - NIST SP 800-53 control IDs (via NIST IR 8605A COSAiS overlay) where compliance evidence is needed
- For incidents: CVE IDs and MCP CVEs Q1 2026-class references
Without ID tagging, a finding is L2-grade evidence at best. ID-tagging is the boundary between a CMM that maps to standards and one that operates on them; it also enables downstream automation (machine-checkable findings, cross-domain query, longitudinal trend analysis).
| Level | Name | Definition | Auditor evidence requirement |
|---|---|---|---|
| 1 | Initial | Reactive, ad hoc. AI agents in production with no inventory, no identity, no platform-level controls. | None / point-in-time interview |
| 2 | Developing | Written AI security policy. Manual agent inventory. Some prompt-level guardrails. Identity is delegated through human user only. | Policy doc + spreadsheet inventory + sample agent design review |
| 3 | Defined | Standardized org-wide. Every agent has its own identity. Platform-level hooks intercept tool calls. AI-BOM exists for production agents. AI-specific incident response playbook documented. | Identity graph for all agents + Cedar/OPA policy repo + AI-BOM artifact + IR runbook |
| 4 | Managed | Quantitative metrics tracked continuously. Agent behavioral monitoring detects drift. Red-team eval program runs ≥quarterly. Credential proxy in use. Sandbox per high-risk task. | Dashboard with KPIs + red-team report + cred-proxy traffic logs + sandbox config |
| 5 | Optimizing | All controls reachable with shipping products as of May 2026. Platform-level enforcement everywhere across all 9 domains. AIUC-1 certified against the most recent quarterly refresh OR ISO/IEC 42001 certified. Real-time AI-BOM (Miggo DeepTracing or equivalent shipping product). Capability tokens / Warrants per task. Mesh AgentGateway sidecar per agent. ≥2 quarters stable L4 operation. Bus-factor ≥2 with documented continuity test. | Per-domain matrix L5 across all 9 domains + most-recent cert dated within last quarter + ≥2-quarter L4 history + continuity-test report |
| 5+ | Leading Edge | All L5, plus research-stage / leading-edge primitives in production: cryptographic guardrail attestation in a TEE (Nitro Enclaves-class); CaMeL-style privileged/quarantined LLM split for trifecta-positive workloads; cascade-detection rule library with tuned thresholds for ASI07/08/10 multi-agent risk; cross-vendor AI-BOM federation with reconciliation; sigstore-for-MCP cross-tenant signing; active named contributor to one or more of CoSAI / OWASP / AIVSS / NIST CAISI / OASIS / Linux Foundation AAIF AI WGs (PR / RFC / spec authorship, not just membership). | TEE attestation logs + cascade-rule registry with thresholds + cross-vendor AI-BOM reconciliation report + named contributor list with PR/RFC/spec links |
L4 → L5 is a campaign, not a step
Before claiming L5, the program MUST show: (a) ≥2 quarters of stable L4 operation across all 9 domains (no regression in the per-domain matrix); (b) AIUC-1 readiness assessment scheduled with an accredited auditor (Schellman or equivalent); (c) bus-factor ≥2 with a documented continuity test (anti-pattern I3 recovery); (d) gap-closure plan from the floor-domain to L5. This gate applies in addition to per-domain L5 criteria — meeting every per-domain L5 row without the gate evidence is L4-stable, not L5. Adopted from stress-test §Change 5.
Nine domains
The CMM uses 9 domains, derived from the 6 reference-architecture planes plus 3 cross-cutting concerns (governance, supply chain, and operations/human factors). The 9-domain breakdown sharpens focus on agentic-specific controls and adds a domain for the operational and human-factors gaps that no current standard covers (per the validation page §3).
Cross-cutting domains (D1, D8, D9) wrap the per-plane domains as bands top and bottom. Per-plane domains (D2–D7) sit in a single row matching the RA’s plane order, with the same XACML / NIST SP 800-162 §2.2 four-role color palette (PIP blue, PDP yellow, PEP red, mixed purple, cross-cutting green).
block-beta columns 6 D1["D1 Governance & Accountability"]:6 D2["D2 Identity & Authorization"] D3["D3 Control & Least-Agency"] D4["D4 Runtime & Guardrails"] D5["D5 Egress & Network"] D6["D6 Data, Memory & RAG"] D7["D7 Observability & Detection"] D8["D8 Supply Chain & AI-BOM"]:6 D9["D9 Operations & Human Factors"]:6 classDef pip fill:#cfe2ff,stroke:#0d6efd,color:#000 classDef pdp fill:#fff3cd,stroke:#fd7e14,color:#000 classDef pep fill:#f8d7da,stroke:#dc3545,color:#000 classDef mixed fill:#e2d5f3,stroke:#6f42c1,color:#000 classDef cross fill:#d1e7dd,stroke:#198754,color:#000 class D1 cross class D2 pip class D3 pdp class D4 pep class D5 pep class D6 mixed class D7 pip class D8 cross class D9 cross
D1. Governance & Accountability
Accountability, authority, and auditable record-keeping for agent behavior — spanning AI security policy, executive ownership, the agent / NHI inventory, decision-rights matrices, and certification readiness. Accountability is treated as a first-class security principle alongside Confidentiality, Integrity, and Availability — the CIAA augmentation of the classical CIA triad introduced by Arora & Hastings (MAAIS, 2025) for agentic systems.
Maps to: NIST AI RMF Govern, ISO/IEC 42001 §5–§9, EU AI Act Art. 9 risk management, CoSAI Shared Accountability principle, MAAIS Layer 5 (Accountability and Trustworthiness), Operational XAI for Action Gating (justification-capture as the runtime accountability artifact).
| Level | Criteria | Evidence |
|---|---|---|
| 1 | No AI governance role; agents deployed without security review. | none |
| 2 | Named AI security owner (CISO designate); AI use policy published; agent risk tier classification scheme exists on paper. | Policy doc; signed RACI |
| 3 | AI Risk Committee (CISO + Legal + Privacy + Eng) meets ≥monthly; agent risk tiers gate deployment; least-agency tier assigned per agent; documented Decision Rights for AI Agents matrix per agent type (action class × decision right × approver × justification × time bound — see concept page for the schema); shadow-agent inventory + reaper SLA defined (see Shadow Automation). | Meeting minutes; deployment-gate evidence; decision-rights matrix; shadow-agent reaper runbook |
| 4 | Quantitative risk metrics tracked (incidents, drift events, HITL escalations); board-level AI risk reporting; AIUC-1 readiness assessment complete; standards-crosswalk matrix maintained (CMM domain × Annex IV × AIUC-1 safeguard × ISO 42001 Annex A × NIST SP 800-53 via IR 8605A — see Agentic AI Security CMM — Standards Crosswalk Matrix); board metrics tagged with ASI## / AIVSS rollups. | Dashboard; board pack; gap report; crosswalk matrix |
| 5 | AIUC-1 certified against the most recent quarterly refresh (or IEC 42001 certified) AND retained; quantitative risk metrics published internally with board attestation; AI Risk Committee operating with documented decision history ≥1 year; standards-crosswalk matrix audited and refreshed each quarterly cycle. Note: AIUC-1 is a moving target — L5 is “current at last quarterly refresh,” not “ever certified.” | Most-recent cert dated within last quarter; board-attested risk metrics; committee minutes ≥1 year; refreshed crosswalk |
| 5+ | All L5, plus: active named contributor to one or more of CoSAI / OWASP ASI / AIVSS / NIST CAISI / OASIS / Linux Foundation AAIF (PR, RFC, working-group authorship — not membership only); published peer-reviewed or vendor-disclosed research with empirical agentic-AI security findings; org-level AI risk-observability metrics published externally (e.g. CSAI Foundation AI Risk Observatory contribution). | Named-contributor evidence (commit / PR / spec authorship); published research artifact; external observability dataset |
D2. Identity & Authorization
Per-agent (non-human) identity assignment and the credential lifecycle — issuance, scoping, rotation, revocation — that enables zero-credentials-in-agent-context operation, with explicit treatment of coupled-credential workflows where credential and identity cannot be separated.
Maps to: OWASP ASI03, NIST CAISI Concept Paper (Feb 2026), ISO 27090 (FDIS Mar 2026), Microsoft ZT4AI Pillar 1.
| Level | Criteria | Evidence |
|---|---|---|
| 1 | Agents share human credentials or service accounts. No agent inventory. | none |
| 2 | Agents have distinct service-account identities; agent inventory in spreadsheet. | Inventory artifact |
| 3 | Every agent has a verifiable identity (Okta for AI Agents / Entra Agent ID / SPIFFE workload ID); action-to-human tracing in audit log; OAuth 2.1 token exchange for delegation; NHI lifecycle bound to code-deploy pipeline, not to HR joiner/mover/leaver (per What Are Non-Human Identities? (Oasis Security)); inventory distinguishes coupled vs decoupled NHIs (see Identity-Credential Coupling); every NHI has a mandatory human owner field. | Identity graph; sample audit trail; CI/CD-registered NHI list; owner field coverage report |
| 4 | Credential proxy enforces zero-credentials-in-context (AgentKeys / Keychains.dev / Aegis); per-agent policies in Cedar/OPA; orphaned-agent kill switch tested; automated rotation cadence per credential class (short-lived JWT-class auto-rotates; coupled credentials per Identity-Credential Coupling rotate per documented dependency map — see D9 L4); behavioral baseline per NHI (per-credential activity pattern); migration plan to replace coupled credentials (SAS tokens, storage access keys) with decoupled alternatives (managed identities, role-based access). | Cred-proxy logs; policy repo; tabletop; rotation cadence report; coupled-credential migration plan |
| 5 | Unified agent governance program operating in production (capability set: registry + lifecycle API + per-agent identity graph + ownership transfer + scoped RBAC + audit log integration; reference deployments include Microsoft Agent 365 GA May 1 2026, Okta for AI Agents GA Apr 30 2026, Anthropic Compliance API Mar 2026); shadow-AI discovery operational; cryptographic attestation of identity binding (SPIFFE-JWT-SVID or equivalent shipping today); zero coupled credentials remaining for agent-class NHIs (full migration off SAS / storage access keys / Snowflake-class API keys per Identity-Credential Coupling). Evidence MUST tag any in-flight identity-related findings with ASI03 and applicable AML.T#### IDs. | Registry export; ISPM dashboard; attestation chain; coupled-credential migration report; ASI03-tagged finding log |
| 5+ | All L5, plus: NIST CAISI demonstrator alignment as a named participant (concept paper Feb 2026; demonstrator profile contributed); multi-vendor agent identity federation across two or more IDaaS platforms (Entra + Okta + Keycard) with cross-platform identity-graph reconciliation; contribution to SPIFFE / OAuth 2.1 / OIDC working groups on agent-specific extensions. | NIST CAISI participation evidence; cross-platform reconciliation report; standards-WG contribution evidence |
D3. Control & Least-Agency
Authorization of agent actions — scope, timing, human-in-the-loop coverage, and segregation of duties — adjudicated at a Policy Decision Point external to the model context, with progressive-autonomy promotion gates and time-bounded elevation.
Maps to: OWASP ASI09, Least Agency Principle, AWS Agentic AI Security Scoping Matrix (anchor for the agency-vs-autonomy distinction used throughout this domain), CSA Agentic Trust Framework progressive autonomy gates, CoSAI risk-based governance.
| Level | Criteria | Evidence |
|---|---|---|
| 1 | No tool-call policies; agents may call any tool. | none |
| 2 | Tool allowlist per agent; HITL for “destructive” actions defined informally. | Allowlist config |
| 3 | OWASP four-tier least-agency model implemented (auto / notify / confirm / block); Cedar or OPA Policy Decision Point inline before every tool call; risk-tier per action documented. | PDP config; tier assignments |
| 4 | Four-stage progressive autonomy promotion model implemented (CSA Agentic Trust Framework v0.9.1: Intern → Junior → Senior → Principal — Observe+Report → Recommend+Approve → Act+Notify → Autonomous — with documented promotion criteria per stage including minimum time at level, accuracy thresholds, availability targets, named security validations, and sign-off matrix; org-authored technical-prerequisite rubric for the Principal-tier hardware-bound identity / policy-as-code primitives that ATF leaves abstract — see CSA Agentic Trust Framework); HITL coverage measured per agent and per action class; lethal-trifecta breaker active (auto-detect private-data + untrusted-content + external-comms combinations and downgrade tier); time-bounded elevation — any decision-right above the agent’s baseline tier (e.g. promotion from auto to auto-with-write) is JIT, scoped to a maintenance window or a single approval, and auto-reverts at expiry; segregation of duties enforced (the agent proposing a change is not the agent approving or deploying it). Evidence MUST tag findings with ASI09 and any AIVSS amplification factor scoring (Autonomy Level, Tool Use Scope). | Promotion-gate runbook (org-authored); HITL telemetry; trifecta-detection log; JIT-elevation expiry log; SoD policy |
| 5 | Capability tokens / Warrants per task with cryptographic binding (Tenuo Warrant or equivalent shipping today); risk-adaptive step-up — anomaly score from D7 behavioral monitoring auto-downgrades the agent’s autonomy tier per the D3 L4 promotion model; deny-by-default Cedar/OPA policy compiled and reviewed every release (no policy drift); SoD enforced cryptographically (the deploying agent’s signing key cannot also approve). Evidence MUST tag findings with ASI09 and AIVSS Autonomy Level / Tool Use Scope amplification factors. | Warrant samples; step-up logs; policy-compile artifact per release; cryptographic SoD evidence |
| 5+ | All L5, plus: CaMeL-style privileged/quarantined LLM split for lethal-trifecta-positive workloads in production (Google DeepMind 2025 research; no shipping vendor as of May 2026); formal verification of policy contradictions / vacuity / shadow subsets via Cedar Lean compiler over MCP (per Sondera harness approach extended to credential / tool-call PDPs); temporal-logic policy for trajectory-aware constraints (open research area — Cedar’s statelessness limitation). | CaMeL pilot charter + production deployment evidence; formal-verification reports; temporal-logic policy artifact |
D4. Runtime & Guardrails
Runtime defenses against prompt injection, jailbreak, grounding failure, and output-safety violations — instrumented with per-guardrail latency and cost budgets and fail-closed enforcement on critical paths.
Maps to: OWASP ASI01, ASI02; MITRE ATLAS AML.T0051 (LLM Prompt Injection — incl. .000 Direct / .001 Indirect / .002 Triggered) and AML.T0054 (LLM Jailbreak); CoSAI Maximize Oversight; Microsoft Prompt Shields; Model-Layer Attacks (output-randomization and query-pattern-monitoring controls applicable at L4); Agent Availability Threats (runtime step / token / recursion budgets at L3+).
| Level | Criteria | Evidence |
|---|---|---|
| 1 | No guardrails or system-prompt-only “guardrails.” | none |
| 2 | Vendor prompt filter (e.g., model provider’s default safety filter); some content-safety classifier on output. | Provider config |
| 3 | Platform-level lifecycle hooks (Google ADK, Anthropic hooks, or equivalent); LlamaFirewall PromptGuard 2 or NVIDIA NeMo Jailbreak Detection NIM in path; per-task sandbox for high-risk-tier actions. | Hook code; firewall logs; sandbox config |
| 4 | LlamaFirewall AlignmentCheck (or equivalent CoT auditing) on agentic workloads; CodeShield on code-gen agents; output classifiers for hallucination/grounding (Azure Groundedness Detection); injection-resistant context boundaries (system prompt architecture markers). | AlignmentCheck logs; CodeShield findings; grounding scores |
| 5 | All L4 controls enforced platform-level across every agent surface (no opt-out for “internal-only” or “low-risk” agents); multi-language injection coverage measured against current bypass-class library (LlamaFirewall PromptGuard 2 multi-language + Lakera Guard or NeMo Jailbreak NIM, all shipping); output classifiers continuously updated against latest jailbreak corpora (vendor-supplied weekly refresh); response-leak scanning at egress (e.g. AgentCordon-style outbound credential exposure check) catches credentials echoed in API responses before they reach the agent; per-agent guardrail latency / cost budgets enforced with fail-closed on critical paths. Evidence MUST cite specific AIVSS-scored vulnerabilities and AML.T#### techniques covered. | Platform-enforcement coverage report (zero opt-outs); multi-language eval log; classifier refresh receipts; response-leak alert log; latency/cost dashboard with fail-closed proof |
| 5+ | All L5, plus: (a) cryptographic attestation that guardrails executed in a TEE (AWS Nitro Enclaves-class; reference: Miggo Security pilots, no GA shipping product as of May 2026); (b) CaMeL-style privileged/quarantined LLM split in production for trifecta-positive workloads (Google DeepMind research; no shipping product); (c) measurable bypass-class evidence covering post-Trendyol leetspeak / non-English / Unicode-confusable bypasses with vendor-acknowledged remediation cycles. | TEE attestation logs; CaMeL production deployment evidence; bypass-class eval results with remediation timeline |
D5. Egress & Network
Network-layer mediation of agent egress — agent-aware gateway enforcement of MCP, A2A, and LLM protocols, per-task egress capability tokens bound to upstream resources, and SSRF closure at the network layer to ensure the gateway is the only path.
Maps to: OWASP ASI02, ASI07; CoSAI MCP Security white paper (Jan 2026, 12 categories / 40 threats); CSA MAESTRO Layer 4–5.
| Level | Criteria | Evidence |
|---|---|---|
| 1 | Agents have unrestricted network egress. | none |
| 2 | Outbound allowlist per agent (DNS or proxy-level). | Proxy config |
| 3 | An agent-aware proxy / gateway between agent and external tools enforcing per-tool RBAC (AgentGateway in Linux Foundation, Solo Enterprise, Cloudflare AI Gateway, Kong AI Gateway, or equivalent); HTTPS / TLS 1.3 + OAuth/mTLS for inter-agent A2A v1.0 communication per spec §7 (note: §8.4 Agent Card signing is algorithm-agnostic — orgs MUST document their own A2A enforcement profile, including signing algorithm and replay-protection layering); tool fingerprinting active. Evidence MUST tag MCP-server findings with applicable MCP CVEs Q1 2026-class CVE IDs. | Gateway config; certs; A2A enforcement profile; CVE-tagged finding log |
| 4 | OAuth 2.1 token exchange per tool call (CoSAI / NIST CAISI pattern); rug-pull and tool-poisoning detection; A2A content scanning (Oktsec 268-rule equivalent or comparable); MCP CVE feed integration. | Token-exchange logs; rule sets |
| 5 | Mesh-deployed AgentGateway sidecar (or AgentCordon-style combined gateway+vault) per agent with no agent egressing without a sidecar; per-task egress capability tokens (Tenuo Warrant or equivalent) bound to the specific upstream resource; SSRF / direct-egress paths closed at network layer (Smokescreen or equivalent) so the gateway is the only path; A2A signing enforcement profile published and audited per release; MCP CVE feed wired to auto-quarantine (no human-in-the-loop required for known-bad servers). Evidence MUST tag MCP-server findings with applicable MCP CVEs Q1 2026-class CVE IDs. | Mesh topology with zero-bypass proof; per-task token samples; SSRF closure verification; A2A profile audit; CVE-feed auto-quarantine log |
| 5+ | All L5, plus: sigstore-for-MCP cross-tenant signing (proposal stage as of May 2026 — no shipping verifier); behavioral A2A drift detection (research-stage; SentinelAgent / TraceAegis / Bi-Level GAD are papers, not products); cross-cloud egress federation with reconciliation across two or more agent-aware proxies. | Sigstore-for-MCP verifier deployment; A2A drift rule library; cross-cloud reconciliation report |
D6. Data, Memory & RAG
Trust attribution and integrity controls for all content the agent ingests, retrieves, or persists — including the agent’s own system prompts and identity files (cognitive file integrity), retrieval corpora, and per-session memory.
Maps to: OWASP ASI05, ASI06; MITRE ATLAS AML.T0020 (Poison Training Data), AML.T0024 (Exfiltration via ML Inference API), AML.T0044 (ML Model Inference API Access), AML.T0070 (RAG Poisoning), and AML.T0080 (AI Agent Context Poisoning — incl. .000 Memory / .001 Thread); CoSAI MCP server data threats; PoisonedRAG / ConfusedPilot literature; Differential Privacy (DP-SGD as L4/L5 control for training-data privacy); Model-Layer Attacks (extraction / inversion / membership-inference defenses).
| Level | Criteria | Evidence |
|---|---|---|
| 1 | RAG corpus has no provenance; agent memory has no integrity controls. | none |
| 2 | Document-source labels on RAG retrievals; manual review of skill/plugin source code. | Labeling sample |
| 3 | Per-source trust attribution (system prompt architecture markers); RAG-injection scanning on retrieved content; data-poisoning scanning at ingest; cognitive file integrity (SHA-256) on agent identity files (SOUL.md / IDENTITY.md / system prompts) — extends the AIUC-1 B008.6 model-artifact-integrity primitive (cryptographic checksums for tamper detection on model artifacts) to the prompt / identity-file surface, which AIUC-1 does not name. | Scan results; CFI baseline |
| 4 | RAGShield / TrustRAG-class document attestation; memory-poisoning detector (Microsoft M365 detector or equivalent); PoisonedRAG defense (DRS or sentinel-strategist arch); state rollback (Brain Git) tested. | Attestation logs; rollback drill |
| 5 | Real-time corpus drift detection (RAGShield / TrustRAG-class extended to streaming ingest); documented poisoning-rate bound based on domain-appropriate empirical evidence (the corpus owner sets the threshold and cites the supporting study; the Nature Medicine 2024 0.001% medical-imaging finding is one example, not a general bound); cross-source contradiction detection via per-document trust scoring + retrieval-time conflict flagging (shipping in TrustRAG / sentinel-strategist arch); system-prompt confidentiality controls in production (anti-leakage monitoring per OWASP LLM07:2025, canary tokens active across all agent system prompts, leak-detection alerting wired to SIEM); state rollback (Brain Git or transactional memory) tested ≥quarterly with measured RTO. Evidence MUST tag findings with ASI06, LLM01:2025 / LLM07:2025, and applicable AML.T0080 (AI Agent Context Poisoning: Memory). | Drift dashboard; threshold-justification memo; conflict-flagging logs; canary-token deployment log; rollback drill report with RTO |
| 5+ | All L5, plus: cryptographically attested document attestation at ingest (per-document signature + hash chain; no shipping product as of May 2026 — closest reference is sigstore for software artifacts, not yet adapted to RAG corpora); formal taint lattice for cross-source contradiction (research-stage; theoretical foundation in IFC literature, no production system as of May 2026); zero-knowledge proofs for sensitive RAG retrievals. | Per-doc attestation chain; taint-lattice implementation evidence; ZK-proof verifier logs |
D7. Observability & Detection
Telemetry, detection, and continuous evaluation of running agents — emission under OpenTelemetry gen_ai.* semantic conventions, behavioral-drift and AI-SPM monitoring, multi-tool red-team evaluation across distinct attack categories, and analyst-actionable alerting wired to closed-loop controls updates.
Maps to: NIST CSF 2.0 Detect, MITRE ATLAS detection layer, Agent Observability, OWASP ASI08 / ASI10, Agent Availability Threats (anomaly detection for runaway / recursive / resource-exhausting patterns).
Tension with the Stripe/Bullen architectural-containment view (mostly resolved)
[[breaking-the-lethal-trifecta-bullen-talk|Andrew Bullen’s [un]prompted talk]] presents a production agent platform with no D7-style behavioral observability layer at all — Stripe’s defense is architectural containment (Smokescreen + agent-tag CI + Toolshed +
ToolAnnotations+ HITL on sensitive writes). In Q&A Bullen explicitly says detective controls “have a place, especially for customer-facing products” but Stripe leans on “more deterministic, architectural controls.” Implication for the CMM: a sophisticated practitioner with strong D3/D4/D5 may legitimately score lower on D7 and still have a sound program. The L4 row below requires behavioral monitoring + AI-SPM + quarterly multi-tool red-team — a Stripe-tier architecture would meet the CMM’s safety bar without all of those, and forcing them would be controls-for-controls’-sake.Resolution (2026-05-04 revision): the new effective-score aggregation now reports D7 raw + the strategic-rationale field rather than dragging the headline rating down to D7’s level. Stripe’s matrix reads “L4 typical / L2 D7 (intentional trade-off — D3+D5 architectural containment)” instead of “L1 overall.” A future candidate rule (DR-C002 in the dependency-rules registry) considers whether D5 strength can raise the D7 ceiling for architectural-containment archetypes; that’s a v2+ design decision (negative-rules / floor-relaxation) parked as an open question on the dependency-rules page.
| Level | Criteria | Evidence |
|---|---|---|
| 1 | No agent-specific logs; vendor console only. | none |
| 2 | Tool-call audit log; agent action history with user attribution. | Sample log |
| 3 | OpenTelemetry gen_ai.* semantic conventions emitted across agents (model, request, tool, retrieval, agent spans); LangSmith / Langtrace / Traceloop or equivalent in path; identity multiplexing in logs; minimum action-log schema per tool call: {timestamp, agent_id, user_id, action_type, resource_path, approval_status, rollback_ref} — every agent action must populate every field, and rollback_ref must point to a recoverable state (Brain Git commit, transaction ID, or documented “irreversible” classification with prior approval). | Trace samples; span schema validation; action-log schema conformance check |
| 4 | Agent behavioral monitoring (Vectra / Miggo / SecureClaw nightly audits) operational; behavioral drift alerts wired to SIEM/SOAR; AI-SPM (Wiz / Prisma AIRS / Orca) deployed; quarterly red-team eval covering distinct attack categories — orchestration / multi-turn (PyRIT), probe library (Garak), regression suite (Promptfoo), and continuous CART (Mindgard CART or equivalent). Single-tool coverage is not L4. Eval results MUST be tagged with AML.T#### and AIVSS scores; behavioral-monitoring alerts MUST tag ASI08 / ASI10 per detection rule. | Behavioral-monitoring dashboards; multi-tool eval reports with ID tags |
| 5 | Real-time AI-BOM (Miggo DeepTracing or equivalent shipping ADR product); agent-aware SIEM playbooks deployed in production (Falcon AIDR + NeMo Guardrails OR Sentinel + Defender for Cloud Apps OR equivalent shipping vendor pair); per-agent behavioral baselines with documented prompt-volume-to-alert ratios (per Salesforce Rittinghouse case study: baseline ratio established and held over ≥1 quarter); high-fidelity alerting with measured analyst-actionable rate ≥80%; closed-loop wiring of every triggered alert back to a controls update within defined SLA. Evidence MUST tag detections with ASI08 / ASI10 per detection rule and behavioral findings with applicable AIVSS amplification scores. | DeepTracing graph; agent-aware playbook samples; prompt-volume-to-alert dashboard ≥1 quarter; analyst-actionable rate report; SLA-bounded controls-update log |
| 5+ | All L5, plus: cascade-detection rule library with tuned thresholds for ASI07/08/10 multi-agent risk (research-stage as of May 2026 — SentinelAgent, TraceAegis, Bi-Level GAD published; no production rule library shipping); cross-agent behavioral monitoring with statistical multi-agent baselines (joint-distribution baselines, not just per-agent baselines); model forward-pass activation monitoring per Glass-Box Security (Starseer pre-launch, no shipping product). | Cascade rule registry with thresholds; multi-agent joint-baseline statistics; forward-pass activation monitor evidence |
D8. Supply Chain & AI-BOM
Provenance, integrity, and disclosure for the model, skill, dependency, and tool artifacts that compose an agent’s runtime — addressed through build-time and runtime AI-BOM, signed releases, registry and pre-install scanning, ML-VEX disclosure, and SLSA-graded provenance.
Maps to: OWASP ASI04, NIST SP 800-218A (SSDF AI Profile), EU AI Act Art. 11/Annex IV, CycloneDX 1.6 ML-BOM, SPDX 3.0.
| Level | Criteria | Evidence |
|---|---|---|
| 1 | No model or skill provenance tracking. | none |
| 2 | Model and library versions tracked manually; vendor-attested model cards collected. | Inventory |
| 3 | OWASP AIBOM Generator (CycloneDX) or SPDX 3.0 AI extension produces AI-BOM at build; signed releases; registry-scan + pre-install scan for skills/MCP servers (Aguara Watch / SecureClaw). | AI-BOM artifact; sigstore log |
| 4 | sigstore/cosign signing of every model and skill artifact; ML-VEX equivalent for vulnerability disclosure; runtime AI-BOM (Miggo DeepTracing or equivalent) reconciled with build-time AI-BOM; cognitive file integrity baselines on every IDENTITY.md / SOUL.md. Findings tagged with ASI04, applicable CVEs (e.g., MCP CVEs Q1 2026-class), and ATLAS AML.T#### supply-chain techniques (incl. “Publish Poisoned AI Agent Tool” added in v5.4.0). | Sig-verified registry; reconciliation report; ID-tagged ML-VEX feed |
| 5 | Closed-loop cred-proxy → AI-BOM → AI-SPM in production: every credential vended is reconciled against the AI-BOM, every AI-BOM artifact is reconciled against AI-SPM posture findings, every AI-SPM finding produces a controls update within SLA; SLSA Level 3 for AI artifacts (signed provenance, hermetic builds, isolated build environments — achievable today with sigstore/cosign + GitHub Actions / Buildkite hardened runners); runtime AI-BOM (Miggo DeepTracing) reconciled against build-time AI-BOM with zero-tolerance drift policy; ML-VEX feed published for own AI components. Findings tagged with ASI04, applicable CVEs, and ATLAS AML.T#### supply-chain techniques. | Closed-loop diagram with SLA evidence; SLSA L3 attestation; reconciliation report; ML-VEX feed |
| 5+ | All L5, plus: SLSA Level 4 for AI artifacts (two-party-reviewed builds + hermetic + reproducible — challenging for stochastic model artifacts, partial spec coverage); cross-vendor AI-BOM federation (reconciliation across two or more AI-BOM-emitting platforms; aspirational as of May 2026); active named contributor to OWASP AIBOM Generator / CycloneDX 1.6 ML-BOM / SPDX 3.0 AI extensions WG (PR / spec authorship). | SLSA L4 report; cross-vendor reconciliation; named-contributor evidence |
D9. Operations & Human Factors
Cross-cutting operational and human-factor controls that no single AI security standard mandates as a coherent set: HITL-fatigue monitoring, decommission and rotation lifecycle, latency / cost discipline, system-prompt confidentiality, federated incident sharing, and model deprecation policy.
Maps to: NIST AI 800-4 post-deployment monitoring (human factors flagged as biggest blind spot); EU AI Act Art. 12 logging, Art. 14 human oversight; OWASP LLM07:2025 System Prompt Leakage; CoSAI AI Incident Response Framework v1.0; CSA ATF gates for sustained autonomy.
Why this domain exists. The validation page (Validation: Agentic AI Security CMM vs Widely Adopted Standards §3) surfaced seven operational gaps that no published standard demands but a credible agentic-AI program must operate: guardrail latency / cost budgets, non-adversarial drift, agent decommission lifecycle, human-factors monitoring, federated incident sharing, model deprecation, system-prompt confidentiality. D9 packages these into one cross-cutting domain so they can be measured and improved independently of the per-plane domains.
| Level | Criteria | Evidence |
|---|---|---|
| 1 | No documented operational SLAs for guardrails, no decommission procedure, no HITL-fatigue tracking, no system-prompt confidentiality controls. | none |
| 2 | Documented runbook for: (a) guardrail timeout / fail behavior, (b) agent decommission and credential rotation when a human owner leaves, (c) HITL queue monitoring, (d) basic system-prompt protection (system messages not echoed to user). | Runbook artifact |
| 3 | Measured guardrail latency and cost budgets per agent (p50/p95/p99); fail-mode is explicit and tested (fail-closed for high-risk tier, fail-open documented for read-only); orphaned-agent reaper runs on schedule with measurable SLA; HITL approval-rate and queue-age tracked; system-prompt extraction defenses (canary tokens deployed; LLM07:2025 test cases in red-team suite); model deprecation policy published; incident-disclosure participation in at least one community (e.g. CoSAI IR, OWASP LLM Top 10 contributors, MITRE ATLAS contributors). | Latency/cost dashboard; reaper logs; HITL telemetry; canary deployment proof; community participation evidence |
| 4 | Quantitative HITL-fatigue indicators (rubber-stamp rate, false-positive burnout metric); benign behavioral drift detection separate from adversarial detection (per NIST AI 800-4 categories); decommission drills run quarterly (tabletop or live); model version pinning policy in production; AI-VEX equivalent published for own AI components; coordinated-disclosure participation in MCP CVEs Q1 2026-style cross-org exercises; per-credential dependency map maintained (which consumers depend on which NHI credential — required before automated rotation per What Are Non-Human Identities? (Oasis Security)); rotation runbook tested for coupled-credential cases per Identity-Credential Coupling. Evidence MUST tag findings with LLM07:2025 and any AIVSS amplification factors that apply to drift / autonomy. | HITL-fatigue KPIs; benign-drift dashboard; drill report; AI-VEX feed; cross-org incident report; dependency map; rotation runbook test report |
| 5 | Closed-loop continuous improvement in production: every guardrail / decommission / HITL incident produces a measurable controls update within a defined SLA (org-published, e.g. P1 ≤ 24 hours, P2 ≤ 1 week); demonstrated zero-credentials-orphaned, zero-prompt-leak, and zero-undeprecated-model in production for ≥2 quarters with attestations; deputy + runbook continuity test completed each quarter (anti-pattern I3 recovery); HITL-fatigue indicators within published thresholds (rubber-stamp rate, queue p95 within target). | SLA-bounded controls-update log; clean-state attestations; quarterly continuity-test report; HITL-fatigue dashboard within thresholds |
| 5+ | All L5, plus: published organization-level AI risk-observability metrics externally (e.g. CSAI Foundation AI Risk Observatory contribution); active named contribution of drift-detection patterns / bypass classes back to standards bodies (CoSAI IR Framework, OWASP LLM Top 10, MITRE ATLAS); cross-org coordinated-disclosure leadership in MCP CVEs Q1 2026-class exercises. | External observability dataset; named contribution evidence; coordinated-disclosure leadership artifacts |
Mapping to deployment shapes
A small organization with one chatbot will not pursue Level 5 across all 9 domains, and almost no organization will pursue L5+ across all 9 domains — L5+ is intentionally bleeding-edge and unachievable without category-creation work. The CMM is meant to be applied per agent application, not enterprise-wide. The default expectation for a sufficiently resourced 2026 program is L4 across all domains with selective L5 where deployment exposure justifies it. L5+ ambitions are appropriate for frontier labs, hyperscalers’ own platforms, and dedicated AI-security research shops.
| Application | Realistic target (most enterprises) | Domains where Level 5 is justified |
|---|---|---|
| Web/desktop chatbot (no tools) | L3 across all | D4 (if processing high-stakes content), D9 (system-prompt confidentiality) |
| Generative coding tool (Cursor / Copilot / Claude Code class) | L4 across all | D2, D4, D8 (supply chain risk on skill/MCP poisoning is high), D9 (decommission cadence and prompt leakage). Coding-agent specific evidence at L3+ (per AI Coding Agent Governance (Knostic, 2025–2026)): (a) agent rules-file integrity — Cursor .cursorrules / Copilot Workspace rules / Claude IDENTITY.md baseline + drift detection (extends cognitive file integrity to rules files);(b) IDE extension provenance — extension allowlist + sigstore-equivalent verification; (c) typosquat / dependency hijack defense at install time (Aguara Watch / Kirin / equivalent); (d) destructive-action classification (force-push, branch deletion, mass refactor, prod-config write) auto-routes to confirm or block tier per the Decision Rights for AI Agents matrix. |
| Data-science copilot | L3 → L4 | D2 (data scope), D6 (data integrity), D9 (operational drift in long-running notebooks) |
| RAG application | L4 in D6 + D7 | D6 (PoisonedRAG / ConfusedPilot exposure), D9 (model deprecation and embedding versioning) |
| MCP server (provider) | L4 in D5 + D8 | D8 (consumed by many; signing is critical), D9 (federated CVE disclosure) |
| Agent skill (publisher) | L4 across all | D8, D2, D9 (skill deprecation policy) |
| Multi-agent mesh | L4 minimum | D5, D7 (cascade / rogue-agent detection), D9 (HITL-fatigue at scale) |
Tooling map per domain (May 2026 snapshot)
Three categories: Standards / Specs = formally governed specifications, frameworks, or guidance documents (IETF, CNCF, OWASP, NIST, CSA, etc.); OSS tools = open-source software with an Apache / MIT / similar license; COTS / SaaS = commercial off-the-shelf or managed cloud service. A single capability may appear in multiple categories when standards define the protocol and both OSS and commercial implementations exist. See Agentic AI Security Reference Architecture (2026) §Recommended stacks for opinionated per-profile selections.
| Domain | Standards / Specs | OSS tools | COTS / SaaS |
|---|---|---|---|
| D1 Governance | OWASP ASI Top 10, NIST AI RMF, ISO 42001, EU AI Act, AIUC-1 six pillars, CoSAI Principles | OWASP ASI Top 10 templates, AIUC-1 self-assessment checklists | KPMG / Schellman audits, RSAC governance services |
| D2 Identity | SPIFFE (CNCF standard); OAuth 2.1 (IETF RFC 9700); OIDC (OpenID Foundation); NIST CAISI Concept Paper (Feb 2026) | SPIRE (CNCF OSS); AgentKeys; Keychains.dev; Aegis; OneCLI; AgentSecrets | Okta for AI Agents (GA Apr 30 2026); Microsoft Entra Agent ID; Microsoft Agent 365 (GA May 1 2026); Aembit; Astrix; CyberArk Conjur |
| D3 Control & Least-Agency | OWASP four-tier least-agency model; CSA Agentic Trust Framework (Feb 2026) 5-gate model | Rego (CNCF OSS); Cedar (Apache 2.0, AWS); Tenuo Warrants (OSS) | AWS Cedar managed (Mar 2026 AI release); Anthropic Compliance API; Permit.io; Topaz |
| D4 Runtime & Guardrails | — | LlamaFirewall (Meta — PromptGuard 2, AlignmentCheck, CodeShield); NeMo Guardrails (NVIDIA OSS); Guardrails AI; Microsoft Agent Governance Toolkit (Apr 2026) | Lakera Guard; Lasso; HiddenLayer; Microsoft Prompt Shields; NeMo NIMs (commercial); Robust Intelligence |
| D5 Egress & Network | A2A v1.0 spec (Linux Foundation); CoSAI MCP Security white paper (Jan 2026) | AgentGateway (Linux Foundation, Apache 2.0); Oktsec; mTLS via Istio or Linkerd (both CNCF OSS) | Solo Enterprise for AgentGateway; Operant MCP Gateway; Natoma; Cloudflare AI Gateway; Kong AI Gateway |
| D6 Data, Memory & RAG | CycloneDX 1.6 ML-BOM (OWASP); SPDX 3.0 AI extensions (Linux Foundation) | RAGShield; TrustRAG; OWASP AIBOM Generator; sigstore / cosign; Brain Git (SlowMist); SecureClaw; LangChain PII Middleware | Microsoft Purview AI; ReversingLabs ML scan; JFrog ML scan; Protect AI; Cohere Embed |
| D7 Observability & Detection | OTel gen_ai.* SemConv v1.37+ (CNCF); MITRE ATLAS detection layer | Langtrace; Traceloop; Helicone; Promptfoo; PyRIT (Microsoft OSS); Garak (NVIDIA OSS) | LangSmith; Wiz AI-SPM; Palo Alto Prisma AIRS; Orca AI-SPM; Reco; Mindgard CART; Vectra AI; Miggo Security |
| D8 Supply Chain & AI-BOM | CycloneDX 1.6 ML-BOM; SPDX 3.0 AI ext; NIST SP 800-218A SSDF AI Profile; EU AI Act Art. 11 / Annex IV | OWASP AIBOM Generator; sigstore / cosign; Aguara Watch; SecureClaw 55-check audit | Anchore; Snyk AI; JFrog AI Catalog; ReversingLabs; IBM Granite disclosures; Lineaje |
| D9 Operations & Human Factors | NIST AI 800-4 monitoring categories; OWASP LLM07:2025 test cases; CoSAI IR Framework v1.0; MITRE ATLAS coordinated-disclosure templates | OTel latency / cost spans; canary-token tooling | DataDog AI Monitoring; New Relic AI Monitoring; Sentry AI Tracing; AI-VEX disclosure platforms (emerging); Schellman / Coalfire AI risk-observability services |
Practitioners worth following
These individuals and organizations have shipped substantive work on the controls in this CMM. Cited where their output was directly used in this synthesis.
| Person / org | Contribution | Relevant page |
|---|---|---|
| Simon Willison | Lethal Trifecta (Jun 2025); CaMeL coverage; structural test for prompt-injection vulnerability | Simon Willison |
| Johann Rehberger | Embrace The Red; Month of AI Bugs (Aug 2025); Jules AI kill chain | Johann Rehberger |
| Bill McIntyre | Securing Your Agents (2026, AIE / RMAIIG); 40-slide layered playbook | Bill McIntyre |
| Jason Clinton (Anthropic Deputy CISO) | AIVSS Distinguished Review Board; CISO’s Guide to Agentic AI webinar | (entity stub candidate) |
| Apostol Vassilev (NIST) | NIST AI 600-1 lead; CAISI early contributor | (entity stub candidate) |
| Ken Huang | OWASP AIVSS lead | (entity stub candidate) |
| Meta Purple Llama team | LlamaFirewall (PromptGuard 2 / AlignmentCheck / CodeShield) | LlamaFirewall |
| Solo.io / Linux Foundation AAIF | AgentGateway → LF (July 2025); Solo Enterprise distribution | AgentGateway |
| Microsoft Security Research | FIDES (zero successful PI on AgentDojo); ZT4AI; Agent 365; M365 memory-injection detector | Microsoft Responsible AI Standard (RAI) |
| Google DeepMind | CaMeL privileged/quarantined LLM split | |
| NIST NCCoE | CAISI AI Agent Standards Initiative; Concept Paper Feb 2026 | NIST — National Institute of Standards and Technology |
| CoSAI / OASIS | MCP Security white paper Jan 2026; Principles for Secure-by-Design Agentic Systems; Agentic Identity & Access Management Apr 2026 | CoSAI — Coalition for Secure AI |
| OWASP Gen AI Project | ASI Top 10; AIVSS v0.8; AIBOM Generator; Practical Guide for Secure MCP | OWASP — Open Worldwide Application Security Project |
| CSA | MAESTRO threat model; Agentic Trust Framework with 5 promotion gates (Feb 2, 2026) | CSA — Cloud Security Alliance |
| AIUC | AIUC-1 standard; quarterly updates; Schellman accredited Feb 2026 | (entity stub candidate) |
Implementation roadmap
Phased over four phases (Foundation → Standardization → Measurement → Optimization) with an optional fifth phase for organizations targeting L5+.
| Phase | Months | Focus | Target by end of phase |
|---|---|---|---|
| 1. Foundation | 1–3 | Inventory + identity + operational baseline | D1 L2, D2 L2, D8 L2, D9 L2 |
| 2. Standardization | 4–9 | Platform-level enforcement (the critical inflection) + system-prompt confidentiality | D2 L3, D3 L3, D4 L3, D5 L3, D7 L3, D9 L3 |
| 3. Measurement | 10–18 | Behavioral monitoring + red-team + AI-BOM + HITL fatigue + decommission drills | D6 L3+, D7 L4, D8 L4, D9 L4 |
| 4. Optimization | 18+ | AIUC-1 / ISO 42001 cert; ≥2-quarter L4 stability; closed-loop ops improvement; bus-factor ≥2 with continuity test | D1 L5, selective L5 in domains tied to deployment exposure, D9 L5 |
| 5. Leading Edge (optional) | 24+ | Research-stage primitives in production (TEE attestation, CaMeL split, cascade-detection); active named standards contribution; cross-vendor federation | L5+ in 2–4 selected domains aligned to org’s research / product portfolio |
The critical inflection in this roadmap is end of Phase 2 (month ~9): Level 3 across D2–D5 + D7 marks the boundary between platform-level enforcement and prompt-level reliance. Below that boundary, an organization remains structurally vulnerable to prompt injection per the Lethal Trifecta test.
Appendix: Ten Security Dimensions (complementary threat-surface view)
The CMM’s nine domains are organized by where to enforce controls (governance, identity, control plane, runtime, egress, data, observability, supply chain, ops). The ten dimensions below are organized by what to defend against. Both views are useful — one drives architecture, the other drives threat modeling. The mapping back into CMM domains shows where each anchor threat lands.
| # | Dimension | Anchor threat | CMM domain(s) |
|---|---|---|---|
| 1 | Adversarial resilience | Prompt injection, jailbreaking, multilingual / leetspeak bypass | D4 Runtime |
| 2 | Data integrity | Training, fine-tuning, RAG, MCP-tool-metadata poisoning | D6 Data + D8 Supply Chain |
| 3 | Model security | Extraction (API scraping, distillation, side-channel/TPUXtract) | D2 Identity + D5 Egress |
| 4 | Privacy protection | Membership inference, embedding inversion | D6 Data |
| 5 | Supply chain security | Hugging Face / npm / model-registry compromise; ClawHavoc-class | D8 Supply Chain |
| 6 | RAG and vector security | Corpus poisoning, ConfusedPilot, embedding leakage | D6 Data |
| 7 | Agentic AI governance | MCP, tool poisoning, memory poisoning, autonomy creep | D2 + D3 + D5 |
| 8 | Output safety | Content filtering, hallucination, misuse prevention | D4 Runtime + D9 Ops |
| 9 | Lifecycle management | Training env, deployment hardening, monitoring, retirement | D1 + D8 + D9 |
| 10 | AI incident response | IR for prompt injection / poisoning / agent containment | D7 + D9 |
Use this lens when reasoning about what kinds of AI threats a deployment is exposed to; use the CMM’s nine domains when deciding where in the stack to enforce the response. Folded in 2026-05-06 from an earlier first-proposal draft (April 2026) that was otherwise consolidated into this canonical CMM.
Appendix: What this CMM contributes beyond reviewed standards
Verified against eleven widely-adopted AI-security standards (NIST AI RMF / 600-1 / 800-4 / IR 8605A; ISO 42001 Annex A + 27090 + 42006; MITRE ATLAS v5.6.0; OWASP ASI / AIVSS / LLM Top 10; Google SAIF; CoSAI primaries; Microsoft RAI / ZT4AI; CSA MAESTRO + ATF; EU AI Act; AIUC-1) on 2026-05-06 via primary-source agent fetches. Verification was keyword-level evidence collection — see §4 for per-claim tags and primary-source citations, and the audit backlog in Standards Validation Methodology for the deeper clause-by-clause reviews still pending. Treat the items below as load-bearing-pending-deeper-audit.
- Cross-domain aggregation discipline (dependency-resolved effective scores). No reviewed AI security standard enforces cross-domain aggregation. CMMC 2.0 uses cumulative levels; the CMM imports the discipline but uses dependency-resolved effective scores (v1 = 3 caps: D2→D5, D2→D7, D3→D4) — capturing real cross-domain attack-path failures without punishing strategic trade-offs. Prevents the “L4 in governance, L1 in egress” cherry-picking that plagues self-assessments.
- Cognitive File Integrity scoped to system prompts and identity files. AIUC-1 B008.6 mandates cryptographic checksums for model-artifact tamper detection (the closest near-miss in any reviewed standard). The CMM’s
D6 L3+extends the same primitive to system prompts and identity files (SOUL.md/IDENTITY.md) — unnamed in any reviewed standard. (Limitation: the file-discovery layer is not yet standardized — see CMM Known Limitations §5.) - Credential proxy at
D2 L4as a hard line. “Zero credentials in agent context” with named tooling (AgentKeys / Keychains.dev / Aegis). CoSAI MCP Security recommends token exchange / “do not pass through OAuth tokens” as a principle; CoSAI Agentic IAM and Google SAIF discuss credential management at principle level — none gates credential proxy by maturity tier. - Lethal Trifecta as a structural test.
D3 L4“lethal-trifecta breaker active” makes Simon Willison’s structural argument (untrusted input + sensitive data access + external communication) auditable. Verbatim search across CoSAI / SAIF / AIUC-1 / CSA ATF — zero hits for “trifecta” or any structural naming. SAIF Focus on Agents describes the chain in prose under Rogue Actions framing without naming the pattern. - Real-time AI-BOM at
L5(Miggo DeepTracing or equivalent). CycloneDX 1.6 ML-BOM treatsmachine-learning-modelas a static build-time component with no runtime reconciliation fields. EU AI Act Annex IV item 9 requires documentation OF a post-market monitoring system (per Article 72), not runtime reconciliation between deployed system and AI-BOM. Only the CMM grades runtime reconciliation as a level criterion. - Multi-agent cascade detection at
L5+. MITRE ATLAS v5.6.0 cross-check: zero matches for “multi-agent / agent-to-agent / A2A / inter-agent / cascade / sub-agent” across the full canonical YAML. AML.T0108 “AI Agent” and AML.T0103 “Deploy AI Agent” treat the agent as a single Persona-actor, not as a member of an inter-agent graph. CSA MAESTRO has only partial coverage. The CMM names the gap and points at the rule-library shape that would close it (cascade-detection rule library is research-stage; lives at L5+ explicitly aspirational).
These six are the load-bearing positive contributions. For known limitations of the same CMM, see CMM Known Limitations (current state).
Open questions and gaps
Status update — 2026-04-30 (post-validation)
The validation page (Validation: Agentic AI Security CMM vs Widely Adopted Standards) ran an independent review of the CMM against 11 widely adopted standards and produced 5 prioritized recommendations. Status of each below.
Addressed via this revision
- No measurement protocol. Resolved by Measurement Protocol — three-stage assessment (pre-engagement / evidence collection / scoring) with per-domain interview script, artifact checklist, live-observation requirements, and per-level scoring rubric.
- No reciprocity with AIUC-1 or ISO 42001. Resolved by Standards Crosswalk Matrix — domain-by-standard anchor map covering NIST AI RMF + 600-1 + 800-4, ISO/IEC 42001 Annex A, MITRE ATLAS v5.4.0, OWASP ASI/AIVSS/LLM, Microsoft ZT4AI, CSA MAESTRO/ATF, EU AI Act (incl. Annex IV), AIUC-1 six pillars, CoSAI/SAIF, and NIST SP 800-53 control families via IR 8605A.
- L5 product-name dependencies softened. D2 L5, D4 L5, D6 L5, D3 L4, D5 L3, D1 L5 reframed as capability sets with reference deployments, with explicit handling of products shipping the same week as the CMM and AIUC-1’s quarterly drift.
- 9th cross-cutting domain added. D9 Operations & Human Factors covers HITL fatigue, decommission lifecycle, latency/cost budgets, system-prompt confidentiality (
OWASP LLM07:2025), federated incident sharing, model deprecation policy.- ID-tagged evidence as a global rule. All findings at L3+ MUST be tagged with
ASI##/ AIVSS /AML.T####/ CVE /LLM##:2025. Levels updated throughout to reflect this.- L4→L5 calibration (added 2026-05-04). Resolved per stress-test 2026-05-02 §Change 1 + §Change 5: every L5 row rewritten to contain only currently-shippable controls; research-stage capabilities (TEE attestation, CaMeL split, cascade-detection libraries, cross-vendor federation, sigstore-for-MCP, formal taint lattice, named standards contribution) sequestered to a new L5+ Leading Edge tier; L4→L5 prerequisite gate added (≥2 quarters stable L4 + AIUC-1 readiness scheduled + bus-factor ≥2 + continuity test). L5 is now a bar a sufficiently resourced 2026 program can clear with shipping technology. L5+ remains aspirational.
- Cumulative-floor rule replaced (added 2026-05-04). Per stress-test §Change 2 + §Change 4 adoption: the single floor-across-9-domains rule replaced with dependency-resolved effective scores documented in dependency-rules scaffolding page. v1 active rule set has 3 conservative caps (D2→D5, D2→D7, D3→D4); 6 candidate rules parked in registry for evidence-based promotion; promotion criteria + revision protocol documented. Stripe-style architectural-containment, Microsoft Agent 365-driven, and resource-constrained-startup archetypes now report fairly (typical/weakest/strongest summary) instead of being collapsed to a single misleading floor.
Remaining gaps
- Agent-archetype tailoring — partially addressed. The generative coding tool archetype now has specific evidence (rules-file integrity, IDE extension provenance, typosquat defense, destructive-action classification) per the AI Coding Agent Governance (Knostic, 2025–2026) ingest. Still TBD: data-science copilot, customer-support chatbot, multi-agent mesh, MCP-server-as-provider archetypes.
- Multi-agent governance depth. D5 + D7 + D9 acknowledge ASI07/08/10. As of the 2026-05-04 calibration, the cascade-detection rule library now lives explicitly at L5+ rather than being an under-specified L5 requirement — “how many agents in your mesh, with what cascade-detection coverage” is the open quantitative question for L5+ adoption rather than a qualitative L5 gap.
- AIUC-1 Society pillar. The CMM has no analogue for catastrophic-misuse / national-security externalities. Acknowledged in Agentic AI Security CMM — Standards Crosswalk Matrix.
- Quantitative thresholds at L4. “Quantitative HITL-fatigue indicators” lacks specific thresholds (rubber-stamp rate < X%, queue age p95 < Y minutes) — TBD pending early-adopter production data.
- Synthetic incident library. Stage 2 of the measurement protocol calls for synthetic incidents (PoisonedRAG corpus injection, ClawHavoc-class skill swap, prompt-injection via retrieved doc, A2A impersonation) but no curated library exists.
Relations
- Defined by: Agentic AI Security Reference Architecture (2026) (the planes the CMM measures).
- Designed using: Cybersecurity Capability Maturity Models — Exemplars and Design Lessons (CMMI/BSIMM/SAMM/CMMC/NIST CSF 2.0 design lessons).
- Validated by: Validation: Agentic AI Security CMM vs Widely Adopted Standards (independent gap analysis vs widely adopted standards).
- Companions (added 2026-04-30 in response to validation):
- Agentic AI Security CMM — Standards Crosswalk Matrix — domain-by-standard anchor map
- Agentic AI Security CMM — Measurement Protocol (Assessor’s Handbook) — three-stage assessor’s handbook
- Anchored to incidents: ClawHavoc — Agentic Skill Marketplace Supply Chain Attack, SANDWORM_MODE npm worm — AI Toolchain Poisoning, Meta Sev 1 AI Agent Breach, MCP CVEs Q1 2026, Unit 42 In-the-Wild Prompt Injection Observations.