Peer-Review Readiness — Gaps in the RA + CMM

Checkpoint review (2026-05-02) by Claude playing devil’s advocate, before the RA + CMM are submitted to actual industry peers. Tabled for action — captured here so the gaps don’t fall off the radar.

Top 7 critical gaps (ranked by reviewer fire)

1. No empirical validation; no production case studies

Every claim is synthesized from secondary sources (Gartner, vendor blogs, Q1 2026 incidents) plus reasoning. A serious reviewer’s first question: “Name three organizations operating at L3 against this CMM. Show me their evidence.” Without case studies, it reads as a paper architecture. The “validation page” is a Claude-vs-standards self-check, not external peer review.

Severity: highest. Everything else flows from this.

2. No cost / effort / ROI model

What does it cost to move from L2 to L3? What’s the team size? Timeline? Without these numbers, executives can’t act, and the CMM is a paper exercise. CMMI, BSIMM, SAMM all have at least rough effort estimates. Ours has none.

3. L4 → L5 calibration suspect; cumulative-floor rule un-stress-tested

L4 = “managed, quantitative, continuous”; L5 = “platform-level enforcement everywhere + Proof-of-Guardrail attestation + multi-agent UEBA + standards contributions.” The L4→L5 jump is much bigger than L3→L4. Separately, “floor across all 9 domains” sounds rigorous but is operationally onerous — most real orgs would self-assess at L1 because of one weak domain. CMMI itself debates this trade-off; ours just declares the rule.

Addressed 2026-05-02 — see CMM Calibration Stress Test

Two-part analysis:

  • Part 1: L5 currently mixes stable maturity (platform-level enforcement everywhere, AIUC-1 cert) with research-stage capabilities (Proof-of-Guardrail TEE attestation, multi-agent cascade detection) and with org character (standards contributions). Recommended fix: split L5 into L5 Optimizing (stable, achievable) and L5+ Leading Edge (research-stage + standards contribution). The L4→L5 transition then becomes capability scale-up + duration + bus factor (matching CMMI’s “managed → optimizing”); L5→L5+ becomes the genuinely research-stage step.
  • Part 2: Floor rule stress-tested against 5 realistic archetypes — Stripe-style architectural containment (floor L2, driven by D7), Microsoft Agent 365-driven enterprise (floor L2, driven by D9), startup with strong AI security (floor L1, driven by D9 bus factor of 1), regulated financial services (floor L3, fair), multi-cloud program (floor L3, fair). The floor rule is fair when a program is balanced, and unfair-but-honest when it has one structurally weak domain.
  • 5 calibration changes recommended: (1) the L5/L5+ split; (2) the per-domain matrix as the primary report view, with the floor as the headline; (3) document the 5 archetypes; (4) converge the D7 contradiction callout on a labeled-trade-off framing; (5) an L4→L5 prerequisite gate (≥2 quarters stable at L4, AIUC-1 readiness scheduled, a named standards contributor, bus factor ≥ 2). Adoption pending peer review of the framework as a whole; the floor, matrix, and gate mechanics are sketched in code below.
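
A minimal sketch of the scoring mechanics under discussion. Field names and the gate encoding are illustrative, not the wiki’s canonical schema: headline maturity is the floor (minimum) across the nine domains, the per-domain matrix is the primary report view, and the proposed L4 → L5 prerequisite gate becomes an explicit check rather than prose.

```python
from dataclasses import dataclass

DOMAINS = [f"D{i}" for i in range(1, 10)]  # the CMM's nine domains

@dataclass
class Assessment:
    levels: dict[str, int]          # per-domain maturity level, 1..5
    quarters_stable_at_l4: int = 0  # hypothetical inputs to the proposed prerequisite gate
    aiuc1_readiness_scheduled: bool = False
    named_standards_contributor: bool = False
    bus_factor: int = 1

def floor_score(a: Assessment) -> int:
    """Headline maturity: the weakest of the nine domains (cumulative floor)."""
    return min(a.levels[d] for d in DOMAINS)

def domain_matrix(a: Assessment) -> dict[str, int]:
    """Primary report view: per-domain levels, so one structurally weak domain
    shows up as a visible gap instead of silently setting the headline."""
    return {d: a.levels[d] for d in DOMAINS}

def l5_gate_open(a: Assessment) -> bool:
    """Proposed L4 -> L5 prerequisite gate from the calibration stress test."""
    return (floor_score(a) >= 4
            and a.quarters_stable_at_l4 >= 2
            and a.aiuc1_readiness_scheduled
            and a.named_standards_contributor
            and a.bus_factor >= 2)

# Example: strong everywhere except D7 -> headline floor of 2; the matrix shows why.
example = Assessment(levels={**{d: 4 for d in DOMAINS}, "D7": 2})
print(floor_score(example), domain_matrix(example), l5_gate_open(example))
```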

4. No anti-patterns / failure-modes catalog

Every mature framework documents how it goes wrong: BSIMM has activities-not-undertaken; CMMC has appeals; SAMM has scoring caveats. Common failure modes (PDP becomes a bottleneck; Sentinel signals exceed Operative bandwidth; cumulative-floor demoralizes teams; metagovernance regresses) aren’t named.

Addressed 2026-05-02 — see Anti-Patterns and Failure Modes

25 catalogued anti-patterns across 9 categories:

  • Architecture: PDP bottleneck, Sentinel flood, egress chokepoint, deep-agent bypass
  • CMM scoring: floor demoralizes, cherry-picking, evidence theatre, stub-as-evidence
  • Operations: metagovernance regresses, HITL fatigue, baseline staleness, Goodhart eval, ceremonial red-team
  • Threat model: a trifecta split that isn’t, cascade detection with no thresholds, single-tool red-team at L4, behavioral monitoring without SOC integration
  • Standards/compliance: AIUC-1 frozen, crosswalk-as-decoration, standards shopping
  • Identity/credential: rotation gap, identity-credential coupling unaddressed
  • Multi-agent: all-to-all comms, restart-everything recovery
  • Procurement: platform buy treated as coverage-complete, vendor promise as evidence, decision rights skipped
  • Talent/org: data scientists with no security training, security team with no AI training, bus factor of 1

Each entry carries: pattern + why it happens + failure mode + recovery + wiki anchor. Maps to BSIMM activities-not-undertaken / CMMC appeals / SAMM scoring caveats / NIST CSF Implementation Tier gaps. Standing process: every L3+ self-assessment walks this list before claiming evidence; orgs that name 3+ anti-patterns affecting them are operating in good faith.

5. Adversarial threat model too narrow

Mostly external attackers via prompt injection, supply chain, MCP compromise. Missing: insider threat with model access; long-running adaptive adversarial campaigns; collusion between agents and humans; model-version-degradation attacks; jurisdictional adversaries (state actors with regulatory leverage). Lethal Trifecta is a structural test for one specific exfil pattern, not a comprehensive threat model.

Addressed 2026-05-02. All five missing classes documented with authoritative sources (NIST AI 100-2e2025, RAND RR-A2849-1, CrowdStrike 2026, Anthropic GTG-1002, UK AISI Frontier Trends, Apollo Research steganographic collusion, Anthropic Sleeper Agents, CSET Georgetown export-controls research) and mapped to RA planes + CMM domains. New incident anchor page GTG-1002 added. Cross-class synthesis identifies the AI-BOM + always-on customer eval harness as the single highest-leverage control (it absorbs Classes 1, 2, and 4). Class 5 (jurisdictional) flagged as governance-led, not artifact-controlled.

6. Multi-agent specifics under-architected

A2A protocol is a stub. Cascade detection (ASI08) is named but not designed. Multi-agent UEBA is mentioned but not specified. Inter-agent IR is absent. The “multi-agent mesh” deployment shape lacks the depth the single-agent shapes have.

Addressed 2026-05-02 — see A2A Protocol + Multi-Agent Runtime Security

  • A2A protocol stub filled. Spec corrected to v1.0.0 (Mar 2026), Linux Foundation-governed since June 2025. Documented transport (§7), Agent Card signing framework (§8.4, algorithm-agnostic), opacity principle, what the spec does NOT specify (no replay protection, no mandatory crypto algorithm, no multi-hop trust chain), CSA MAESTRO threat model, Issue #1575 (Agent Passport System), Red Hat hardening guide, IETF AIP draft. Vendor-side enforcement (Oktsec v0.15.2 with 268 detection rules — wiki’s prior “175” was stale) clearly distinguished from spec-mandated controls.
  • Cascade detection (ASI08) designed. Six observable symptoms (rapid fan-out, cross-domain spread, oscillating retries, queue storms, repeated identical intents, cross-agent feedback loops) cataloged from OWASP/Adversa; two of these are sketched in code after this list. Three academic detection primitives (SentinelAgent graph-walk, TraceAegis provenance, Bi-Level GAD theme-based). Vendor primitives (Oktsec, Aguara, LangSmith) honestly mapped. Wiki is explicit that 2026 is the academic-prototype era — no integrated cascade-detection product ships with documented thresholds.
  • Multi-agent behavioral baselines specified. Three shapes in addition to the single-agent baselines: aggregate-level invariants (capacity, rate-pair, compartmentalization), pairwise/triadic traffic baselines (graph-edge anomaly), and cross-agent drift correlation (lockstep drift as a shared-upstream-contamination signal).
  • Inter-agent IR documented. First-principles stop-mesh-vs-isolate decision tree (default fail-mode is stop-mesh). Cross-agent forensics primitives (causal correlation across agent boundaries, tamper-evident audit chain, message-graph reconstruction). Three recovery shapes (selective rollback / rolling restart / mesh-wide quarantine).
  • RA Multi-agent mesh row deepened to match single-agent shape depth — Identity (Ed25519 + SPIFFE), Control (default-deny ACL + opacity + stop-mesh doctrine), Runtime (per-agent sandbox + AgentGateway broker), Egress (A2A v1.0 + signed Cards + 268 rules + replay protection), Data (cross-agent OTel propagation), Observability (pairwise/triadic baselines + graph-walk anomaly + cascade detection).
  • Stale references corrected vault-wide. A2A “v0.3” → v1.0.0 (5 pages); Oktsec “175 rules” → 268 (3 pages); Apollo Research multi-agent collusion attribution clarified (single-model scheming work; multi-agent detection counterpart is by other academic groups — NARCBench, Colosseum, Bagdasarian et al.).
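
A minimal sketch of how two of the six cascade symptoms (rapid fan-out and queue storms) could be checked against the pairwise traffic baselines above, with stop-mesh as the default fail-mode. All names and threshold multipliers are placeholders, not a shipped detector — per the honest read below, no 2026 product documents calibrated values.

```python
from collections import defaultdict

FANOUT_FACTOR = 3.0  # placeholder multiplier: distinct callees vs. baseline
RATE_FACTOR = 5.0    # placeholder multiplier: per-edge message rate vs. baseline

def detect_symptoms(window, baseline_edges, baseline_fanout):
    """window: (caller, callee) pairs observed in the current interval.
    baseline_edges: {(caller, callee): expected messages per interval}.
    baseline_fanout: {caller: expected number of distinct callees}."""
    edge_counts, fanout = defaultdict(int), defaultdict(set)
    for caller, callee in window:
        edge_counts[(caller, callee)] += 1
        fanout[caller].add(callee)

    symptoms = []
    for caller, callees in fanout.items():
        if len(callees) > FANOUT_FACTOR * max(baseline_fanout.get(caller, 1), 1):
            symptoms.append(("rapid_fanout", caller))
    for (caller, callee), count in edge_counts.items():
        if count > RATE_FACTOR * max(baseline_edges.get((caller, callee), 1), 1):
            symptoms.append(("queue_storm", caller))
    return symptoms

def respond(symptoms):
    """Default fail-mode is stop-mesh; isolate only when exactly one agent is implicated."""
    if not symptoms:
        return "continue"
    implicated = {agent for _, agent in symptoms}
    return f"isolate:{implicated.pop()}" if len(implicated) == 1 else "stop-mesh"

# Example: agent "planner" suddenly fans out to many new peers at storm rates.
window = [("planner", f"worker-{i}") for i in range(12) for _ in range(20)]
found = detect_symptoms(window, baseline_edges={}, baseline_fanout={"planner": 2})
print(found and respond(found))  # -> isolate:planner (a single agent is implicated)
```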

Honest read: 2026 is the academic-prototype era for cascade detection at scale. Wiki maturity ladder reflects this — L1/L2 shippable today, L3 prototype-grade, L4 research, L5 aspirational.

7. Novelty claims and competing-view callouts absent

What’s actually new in the RA + CMM vs. existing literature? What’s specifically the wiki’s contribution that wasn’t already in OWASP / NIST / Gartner / CSA? And where would a serious skeptic push back — what are the strongest counter-arguments to “platform-layer over prompt-layer,” to “guardian agents will eliminate 50% of security systems,” to “Lethal Trifecta is unconditionally vulnerable”? Mature frameworks acknowledge their critics; ours doesn’t yet.

Addressed 2026-05-02 — see Wiki Novelty and Counter-Arguments

Two-part appendix:

  • Genuinely novel contributions (10 items): 6-plane RA with XACML colors across 7 deployment shapes; 5×9 cumulative-floor CMM with ID-tagged evidence; Cognitive File Integrity; Identity-Credential Coupling operationalized as D2 L4 evidence; D9 Operations & Human Factors; four-quadrant red-team coverage at D7 L4; multi-agent runtime security depth; five-class threat expansion; AI-BOM + always-on eval as multi-class absorber; stop-mesh-vs-isolate doctrine. Plus 5 sharpenings of existing concepts.
  • Per-thesis competing-view callouts for the 6 load-bearing positions (platform-vs-prompt; Gartner 50% elimination; Lethal Trifecta unconditional; floor rule; eval-harness as multi-class absorber; UEBA-for-agents). Each thesis answers “where the skeptic’s right” and what the wiki’s honest response is.

RA design principles 1+5 now carry inline competing-view callouts; the lethal-trifecta page softens “unconditional”; the guardian-agent page flags the Gartner 2029 prediction as a low-credibility forecast. 10 unresolved contests explicitly logged.

Honorable-mention gaps (real, but less urgent)

  • Statistics drawn from a narrow source set — Gartner, Insight Partners, Knostic, OWASP — same handful. No academic literature or industry surveys beyond these. Addressed 2026-05-02 — see Source Triangulation Audit. 8 load-bearing claims triangulated against academic + government-survey sources; 5 corroborated, 2 partially corroborated, 3 remain genuinely single-sourced (MCP CVE percentages, lab-self-reported scheming rates, Bullen-talk-specific ASR figures) and now labeled as such. Five new sources added to the citation rotation: Stanford HAI AI Index, Verizon DBIR 2025, METR, WEF Global Cybersecurity Outlook, AgentDojo (NeurIPS); 5 sources flagged as vendor-conflicted (CyberArk, SailPoint, Knostic, Salesforce, Promptfoo).
  • No formal-methods / proof story — Reference Monitor named but not used to establish properties.
  • No skills / talent / team-composition story — who runs this? What roles? What career paths?
  • No tooling-lock-in or vendor-risk analysis — RA name-drops vendors without analyzing switching costs.
  • No EU AI Act Annex IV per-item evidence walkthrough — crosswalk maps it but doesn’t generate the artifact.
  • HITL named as primitive but no taxonomy of when/how/why human-in-the-loop is appropriate.
  • L3+ ID-tagging requirement is a high bar — has anyone actually done it?
  • No model-version-update re-evaluation playbook — when foundation model updates, what re-evaluation is required?
  • IR specificity thin — every page has §Defensive Lessons; none has a full incident-response runbook.
  • “UEBA for Agents” attribution is single-source — see UEBA-for-Agents single-source attribution below.

Stub-filling pass 2026-05-02 (peer-review readiness #3)

High-leverage stubs filled with source-anchored pages:

  • AIUC-1 — six pillars, accreditation status (Schellman first, LRQA pilot), quarterly cadence, certified-org list, two-actor audit caveat. Closes the largest peer-review gap — D1 L4/L5 evidence claims now have a defensible reference.
  • PyRIT + Garak + Promptfoo + Mindgard CART — the four-quadrant red-team coverage the CMM D7 L4 demands now has individual pages with version pins, scope/non-scope, and explicit seam analysis. CMM D7 L4 cell wikilinks updated.
  • Promptfoo’s “Your model upgrade just broke your agent’s safety” post is the empirical anchor for both Threat Class 4 and the still-open model-version-update re-evaluation playbook listed under the honorable mentions above. Headline numbers (prompt-injection resistance dropping from 94% on GPT-4o to 71% on GPT-4.1) are now in the wiki; a sketch of the regression gate such a playbook would formalize follows below.
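
A minimal sketch, under assumed names, of that regression gate: compare per-suite safety pass rates before and after a foundation-model upgrade and block the roll-out when any suite regresses beyond a policy threshold. This is not Promptfoo’s API, and the 5-point threshold is a placeholder.

```python
MAX_ABSOLUTE_DROP = 0.05  # placeholder policy: block upgrades losing >5 points on any safety suite

def upgrade_gate(before: dict[str, float], after: dict[str, float]) -> list[str]:
    """before/after: pass rate per safety suite (0..1) from the always-on eval harness.
    Returns the suites that regress beyond policy; an empty list means safe to roll."""
    return [
        f"{suite}: {before[suite]:.0%} -> {after[suite]:.0%}"
        for suite in before
        if suite in after and before[suite] - after[suite] > MAX_ABSOLUTE_DROP
    ]

# The Promptfoo-reported case above would trip the gate on prompt injection alone:
print(upgrade_gate(before={"prompt-injection-resistance": 0.94},
                   after={"prompt-injection-resistance": 0.71}))
# -> ['prompt-injection-resistance: 94% -> 71%']
```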

Remaining stubs (lower leverage): SLSA, AgentDojo, FIDES, BSIMM, CMMI, and multiple incident pages. (A2A protocol, previously on this list, was filled under gap 6 above.)

UEBA-for-Agents single-source attribution

Identified during this checkpoint. The phrase “UEBA for Agents” is amplified across ~10 wiki pages. Source trace:

  • Origin: wiki/papers/securing-the-autonomous-future.md (Insight Partners, Oct 2025), which states: “Termed ‘UEBA for Agents’ by enterprise CISOs.”
  • Attribution chain: a single source, Insight Partners (a VC firm with portfolio investments in the named security vendors), citing anonymous “enterprise CISOs.”
  • Conceptual issue: UEBA originated for stable user/host identities with persistent behavioral baselines. Most UEBA products merged into SIEM/XDR by 2020. AI agents are often ephemeral, non-deterministic, and lack stable baselines. The metaphor may not transfer cleanly.

Recommendation: soften wiki references to “agent behavioral monitoring” or “behavioral baselines for agents”; reserve “UEBA-for-Agents” as an Insight-Partners coining cited at first occurrence with a callout. Don’t drop the underlying concept — behavioral monitoring of agent activity is real and useful — drop the colloquial branding.

Sources for “what reviewers would actually say”

Persona simulation (one-shot exercise — deferred for now):

Persona → likely sharpest critique:

  • Skeptical CISO: “I have agents with all three Lethal Trifecta properties and they aren’t compromised — what’s the empirical base rate?” / “Show me an audit finding from real use.” / “What’s the median cost of L2→L3?”
  • Formal-methods researcher: “PDP/PEP framing is good but you don’t formalize what the PDP is guaranteeing.” / “Reference Monitor properties (always-invoked, tamper-proof, verifiable) — which controls actually meet these?”
  • Startup CTO (5 engineers, shipping next week): “Your L1–L5 ladder assumes Fortune 500. What’s the minimum viable RA for me?”
  • Compliance officer: “Does any of this satisfy any actual regulation in a defensible way?” / “EU AI Act Annex IV walkthrough?” / “What’s the audit trail I hand to a regulator?”
  • Safety / alignment researcher: “Where’s the model-internal monitoring? You have behavioral observability but not interpretability.” / “Goal manipulation is named but not defended against in any deep way.” / “Multi-agent emergent behavior gets one bullet.”

Status

Tabled. Captured here so the gaps don’t fall off the radar. Triage and execution to be sequenced separately.

The full cynical-reviewer exercise (4-persona structured critique pass) is not in flight; this page is the human-pass note-taking version.

See Also