Agentic AI Security CMM — Effective-Score Dependency Rules

This page defines the dependency-resolved effective-score mechanism that replaces the single cumulative floor as the CMM’s headline aggregation rule. The page is intentionally scaffolded: a small, conservative active rule set (v1 = 3 rules) plus a candidate-rules registry that gets populated as the wiki grows new attack-path evidence and practitioner architectures.

Why this exists

The prior single-floor rule (imported from CMMC 2.0) misreported 3 of 5 realistic archetypes in the 2026-05-02 stress test: the Stripe-style architectural-containment, Microsoft Agent 365-driven, and resource-constrained startup archetypes all received headline ratings that materially under-reported the program. The L5/L5+ split adopted on 2026-05-04 also broke the floor rule’s premise that domains are interchangeable units. Dependency-resolved scoring replaces the blunt min() with substantive cross-domain caps anchored to documented attack paths, and it explicitly tracks which caps have supporting evidence and which are still candidates.

The effective-score formula

Each domain D has two scores:

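Raw score: the assessor’s per-domain rating against the L1–L5 (and optionally L5+) criteria in the CMM
Effective score: min(raw_score(D), min over deps in dependencies(D) of raw_score(dep)), where dependencies(D) is the set of upstream domains named by active rules whose downstream domain is D; if that set is empty, the effective score equals the raw score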

In pseudocode:

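def effective_score(domain, raw_scores, active_rules):
    """Return the dependency-capped score for one domain."""
    # Upstream domains whose raw scores cap this domain under the active rules.
    deps = [rule.upstream for rule in active_rules if rule.downstream == domain]
    if not deps:
        return raw_scores[domain]  # no active rule targets this domain
    # Caps read upstream *raw* scores, so caps do not chain transitively.
    cap = min(raw_scores[d] for d in deps)
    return min(raw_scores[domain], cap)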
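
A minimal runnable sketch of the pseudocode above, applied to the three v1 rules. The `Rule` dataclass, the integer encoding of levels (L3 as 3), and the sample raw scores are assumptions made for illustration; they are not part of the CMM.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Rule:
    rule_id: str
    upstream: str    # domain whose raw score acts as the cap
    downstream: str  # domain being capped

# v1 active rules as listed on this page.
ACTIVE_RULES_V1 = [
    Rule("DR-001", "D2", "D5"),
    Rule("DR-002", "D2", "D7"),
    Rule("DR-003", "D3", "D4"),
]

def effective_score(domain, raw_scores, active_rules):
    deps = [r.upstream for r in active_rules if r.downstream == domain]
    if not deps:
        return raw_scores[domain]
    return min(raw_scores[domain], min(raw_scores[d] for d in deps))

# Hypothetical raw scores: weak identity (D2) under strong egress (D5).
raw = {"D1": 3, "D2": 2, "D3": 4, "D4": 5, "D5": 4,
       "D6": 3, "D7": 3, "D8": 3, "D9": 3}

effective = {d: effective_score(d, raw, ACTIVE_RULES_V1) for d in raw}
# D5 and D7 drop to 2 (capped by D2); D4 drops to 4 (capped by D3).
print(effective)
```

Domains that no rule targets keep their raw score, so only D4, D5, and D7 can ever differ from raw under v1.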

The headline is no longer a single number. It is a three-number summary:

Typical = median of effective scores across all 9 domains
Weakest = min of effective scores (with the domain that set it labeled, plus any cap that fired)
Strongest = max of raw scores (labeled with the domain)

The report also carries the full per-domain matrix (raw score, effective score, and which caps fired), plus an optional strategic-rationale field for any domain whose raw score is intentionally below its peers (for example, Stripe-style architectural-containment trade-offs).
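
The three-number summary can be sketched as follows. This is an illustration under stated assumptions: levels are encoded as integers, ties go to the first domain in D1..D9 order, `median_low` is used so that "typical" is always an actual level, and the raw scores are hypothetical. None of these choices is specified by the CMM.

```python
from statistics import median_low

# v1 caps keyed by downstream domain, with upstream domains as values.
CAPS_V1 = {"D5": ["D2"], "D7": ["D2"], "D4": ["D3"]}

def effective_scores(raw, caps):
    # Each domain is capped by the raw scores of its upstream domains.
    return {d: min([s] + [raw[u] for u in caps.get(d, [])]) for d, s in raw.items()}

def headline(raw, effective):
    weakest = min(effective, key=effective.get)  # first domain wins ties
    strongest = max(raw, key=raw.get)            # strongest uses RAW scores
    return {
        "typical": median_low(effective.values()),
        "weakest": (weakest, effective[weakest]),
        "strongest": (strongest, raw[strongest]),
    }

# Hypothetical program: strong egress (D5) sitting on mid-level identity (D2).
raw = {"D1": 3, "D2": 3, "D3": 4, "D4": 4, "D5": 5,
       "D6": 3, "D7": 2, "D8": 3, "D9": 4}
effective = effective_scores(raw, CAPS_V1)
summary = headline(raw, effective)
print(summary)
```

In this example D5 is reported as strongest at its raw L5 even though its effective score is capped to L3, which is why the summary has to travel with the per-domain matrix.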

Active rules — v1 (2026-05-04, 3 rules)

These are the rules currently in force. Conservative on purpose: every active rule has a clear cross-domain attack path documented in the wiki and a directional rationale (why upstream caps downstream, not the other way around).

ID	Rule	Direction	Evidence anchor	Adopted	Notes
DR-001	D2 caps D5	`effective(D5) ≤ raw(D2)`	Lethal trifecta: without per-agent identity (D2), per-agent egress policy (D5) cannot be enforced — any agent can impersonate any other agent at the network boundary. The egress gateway has nothing to bind policy to.	2026-05-04	Stripe and Salesforce both treat D2 as the precondition for meaningful D5 enforcement
DR-002	D2 caps D7	`effective(D7) ≤ raw(D2)`	Without per-agent identity (D2), behavioral anomalies (D7) can only be attributed at fleet level. The Salesforce Rittinghouse 1.8M-prompts-to-30-alerts pipeline depends on per-agent identity to make alerts actionable.	2026-05-04	Distinct from DR-001: identity gates attribution in D7, not just enforcement in D5
DR-003	D3 caps D4	`effective(D4) ≤ raw(D3)`	Without policy decisions (D3 PDP), runtime guardrails (D4 PEP) have nothing to enforce — the lifecycle hook fires but no policy decision exists to evaluate against. The Sondera Cedar harness makes this explicit: D4 is structurally downstream of D3 in agentic enforcement	2026-05-04	The reverse cap (D4 → D3) is also partially true but weaker; we adopt only the stronger direction

Promotion threshold met for DR-001/002/003: each has ≥2 wiki-documented practitioner architectures (Stripe + Salesforce + AgentCordon for DR-001/002; Sondera + AgentCordon for DR-003) and a clear lethal-trifecta-class attack path.

Candidate rules registry

Proposed rules whose evidence is suggestive but not yet sufficient for active promotion. Add new candidates here freely. Promotion to active happens at quarterly CMM revisions (or sooner with explicit wiki ingest evidence).

ID	Proposed rule	Direction	Evidence shape we’d want	Status	Notes
DR-C001	D8 caps D6	`effective(D6) ≤ raw(D8)`	≥2 documented incidents where supply-chain compromise (D8 weak) corrupted data integrity (D6) — e.g. ClawHavoc-class skill swap poisoning a downstream RAG corpus	candidate	Likely promotion in 2026-Q4 once 2+ cross-domain incidents are catalogued; currently 1 (ClawHavoc)
DR-C002	D5 caps D7	`effective(D7) ≤ raw(D5)`	Production cases where egress is the only signal source for detection — when D5 is L1, D7 has no telemetry to monitor	candidate	Stripe archetype is the counter-example: their architectural containment makes D5 a primary signal source even with lower D7. Hold pending more data on whether this pattern is general or Stripe-specific
DR-C003	D4 caps D5	`effective(D5) ≤ raw(D4)`	Runtime guardrail bypass enabling egress bypass; or runtime hook gap allowing direct OS-level egress	candidate — weak directionality	Runtime and egress are co-load-bearing in most architectures; directionality is unclear. Park until a clear asymmetric attack path is documented
DR-C004	D6 caps D4	`effective(D4) ≤ raw(D6)`	Poisoned RAG (PoisonedRAG, ConfusedPilot — see memory-poisoning concept) corrupting runtime decisions	candidate — needs production evidence	The dependency exists conceptually; production-evidence is still research-stage. Re-check when AgentDojo / equivalent benchmarks publish cross-domain bypass results
DR-C005	D9 caps D2	`effective(D2) ≤ raw(D9)`	Operational decommission failures leaving identity-bound credentials live after agent retirement	candidate — operational-vs-technical boundary	Likely belongs as a soft cap (rate-of-decay rather than hard min), not a hard cap. Defer until soft-cap semantics are designed
DR-C006	D1 caps everything	`effective(D*) ≤ raw(D1)`	Programs with L1 governance that nonetheless ship strong technical controls — does the governance gap actually undermine the technical controls?	candidate — likely rejected	Existing wiki evidence suggests technical controls operate independently of governance maturity in the moment; governance shows up over time, not at enforcement time. Park as a likely non-rule unless evidence flips

Promotion criteria

A candidate rule is promoted to active when at least one of the following is met, AND the rule is reviewed at the next quarterly CMM revision:

1. ≥2 documented incidents in the wiki where the dependency manifests as a real attack path (incident pages with cross-domain causation noted)
2. ≥1 peer-reviewed paper or vendor-published threat-model establishing the dependency as substantive (not theoretical)
3. ≥2 practitioner architectures documented in the wiki (talks, deployments, vendor whitepapers) where the dependency is treated as load-bearing
4. Synthetic-incident library coverage: the measurement protocol’s synthetic-incident library (currently a known gap) covers the cross-domain attack path with a documented test case

Any of (1)–(4) is sufficient. The rule’s evidence anchor in the active table MUST cite the qualifying source(s).
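
As an illustration, the threshold check can be written as a small function. The `CandidateEvidence` record and its field names are hypothetical; only the thresholds come from the criteria above. Meeting a criterion makes a rule eligible, and the quarterly review still decides promotion.

```python
from dataclasses import dataclass

@dataclass
class CandidateEvidence:
    incidents: int = 0                   # wiki incident pages with cross-domain causation
    papers_or_threat_models: int = 0     # peer-reviewed papers or vendor threat models
    practitioner_architectures: int = 0  # wiki-documented architectures
    synthetic_test_case: bool = False    # documented case in the synthetic-incident library

def qualifying_criteria(ev):
    """Return the numbers (1-4) of the promotion criteria that the evidence meets."""
    met = []
    if ev.incidents >= 2:
        met.append(1)
    if ev.papers_or_threat_models >= 1:
        met.append(2)
    if ev.practitioner_architectures >= 2:
        met.append(3)
    if ev.synthetic_test_case:
        met.append(4)
    return met

def eligible_for_promotion(ev):
    return bool(qualifying_criteria(ev))

# DR-C001, counting only the single incident the registry records (ClawHavoc).
dr_c001 = CandidateEvidence(incidents=1)
# DR-001, counting the three architectures cited (Stripe, Salesforce, AgentCordon).
dr_001 = CandidateEvidence(practitioner_architectures=3)

print(qualifying_criteria(dr_c001), qualifying_criteria(dr_001))
```

Returning the list of criteria met, rather than a bare boolean, makes it easier to fill in the evidence anchor that the active table requires.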

Deprecation criteria

An active rule is deprecated when:

Counter-evidence accumulates — ≥2 documented practitioner architectures where the dependency is not load-bearing (e.g. Stripe-style architectural patterns where the upstream domain is structurally bypassed without compromising the downstream domain)
Quarterly revision finds the rule no longer reflects practice (consensus call, documented in the revision log)
A more precise rule replaces it (e.g. soft caps, conditional caps, archetype-specific caps)

Deprecated rules stay in the registry with status: deprecated and a deprecation rationale. They do not affect new assessments, but keeping them in the registry allows historical reports to be reproduced.
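
One way to keep historical reports reproducible is to record, per rule, the rule-set versions in which it was adopted and deprecated. The sketch below assumes integer rule-set versions and hypothetical field names; `DR-900` is a made-up rule ID used only to exercise the deprecation path, and no such rule exists in the registry.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class RegistryRule:
    rule_id: str
    upstream: str
    downstream: str
    adopted_in: int                      # rule-set version that activated the rule
    deprecated_in: Optional[int] = None  # version that deprecated it, if any
    deprecation_rationale: str = ""

def rules_in_force(registry, version):
    """Rules that were active under the given rule-set version."""
    return [
        r for r in registry
        if r.adopted_in <= version
        and (r.deprecated_in is None or version < r.deprecated_in)
    ]

registry = [
    RegistryRule("DR-001", "D2", "D5", adopted_in=1),
    RegistryRule("DR-002", "D2", "D7", adopted_in=1),
    RegistryRule("DR-003", "D3", "D4", adopted_in=1),
    # Hypothetical rule: adopted in v2, deprecated in v3.
    RegistryRule("DR-900", "D8", "D6", adopted_in=2, deprecated_in=3,
                 deprecation_rationale="illustrative only"),
]

for version in (1, 2, 3):
    print(version, [r.rule_id for r in rules_in_force(registry, version)])
```

A report computed under v2 can then be re-run with `rules_in_force(registry, 2)` even after the hypothetical rule has been deprecated.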

Revision protocol

When	What
Any time	New candidates can be added to the candidate-rules table by anyone editing this page. Add `id`, proposed rule, direction, evidence shape we’d want, status: candidate, notes.
Each wiki ingest of an incident	Check whether the new incident provides cross-domain evidence relevant to an existing candidate. If so, add the citation to that candidate’s notes column.
Quarterly (Q1 / Q2 / Q3 / Q4)	Review all candidates against promotion criteria. Promote, hold, or reject. Increment rule-set version on any promotion or deprecation (v1 → v2 → …). Log the revision in `wiki/log.md` and append to the revision history below.
CMM major revision	Re-validate active rules against the latest evidence; deprecate rules that no longer reflect practice.

Reporting impact

The measurement protocol’s gap report changes shape. Old format:

Headline: L1 (floor — D9 set the floor)
Matrix: D1=L3 D2=L4 D3=L4 D4=L3 D5=L4 D6=L3 D7=L2 D8=L3 D9=L1

New format (Stripe-style architectural-containment archetype example, under v1 rules):

Headline:
  Typical (median effective): L3
  Weakest: D7 effective L2 (raw L2; no upstream cap fired)
  Strongest: D5 raw L4-L5 (effective L4 — capped by DR-001 from D2)
  Strategic rationale: D7 light by deliberate trade-off — D3+D5 architectural containment per Stripe Bullen talk

Per-domain matrix (raw / effective / cap source):
  D1: L3 / L3 / —
  D2: L4 / L4 / —
  D3: L4 / L4 / —
  D4: L3 / L3 / capped by DR-003 to raw(D3)=L4 (no effect — raw already L3)
  D5: L4-L5 / L4 / capped by DR-001 to raw(D2)=L4
  D6: L3 / L3 / —
  D7: L2 / L2 / capped by DR-002 to raw(D2)=L4 (no effect — raw already L2)
  D8: L3 / L3 / —
  D9: L3 / L3 / —

Active rule set: v1 (DR-001, DR-002, DR-003)

The headline is now informative — it shows the program’s shape rather than collapsing it to a single misleading number.
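
The per-domain matrix rows can be generated mechanically from the raw scores and the active rules. The sketch below approximates the row layout shown above rather than reproducing it exactly: it uses "-" for domains with no cap and a colon in the "no effect" note. It also simplifies the D5 raw range L4-L5 to its upper bound, which is an assumption.

```python
# (rule id, upstream, downstream) for the v1 active rules.
RULES_V1 = [("DR-001", "D2", "D5"), ("DR-002", "D2", "D7"), ("DR-003", "D3", "D4")]

def matrix_rows(raw, rules):
    rows = []
    for domain, score in raw.items():
        caps = [(rid, up) for rid, up, down in rules if down == domain]
        effective = min([score] + [raw[up] for _, up in caps])
        notes = []
        for rid, up in caps:
            note = f"capped by {rid} to raw({up})=L{raw[up]}"
            if raw[up] >= score:
                # The rule applies but does not lower the score.
                note += f" (no effect: raw already L{score})"
            notes.append(note)
        rows.append(f"{domain}: L{score} / L{effective} / {'; '.join(notes) or '-'}")
    return rows

# Raw scores from the Stripe-style example, with D5 taken as 5 (assumption).
stripe_raw = {"D1": 3, "D2": 4, "D3": 4, "D4": 3, "D5": 5,
              "D6": 3, "D7": 2, "D8": 3, "D9": 3}

rows = matrix_rows(stripe_raw, RULES_V1)
print("\n".join(rows))
```

Listing a rule even when it has no effect keeps the report explicit about which dependencies were evaluated, not only which ones changed a score.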

Worked examples — re-running the stress-test archetypes

Comparison of the 5 archetypes from the stress test under the old floor vs. v1 effective-score:

Archetype	Old floor (single number)	v1 effective-score headline (typical / weakest / strongest)	Improvement vs old?
Stripe-style architectural-containment	L2	L3 typical / L2 D7 (intentional trade-off) / L4 D5 (capped by DR-001 from D2)	Yes: typical L3 (the median of the effective scores in the matrix above) sits above the L2 floor and reflects the program; D7 honestly noted as weakest with rationale
Microsoft Agent 365-driven	L2	L3 typical / L2 D9 (no upstream cap) / L5 D2	Yes — D9 ops lag doesn’t drag D2 down (no D9→D2 rule in v1; DR-C005 is candidate not active)
Startup with bus-factor 1	L1	L3 typical / L1 D9 (bus factor) / L3 D2/D3/D4/D5	Yes — technical maturity isn’t dragged down
Regulated FS (balanced L3-L4)	L3	L3-L4 typical / L3 weakest / L4 strongest	Equivalent — fair under both rules
Multi-cloud (balanced L3-L4)	L3	L3-L4 typical / L3 weakest / L4 strongest	Equivalent — fair under both rules

Net effect of v1 rules: the 3 archetypes the floor misreported are now reported fairly; the 2 archetypes the floor reported fairly are still reported fairly. Cherry-picking is now prevented by mandatory matrix disclosure + strategic-rationale field rather than by mathematical aggregation.

What this does NOT do

Does not eliminate the cross-domain attack-path concern. DR-001/002/003 capture the strongest known cases. Future incidents and architectures will surface more (the candidates are the parking lot).
Does not allow cherry-picking. Reports MUST publish the full matrix; reports that cite a single domain’s score without the matrix are non-compliant with the measurement protocol (anti-pattern B2 reframed accordingly).
Does not replace the L4→L5 prerequisite gate (≥2 quarters stable L4, AIUC-1 readiness scheduled, bus-factor ≥2, continuity test). Effective-score is aggregation; the prerequisite gate is eligibility for L5 claims. Both apply.
Does not address weighted scoring. All 9 domains are still treated as equally important when computing typical/weakest/strongest. Domain weighting (e.g. for high-risk-tier applications) is a separate question parked under the agent-archetype tailoring open gap on the CMM page.

Open questions / known unknowns

Things this scaffolding doesn't yet handle

Soft caps vs hard caps. DR-C005 (D9 caps D2) is a strong candidate for soft capping (operational lag degrades technical controls over time, not in the moment). The current schema only supports hard caps. Soft-cap semantics are a v2+ design problem.

Conditional caps. Some caps may only apply for specific application archetypes (e.g. D4 caps D5 may apply for consumer-facing chatbots but not for internal agent platforms). The current schema doesn’t support conditions.

Multi-hop transitive caps. If D2 caps D5 and D5 caps D7 (DR-C002 candidate), should D2 transitively cap D7 via D5? Currently each rule is independent. Worth re-examining if DR-C002 is promoted.
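
The difference between the two readings can be made concrete. The sketch below compares the current independent resolution with a hypothetical transitive one, using only DR-001 (D2 caps D5) and a promoted DR-C002 (D5 caps D7). DR-002 is deliberately left out to isolate the multi-hop path, and the raw scores are hypothetical.

```python
def resolve_independent(raw, rules):
    """Current behaviour: each cap reads the upstream domain's raw score."""
    eff = dict(raw)
    for up, down in rules:
        eff[down] = min(eff[down], raw[up])
    return eff

def resolve_transitive(raw, rules):
    """Alternative: caps read upstream effective scores, iterated to a fixed point."""
    eff = dict(raw)
    changed = True
    while changed:  # terminates because scores only ever decrease
        changed = False
        for up, down in rules:
            if eff[up] < eff[down]:
                eff[down] = eff[up]
                changed = True
    return eff

rules = [("D2", "D5"), ("D5", "D7")]  # DR-001 plus hypothetical DR-C002
raw = {"D2": 2, "D5": 4, "D7": 4}

independent = resolve_independent(raw, rules)
transitive = resolve_transitive(raw, rules)
print(independent, transitive)
```

Here D7 stays at 4 under independent resolution but drops to 2 under transitive resolution. With DR-002 also active, D2 already caps D7 directly, so both readings give the same D7 for this particular chain; the choice would matter only for chains that lack a direct rule.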

Rule interactions. Two rules pointing at the same downstream domain currently take min() of their upstream caps. This is the conservative choice, but it may be wrong where the caps are partially redundant (that is, where both capture the same attack path). No counter-evidence has been found yet; the question is flagged for future review.

Negative rules / floor-relaxation. Should there be rules that raise an effective score (e.g. D3+D5 both at L4 raises the effective ceiling on D7 for the Stripe-archetype case, since architectural containment substitutes for behavioral observability)? Currently rules can only cap, not relax. v2+ design problem.

Scoring stability across rule-set versions. When v1 → v2 promotes a new active rule, prior assessments’ headlines may shift. The protocol should specify which rule set a published rating was computed under (annotate as “v1 effective-score” or similar).

Revision history

Version	Date	Changes	Active rule count
v1	2026-05-04	Initial scaffolding. 3 active rules (DR-001 D2→D5, DR-002 D2→D7, DR-003 D3→D4) anchored to lethal-trifecta + Sondera/AgentCordon evidence. 6 candidate rules parked.	3

Relations

Replaces: the single cumulative-floor rule in CMM 2026 (imported from CMMC 2.0)
Operationalized by: Measurement Protocol §Floor rule (rewritten 2026-05-04 to point here)
Resolves: stress test §Change 2 (matrix-as-primary view) and §Change 4 (D7 contradiction recommendation) — both adopted via the new effective-score headline format
Reframes: Anti-Pattern B1 (cumulative-floor demoralizes — mostly resolved) and Anti-Pattern B2 (cherry-picking — reframed as disclosure-discipline failure)
Updates: Counter-Arguments Thesis 4 — wiki’s stated position changes from “keep floor” to “replace floor with dependency-resolved effective scores”
Anchored to: Lethal Trifecta (DR-001, DR-002 directional rationale); Sondera Cedar harness (DR-003 directional rationale); Salesforce Rittinghouse (DR-002 production evidence); Stripe Bullen (Stripe archetype worked example); AgentCordon (DR-001/003 OSS reference architecture)
