Red Teaming Capability Framework

The Red Teaming Capability Framework is an internal proposal for structuring a red-teaming services capability for first-party agentic AI. It organizes the work into five tiers, each answering one operational question, from the governance constraints that authorize a test down to the evaluation of outside vendors. Its governing principle is that a capability is what a team can demonstrate, not the set of standards it can name.

A standard named is not a capability owned. Each tier below is defined by a question it must answer and by the evidence that answers it; the named standards are reference inputs to that work, not substitutes for it. A team that can cite every framework in Tier 2 but cannot produce an attack transcript, a coverage matrix, or a deterministic replay holds no Tier 2 capability.

The five tiers

Each tier states the question it answers, the capability it requires, and the standards that inform it. The standards are inputs; the capability is the deliverable.

Tier 1 — Foundational standards (governance and risk)

Why, and under what constraints, do we test? Governance, authorization, scope boundaries, safety norms, and business-impact framing.

Reference inputs: NIST AI RMF 1.0, NIST AI 600-1, and NIST SP 800-218A; ISO/IEC 42001 and ISO/IEC 23894; the EU AI Act, whose high-risk obligations apply from August 2, 2026 (subject to a possible Digital Omnibus delay); DORA for the financial sector; and the NIST/CAISI RFI on securing AI agent systems.

Tier 2 — Threat taxonomies and attack libraries (what to test)

What do we test? Threat coverage across the model, agent, skill, and protocol layers.

Reference inputs by layer:

Model layer — OWASP LLM Top 10 (2025).
Agent reasoning and orchestration — OWASP Top 10 for Agentic Applications (ASI Top 10).
Skill and behavior execution — OWASP Agentic Skills Top 10 (AST01–AST10). This is an OWASP project in active development, not a ratified Top 10; treat its coverage as provisional.¹
Protocol and infrastructure — an MCP Top 10. No standalone OWASP list is ratified; the most concrete instantiation is Microsoft’s Azure-scoped “OWASP MCP Top 10 for Azure” mapping.² See MCP security.
Cross-cutting techniques — MITRE ATLAS.
Cross-layer model — CSA MAESTRO for tracing a threat across the layers above.
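
One way to make the layer-by-layer coverage concrete is a small coverage matrix that separates items with evidence from items that are only planned. The sketch below is illustrative only: the record format, the item IDs, and the file paths are hypothetical placeholders, not part of the framework or of any OWASP list.

```python
# Layers named in Tier 2.
LAYERS = ["model", "agent", "skill", "protocol"]

# Hypothetical test records: (layer, taxonomy item label, evidence path or None).
# The labels and paths are placeholders for illustration.
records = [
    ("model", "model-item-A", "transcripts/model-item-a.jsonl"),
    ("agent", "agent-item-A", None),  # planned, no evidence yet
    ("skill", "skill-item-A", "replays/skill-item-a.tar"),
]

def coverage_matrix(rows):
    """Group taxonomy items per layer, split by whether evidence exists."""
    matrix = {layer: {"evidenced": [], "unevidenced": []} for layer in LAYERS}
    for layer, item, evidence in rows:
        bucket = "evidenced" if evidence else "unevidenced"
        matrix[layer][bucket].append(item)
    return matrix

matrix = coverage_matrix(records)
for layer in LAYERS:
    print(layer, matrix[layer])
```

A layer with nothing in either bucket (here, protocol) is a visible gap instead of an implied "covered".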

Tier 3 — Methodology and scoring (how to test)

How do we test, and how do we judge the result? Scenario design, non-determinism handling, evidence standards, metrics, severity models, and human validation.

Reference inputs: the OWASP GenAI Red Teaming Guide and the CSA Agentic AI Red Teaming Guide for method; OWASP AIVSS (v0.8, pre-1.0) for severity scoring; Google SAIF and CoSAI principles as the secure-by-design anchor. A numeric severity scale answers "how severe" only when it is paired with a calibration of what each score requires; see Critical appraisal.
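
Non-determinism handling usually means running the same scenario many times and reporting a success rate with an uncertainty range, not a single pass or fail. The sketch below is one possible approach and is not prescribed by the framework: it computes a Wilson score interval for an attack success rate. The trial counts are hypothetical.

```python
import math

def wilson_interval(successes, trials, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is roughly 95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = successes / trials
    denom = 1 + z * z / trials
    centre = (p + z * z / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)

# Hypothetical result: the attack succeeded in 3 of 20 repeated runs.
low, high = wilson_interval(3, 20)
print(f"attack success rate 3/20, interval [{low:.3f}, {high:.3f}]")
```

Even zero successes in ten runs leaves a non-trivial upper bound, which is why a single clean run is weak evidence of resistance.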

Tier 4 — Continuous operations (when and how often to test)

When, and how often, do we test? Regression testing, drift detection, observability, and safe re-testing.

Required capability, stated independent of any product: continuous and automated red teaming integrated into CI/CD and MLOps pipelines; behavioral baseline monitoring with runtime anomaly detection; scan-on-change triggers for model updates, tool additions, and prompt modifications; and ingestion of current campaign intelligence, such as Snyk’s ToxicSkills audit of the agent-skill registry³ and the ClawHavoc skill-marketplace compromise. Automation augments human adversaries; it does not replace them, and emergent or cross-agent behavior still requires human review.
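
A scan-on-change trigger can be sketched as a fingerprint comparison over the three change sources named above: model updates, tool additions, and prompt modifications. This is a minimal sketch under assumed names; the configuration keys (`model`, `tools`, `system_prompt`) and the example values are hypothetical and would depend on the actual deployment.

```python
import hashlib
import json

WATCHED = ("model", "tools", "system_prompt")  # assumed configuration keys

def fingerprint(config):
    """Hash each watched component separately so a change can be attributed."""
    return {
        key: hashlib.sha256(
            json.dumps(config[key], sort_keys=True).encode("utf-8")
        ).hexdigest()
        for key in WATCHED
    }

def changed_components(baseline, current):
    """Return the watched components whose fingerprint differs from baseline."""
    return sorted(k for k in current if baseline.get(k) != current[k])

# Hypothetical agent configurations before and after a tool is added.
baseline_cfg = {"model": "model-v1", "tools": ["search"], "system_prompt": "You are a helper."}
current_cfg = {"model": "model-v1", "tools": ["search", "shell"], "system_prompt": "You are a helper."}

to_rescan = changed_components(fingerprint(baseline_cfg), fingerprint(current_cfg))
print(to_rescan)  # ['tools']
```

A non-empty result would queue the regression suite for the affected component; deciding which findings need human review remains a separate step.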

Tier 5 — Vendor evaluation and compliance reporting

How do we evaluate others, and how do we demonstrate compliance? RFIs, proof-of-concept gates, evidence standards, and compliance mapping.

Reference inputs: the OWASP Vendor Evaluation Criteria for AI Red Teaming Providers and Tooling v1.0; compliance mapping to the EU AI Act, DORA, HIPAA, and NIST AI RMF; and AI-BOM / SBOM requirements per CycloneDX. The buyer-side counterpart to this tier is treated in The AI Security Larsen Effect.

What makes a tier a capability

The tiers describe coverage; the following rules are what convert coverage into a defensible capability, and they are the part the standards lists alone do not supply.

Stable capability IDs. Every capability carries a fixed identifier (for example, T3.MET.04). Assets — questionnaires, response tables, decks, training material — map to capabilities; an asset that references no capability is out of scope.
Evidence is mandatory for any capability claimed as primary. Acceptable evidence is a threat model, a coverage matrix, an attack transcript or trace, a deterministic replay artifact, a scoring output, a proof-of-concept result, or an authorization artifact. Marketing language, context-free screenshots, “we can build it” statements, and one-off demos without logs do not count. A capability that cannot be evidenced must not be claimed as primary.
Named anti-patterns are prohibited. Claiming full coverage without evidence, conflating jailbreak demos with systemic red teaming, and selling capabilities that cannot be staffed or evidenced are explicit failure modes the framework exists to prevent.
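
The first two rules lend themselves to a mechanical check. The sketch below is an assumption-laden illustration, not the framework's tooling: the ID pattern is inferred from the single example T3.MET.04, and the evidence type names, the `Capability` class, and the sample assets are hypothetical.

```python
import re
from dataclasses import dataclass, field

# Pattern inferred from the one example ID "T3.MET.04"; an assumption.
CAPABILITY_ID = re.compile(r"^T[1-5]\.[A-Z]{3}\.\d{2}$")

# Evidence kinds listed in the rules above, as hypothetical type labels.
EVIDENCE_TYPES = {
    "threat_model", "coverage_matrix", "attack_transcript", "replay_artifact",
    "scoring_output", "poc_result", "authorization_artifact",
}

@dataclass
class Capability:
    cap_id: str
    primary: bool
    evidence: list = field(default_factory=list)  # (type, path) pairs

def validate(capabilities, assets):
    """Return a list of rule violations; an empty list means the claims hold."""
    problems, known = [], set()
    for cap in capabilities:
        if not CAPABILITY_ID.match(cap.cap_id):
            problems.append(f"{cap.cap_id}: malformed capability ID")
        known.add(cap.cap_id)
        accepted = [e for e in cap.evidence if e[0] in EVIDENCE_TYPES]
        if cap.primary and not accepted:
            problems.append(f"{cap.cap_id}: claimed as primary without acceptable evidence")
    for name, refs in assets.items():
        if not any(ref in known for ref in refs):
            problems.append(f"{name}: references no known capability (out of scope)")
    return problems

caps = [
    Capability("T3.MET.04", True, [("attack_transcript", "evidence/t3-met-04.jsonl")]),
    Capability("T2.COV.01", True, [("screenshot", "deck.png")]),  # not accepted evidence
]
assets = {"vendor-questionnaire": ["T3.MET.04"], "sales-deck": []}
problems = validate(caps, assets)
for line in problems:
    print(line)
```

The third rule, the named anti-patterns, is a matter of review judgment and is not something this check attempts.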

Critical appraisal

The framework’s weaknesses are structural, not in its choice of standards. The standards selection is current and well-targeted; the risks lie in how the framework is read and in what it asserts without showing.

Inventory read as capability. Presented as five lists of standards, the framework invites the exact failure its own governing rules prohibit: treating an enumeration of frameworks as proof of capability. The evidence rules above are the corrective, and they belong at the front of the framework rather than as an appendix to the lists.

Coverage list vs. evidence rule

A tier rendered as a bullet list of standards reads as “we cover this.” The framework’s own rule is that coverage is unproven until a threat model, transcript, replay, or score exists. The lists and the evidence requirement must be read together, or the framework contradicts itself.

Maturity is not flat. The tiers place ratified, load-bearing standards (NIST AI RMF 1.0, ISO/IEC 42001, MITRE ATLAS, OWASP LLM Top 10) alongside pre-1.0 and in-flight artifacts (the AST proposal, AIVSS v0.8, the MCP list as a vendor-scoped mapping). Citing all of them at equal authority overstates what can currently anchor an evidence-defensible claim. The skill-layer and protocol-layer taxonomies will move as they ratify, and any coverage claim built on them moves with them.
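
The maturity split can be carried into coverage claims directly, so that a claim resting on any in-flight artifact is marked provisional. The sketch below takes its classification from the paragraph above; the short labels, the status names, and the `claim_status` helper are hypothetical.

```python
# Maturity as characterized in the paragraph above; labels are shorthand.
MATURITY = {
    "NIST AI RMF 1.0": "ratified",
    "ISO/IEC 42001": "ratified",
    "MITRE ATLAS": "ratified",
    "OWASP LLM Top 10": "ratified",
    "OWASP AST": "in-flight",
    "OWASP AIVSS v0.8": "in-flight",
    "MCP Top 10": "in-flight",
}

def claim_status(depends_on):
    """A claim is 'anchored' only if every input is ratified.

    Unknown inputs are treated as not ratified, the conservative default.
    """
    if all(MATURITY.get(name) == "ratified" for name in depends_on):
        return "anchored"
    return "provisional"

print(claim_status(["OWASP LLM Top 10", "MITRE ATLAS"]))  # anchored
print(claim_status(["OWASP LLM Top 10", "OWASP AST"]))    # provisional
```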

Automation does not replace adversaries. Tier 4’s continuous-platform language risks the documented evaluation pitfall of treating automation as a substitute for experienced red teamers. The framework’s own operating rule is augmentation: automation supplies scale, repeatability, and regression coverage, while human testers supply novelty and the discovery of emergent and cross-agent behavior. Tier 4 should name the human-to-automation division explicitly rather than leave it implied.

A scale without calibration is false precision. A 0–5 rubric or an AIVSS score produces consistent-looking numbers only when each level is calibrated — when the evidence that distinguishes an exemplary finding from an adequate one is written down per item. Without that calibration, inter-rater reliability is low and a precise-looking score conceals subjective judgment. This is the gap an evaluator most easily misses, because the number appears objective.
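
Inter-rater reliability can be measured, not assumed. One common statistic is Cohen's kappa, which corrects raw agreement between two raters for the agreement expected by chance. The framework does not name a statistic, so the choice of kappa here is an assumption, and the two sets of scores are hypothetical.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters scoring the same findings."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must score the same non-empty set of findings")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: probability both raters pick the same level independently.
    expected = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical level throughout
    return (observed - expected) / (1 - expected)

# Hypothetical 0-5 severity scores from two testers on six findings.
scores_a = [3, 3, 4, 2, 5, 3]
scores_b = [3, 4, 4, 2, 4, 3]
kappa = cohens_kappa(scores_a, scores_b)
print(f"kappa = {kappa:.3f}")  # kappa = 0.538
```

Here the raters agree on four of six findings, yet kappa is only about 0.54 once chance agreement is removed. A low value on a sample like this suggests the rubric levels need written calibration before the scores are reported.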

Grounding is argued, not measured

The framework asserts the five-tier partition as what a 2026 capability “should be founded on” without comparative evidence that this partition outperforms alternatives, and without adoption data. It is a defensible reasoned proposal; its authority is argumentative, not empirical. A later revision should cite engagement outcomes or external adoption to move it from proposal to evidenced standard.

Mapping to the Agentic AI Security CMM and Reference Architecture

The five tiers align to the Agentic AI Security CMM domains and the reference architecture as follows. See the CMM crosswalk for control-level detail.

Tier 1 maps to CMM D1 (Governance).
Tier 2 maps to D4 (Runtime guardrails) and D6 (Data and RAG), and to the threat surface of the reference architecture.
Tier 3 maps to D7 (Observability) and the CMM measurement protocol.
Tier 4 maps to D9 (Operations).
Tier 5 maps to D8 (Supply chain) and vendor procurement.
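
The mapping above can be held as a lookup table so that it can be queried in either direction. In the sketch below the tier-to-domain pairs come from the list above; the assumption that the CMM domains run D1 through D9 is inferred from the highest domain cited and is not stated in this document.

```python
# Tier-to-domain pairs taken from the list above.
TIER_TO_CMM = {
    1: ["D1"],
    2: ["D4", "D6"],
    3: ["D7"],
    4: ["D9"],
    5: ["D8"],
}

# Assumption: domains are numbered D1..D9 (D9 is the highest one cited here).
ASSUMED_DOMAINS = [f"D{i}" for i in range(1, 10)]

def tiers_for_domain(domain):
    """Reverse lookup: which tiers map to a given CMM domain."""
    return sorted(t for t, domains in TIER_TO_CMM.items() if domain in domains)

def unmapped_domains(domains):
    """Domains that no tier maps to under the table above."""
    mapped = {d for ds in TIER_TO_CMM.values() for d in ds}
    return [d for d in domains if d not in mapped]

print(tiers_for_domain("D6"))             # [2]
print(unmapped_domains(ASSUMED_DOMAINS))  # ['D2', 'D3', 'D5']
```

Under that numbering assumption, the domains with no corresponding tier are worth checking against the CMM crosswalk.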

Open questions and caveats

Empirical basis for the partition

No evidence yet establishes that the five-tier split is better than a three-phase or domain-based alternative, nor that all five tiers are load-bearing for every engagement. The partition is reasoned, not measured.

Immature lower-layer taxonomies

The skill-layer (AST) and protocol-layer (MCP) taxonomies are pre-ratification. Coverage claims that depend on them are provisional and should be re-checked against the published lists when they land.

Notes

¹ OWASP, Agentic Skills Top 10 Project Proposal, 2026. OWASP project in active development (v1.0, 2026 edition), covering agent-skill supply-chain risks (AST01–AST10). Not a ratified OWASP Top 10 at time of writing.
² Microsoft, OWASP MCP Top 10 for Azure, MCP03 Tool Poisoning, 2026. The most concrete instantiation of an MCP Top 10 is Microsoft’s Azure-scoped mapping, not a standalone ratified OWASP list.
³ Snyk, ToxicSkills: Malicious AI Agent Skills in ClawHub, 2026. First broad security audit of the agent-skill ecosystem (ClawHub and skills.sh); coined “ToxicSkills” for skills that read as benign but act maliciously when executed by a capable agent.

Enterprise Security in the Agentic AI Era
