Red Teaming for AI: Synthesis

Question

What does a complete red-team practice for AI applications look like in 2026, across probe libraries, orchestration, regression suites, and continuous adversarial testing? Specifically: which tools cover which quadrants of the four-quadrant red-team grid from CMM D7 L4? What are the trust and provenance assumptions behind each? How does evaluation methodology (CLASP, ECBD, LLM-as-a-judge) tie back into the practice?

Current Position

This is the most mature of the wiki’s new scope axes — the substrate was already built before the scope expansion. The four-quadrant red-team coverage codified in CMM D7 L4 is the canonical decomposition:

Probe libraries. garak (NVIDIA) is the canonical OSS LLM vulnerability scanner — ~18+ probe categories spanning encoding, prompt-injection, GCG, DAN, malware generation, XSS, and leak-replay. Vendor-published numbers should be cross-checked against garak outputs.
Orchestration. PyRIT (Microsoft AI Red Team) provides multi-turn adversarial orchestration with adapters across OpenAI, Anthropic, Google, HuggingFace, and self-hosted endpoints — the de-facto OSS standard for orchestrated red-team campaigns.
Regression suites. Promptfoo is the regression-test surface for application-layer LLM behavior — most useful as the “CI gate” for prompts and tool definitions.
Continuous adversarial testing. Mindgard CART is the canonical SaaS for continuous red-team across deployed models; General Analysis is the agentic-AI-specific entrant.

Evaluation methodology is the harder problem. CLASP supplies a capability-centric evaluation rubric (Planning, Tool Use, Memory, Reasoning, Reflection, Perception); ECBD provides the design methodology for benchmark construction; LLM-as-a-judge is the semantic-matching approach that most evaluation toolchains converge on but that carries known failure modes (overconfidence, bias, prompt sensitivity).

The vendor surface for productized red-team-for-AI is consolidating around three incumbents: Lakera Guard for content-layer guardrails plus testing, HiddenLayer for AIDR with model scanning and adversarial robustness assessment, and Protect AI for AI-BOM, ModelScan, and the huntr bounty surface.

Supporting Evidence

AgentDojo (NeurIPS 2024) is the canonical independent benchmark for tool-using agents — 97 tasks, 629 security cases. Independent academic benchmarks remain rare; this one matters.
OWASP LLM Top 10 and OWASP Agentic AI Top 10 supply the vulnerability taxonomy that probe libraries map against.
OWASP AIVSS establishes the scoring framework for AI vulnerabilities — analogous to CVSS for traditional vulnerabilities.

Counter-Evidence

Coverage of model-extraction and inversion attacks in productized tooling

Model-layer attacks (extraction, inversion, membership inference) are well-documented as concepts but undermarked in the productized testing surface. Most commercial scanners focus on prompt-injection and jailbreak; the model-layer attack class is harder to test for and is consequently under-covered.

Independent reproducibility of vendor red-team claims

Vendor-published numbers dominate; few independent reproductions exist. The AgentDojo benchmark is one of the few neutral data points.

How This Has Evolved

Seeded 2026-05-13. This synthesis page consolidates material that was previously spread across concepts/, entities/products/, and the CMM domain definitions. As an existing-content synthesis (not an ingest-driven seed), this page can be promoted to developing status quickly once cross-links are added to the constituent pages.

Open Sub-Questions

Is redteam-for-ai a separate scope axis, or is it a sub-axis of sec-of-ai that should be collapsed? Current judgment: keep separate, because the tooling and methodology surface is large enough to warrant its own synthesis address.
How does red-team-for-AI methodology need to evolve for agentic AI (multi-turn, multi-tool, multi-agent) vs. classical LLM testing? Some of CLASP’s extensions hint at this but the field is unsettled.
See Gaps Index for related open questions.

Enterprise Security in the Agentic AI Era

Explorer

Red Teaming for AI: Synthesis

Red Teaming for AI: Synthesis

Question

Current Position

Supporting Evidence

Counter-Evidence

How This Has Evolved

Open Sub-Questions

Graph View

Table of Contents

Backlinks