Multi-Agent Runtime Security — Cascade Detection, Behavioral Baselines, Inter-Agent IR
The depth-companion to the wiki’s single-agent observability page, focused on what’s specific to multi-agent meshes: cascade-failure detection (OWASP ASI08), pairwise/aggregate behavioral baselines, and inter-agent incident response. Closes peer-review readiness §6 (multi-agent specifics under-architected) and the RA’s “Multi-agent failure containment” gap.
2026 is the academic-prototype era for these capabilities
Cascade detection at scale is research, not product. Graph-based monitors (SentinelAgent, TraceAegis, bi-level GAD) ship as papers, not platforms. Vendor-side coverage (Oktsec rate limits + ACLs, Aguara rule mappings, LangSmith observability) provides primitives but no integrated cascade-detection product ships with documented thresholds. The wiki documents the maturity ladder honestly: today’s L1–L2 is achievable; L3+ is research-grade.
Threat surface — what’s distinct about multi-agent
Single-agent threat models cover prompt injection, tool misuse, and output safety. Multi-agent meshes add three structural threat shapes that have no single-agent analogue:
| Threat shape | OWASP anchor | What’s distinct |
|---|---|---|
| Cascading failures | ASI08 | One agent’s misbehavior or compromise propagates through the mesh — fan-out faster than human response |
| Insecure inter-agent communication | ASI07 | Authentication, integrity, replay, content scanning between peer agents (the A2A surface) |
| Multi-agent collusion | (no single ASI; threat-classes Class 3) | Two or more agents coordinate to bypass an oversight that would catch either alone |
The wiki’s Threat Classes 2026 §Class 3 covers collusion in detail. This page covers detection and response.
Cascade detection (ASI08)
Observable symptoms
Per OWASP ASI08 (Dec 2025) and Adversa’s ASI08 implementation guide (Jan 2026), cascade-failure symptoms cluster around six categories:
| Pattern | Signal |
|---|---|
| Rapid fan-out | Tool-call rate across the mesh spikes faster than the linear sum of per-agent rates |
| Cross-domain spread | An action taken in tenant A produces an action against tenant B’s data in <60 seconds |
| Oscillating retries | Agent X retries against agent Y in a loop, where neither’s per-call retry policy alone explains the rate |
| Downstream queue storms | A queue between two agents grows without bound while upstream throughput is steady |
| Repeated identical intents | Same goal/task hash recurs across agents that don’t normally coordinate |
| Cross-agent feedback loops | A’s output feeds B feeds C feeds A; either entropy of the message stream collapses or amplitude grows |
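As one illustration of the last row, a cross-agent feedback loop shows up structurally as a cycle in the who-sent-to-whom graph. The sketch below is a minimal, hypothetical helper (`find_feedback_loops` and the `(sender, receiver)` tuple shape are assumptions, not part of any cited tool) that lists short cycles. It does not measure entropy or amplitude; it only finds the loop candidates that such measurements would then be applied to.

```python
from collections import defaultdict


def find_feedback_loops(messages, max_len=4):
    """Return simple cycles of at most `max_len` agents in the message graph.

    `messages` is an iterable of (sender, receiver) pairs. That shape is an
    assumption made for this sketch.
    """
    graph = defaultdict(set)
    for sender, receiver in messages:
        graph[sender].add(receiver)

    cycles = set()

    def walk(start, node, path):
        for nxt in graph.get(node, ()):
            if nxt == start:
                # Rotate so the smallest agent name leads; this deduplicates
                # the same loop discovered from different starting agents.
                i = path.index(min(path))
                cycles.add(tuple(path[i:] + path[:i]))
            elif nxt not in path and len(path) < max_len:
                walk(start, nxt, path + [nxt])

    for agent in list(graph):
        walk(agent, agent, [agent])
    return sorted(cycles)


observed = [("A", "B"), ("B", "C"), ("C", "A"), ("C", "D")]
loops = find_feedback_loops(observed)
```

Here `loops` contains the single cycle A, B, C. A production monitor would also need timestamps to confirm the loop is causally live rather than three unrelated messages.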
Adversa lists categories, not thresholds
The Adversa guide names what to watch but provides no concrete numeric thresholds. The wiki should not invent rates. An L3+ org must establish its own thresholds from a baseline period (e.g., 30-day rolling p99 + 3σ) and tag rules with `ASI08` at minimum.
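A minimal sketch of deriving such a threshold from an organization's own baseline samples. It assumes one reading of "p99 + 3σ" (the 99th percentile of the baseline samples plus three sample standard deviations); the function and field names are hypothetical, and the sample data is synthetic.

```python
import statistics


def baseline_threshold(samples, sigmas=3.0):
    """p99 of the baseline samples plus `sigmas` sample standard deviations."""
    if len(samples) < 2:
        raise ValueError("need at least two baseline samples")
    p99 = statistics.quantiles(samples, n=100)[98]  # last of 99 cut points
    return p99 + sigmas * statistics.stdev(samples)


def make_rule(name, samples):
    """Bundle a derived threshold with the ASI08 tag (hypothetical rule shape)."""
    return {"name": name, "threshold": baseline_threshold(samples), "tags": ["ASI08"]}


def breaches(rule, observed_rate):
    return observed_rate > rule["threshold"]


# Synthetic stand-in for a baseline window of mesh-wide tool-call rates.
baseline = [float(x) for x in range(1, 101)]
rule = make_rule("mesh-tool-call-rate", baseline)
```

The threshold is recomputed from whatever window the org chooses; nothing here fixes the window length or the rate being measured.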
Academic detection primitives
The 2025–2026 academic literature has converged on three approaches:
| Approach | Reference | Mechanism |
|---|---|---|
| Graph-walk anomaly | SentinelAgent (arXiv:2505.24201) | Models multi-agent execution as a dynamic graph; detects anomalies at node, edge, and path levels; covers prompt injection, multi-agent collusion, latent exploit paths; pluggable runtime monitor with policy-based intervention |
| Provenance-based | TraceAegis (arXiv:2510.11203) | Abstracts execution paths into hierarchical “stable execution units”; deviation from the unit = anomaly |
| Theme-based | Bi-Level GAD (arXiv:2512.18733) | Bi-level encoder over sentence- and token-level; flags agents drifting from the dialogue’s central theme — directly applicable to “agent A and B don’t normally talk” anomalies |
Productization status: none of these ship as enterprise products as of mid-2026. They are the reference primitives the wiki’s CMM L3+ ladder is designed to be ready for.
Vendor-side primitives that exist today
| Vendor / project | Primitive | What it covers |
|---|---|---|
| Oktsec (github.com/oktsec/oktsec, v0.15.2) | Per-agent sliding-window throttling + default-deny ACLs + Ed25519 message signing + tamper-evident audit chain v2 + 268 detection rules | Containment primitives — does not surface a “cascade detection” rollup |
| Aguara (aguarascan.com) | Public ASI-to-detection-rule mapping including ASI08 rules | Mapping is published; rule SQL/YAML is not |
| LangSmith / Langtrace / Traceloop | Cross-agent OTel gen_ai.* tracing | Observability layer that cascade rules can be written against; rules themselves not provided |
The honest read: L1 (per-agent rate limits + immutable logs) is shippable today; L2 (pairwise/triadic traffic baselines + ACL default-deny) is shippable; L3+ requires research-prototype-level work or DIY.
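To make the L1 primitive concrete, the following is a generic per-agent sliding-window limiter. It is a sketch of the general technique only and is not Oktsec's implementation; the class name, parameters, and the choice to pass `now` explicitly (so the example is deterministic) are assumptions.

```python
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `max_calls` per agent within any `window_seconds` span."""

    def __init__(self, max_calls, window_seconds):
        self.max_calls = max_calls
        self.window_seconds = window_seconds
        self._events = defaultdict(deque)  # agent_id -> timestamps of allowed calls

    def allow(self, agent_id, now):
        q = self._events[agent_id]
        # Drop timestamps that have aged out of the window.
        while q and now - q[0] >= self.window_seconds:
            q.popleft()
        if len(q) >= self.max_calls:
            return False
        q.append(now)
        return True


limiter = SlidingWindowLimiter(max_calls=2, window_seconds=10.0)
decisions = [
    limiter.allow("agent-a", 0.0),
    limiter.allow("agent-a", 1.0),
    limiter.allow("agent-a", 2.0),   # third call inside the window is refused
    limiter.allow("agent-b", 2.0),   # limits are tracked per agent
    limiter.allow("agent-a", 10.0),  # the call at t=0 has aged out
]
```

Because each agent has its own queue, this limiter has no view of the mesh-wide rate, which is the gap the L1 tier describes.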
Multi-agent behavioral baselines
Single-agent baselines are per-agent: tool-call distributions, latency profiles, output-length histograms. Multi-agent baselines need three additional shapes:
1. Aggregate-level invariants
Mesh-wide invariants that should hold regardless of which agent is responsible for satisfying them:
- Total tool-call rate across the mesh ≤ X (capacity invariant)
- No agent’s outbound message rate exceeds Y per agent-pair (rate-pair invariant)
- Total cross-tenant data-flow rate ≤ Z (compartmentalization invariant)
- p99 inter-agent latency stays bounded under load (resource-exhaustion invariant)
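The four invariants above can be expressed as a single check over a periodic mesh snapshot. This is a sketch under assumptions: the `MeshSnapshot` and `Limits` shapes and all field names are hypothetical, and X, Y, Z and the latency bound are values each org must supply.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MeshSnapshot:
    tool_calls_per_min: dict          # agent_id -> rate
    pair_msgs_per_min: dict           # (sender, receiver) -> rate
    cross_tenant_flows_per_min: float
    p99_latency_ms: float


@dataclass(frozen=True)
class Limits:
    total_tool_calls: float    # X
    per_pair_msgs: float       # Y
    cross_tenant_flows: float  # Z
    p99_latency_ms: float


def violated_invariants(snap, limits):
    """Return the names of the mesh-wide invariants the snapshot breaks."""
    violations = []
    if sum(snap.tool_calls_per_min.values()) > limits.total_tool_calls:
        violations.append("capacity")
    if any(r > limits.per_pair_msgs for r in snap.pair_msgs_per_min.values()):
        violations.append("rate-pair")
    if snap.cross_tenant_flows_per_min > limits.cross_tenant_flows:
        violations.append("compartmentalization")
    if snap.p99_latency_ms > limits.p99_latency_ms:
        violations.append("resource-exhaustion")
    return violations


limits = Limits(total_tool_calls=100, per_pair_msgs=20, cross_tenant_flows=5, p99_latency_ms=800)
snap = MeshSnapshot(
    tool_calls_per_min={"a": 70, "b": 45},
    pair_msgs_per_min={("a", "b"): 12},
    cross_tenant_flows_per_min=1.0,
    p99_latency_ms=950.0,
)
result = violated_invariants(snap, limits)
```

In this example no single agent is over a per-agent limit, yet the capacity invariant fails because the check is on the sum.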
2. Pairwise / triadic traffic baselines
Most agent pairs in a mesh never normally communicate. The graph of “who-talks-to-whom” is sparse. Baselining at the pair (or triadic, for relay attacks) level surfaces:
- New edges in the agent communication graph (pair never seen before)
- Edge intensity bursts (pair seen rarely; sudden burst)
- Triadic pattern anomalies (A→B→C chains that never existed before)
The bi-level theme detector (Bi-Level GAD) is the academic primitive that operationalizes this for content-aware detection — “agents drifting from the dialogue’s central theme” maps to the pair / triadic departure.
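The three structural signals in the list above (new edges, intensity bursts, new triadic chains) can be sketched without any content analysis. This is not the Bi-Level GAD method; it is a plain graph comparison, and the function names, the `burst_factor` default, and the treatment of a chain as any structural two-hop path (ignoring timing) are assumptions.

```python
from collections import Counter


def two_hop_chains(edges):
    """All A->B->C paths over distinct agents present in the edge set."""
    edges = set(edges)
    return {
        (a, b, c)
        for (a, b) in edges
        for (b2, c) in edges
        if b == b2 and len({a, b, c}) == 3
    }


def pair_anomalies(baseline_rates, baseline_chains, window, burst_factor=5.0):
    """Compare one observation window against pairwise and triadic baselines.

    baseline_rates: (sender, receiver) -> typical message count per window
    window: list of (sender, receiver) messages seen in the current window
    """
    counts = Counter(window)
    new_edges = sorted(e for e in counts if e not in baseline_rates)
    bursts = sorted(
        e for e, n in counts.items()
        if e in baseline_rates and n > burst_factor * baseline_rates[e]
    )
    new_chains = sorted(two_hop_chains(counts) - set(baseline_chains))
    return {"new_edges": new_edges, "bursts": bursts, "new_chains": new_chains}


baseline_rates = {("A", "B"): 2.0, ("B", "C"): 1.0}
baseline_chains = two_hop_chains(baseline_rates)
window = [("A", "B")] * 12 + [("B", "C"), ("C", "D")]
report = pair_anomalies(baseline_rates, baseline_chains, window)
```

Because the communication graph is sparse, the baseline stays small, and a never-seen edge is a cheap, high-signal alert.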
3. Cross-agent drift correlation
Single-agent drift detection alerts on per-agent baseline deviation. Cross-agent drift correlation looks for agents drifting in lockstep, a signal that they are responding to shared upstream contamination (memory poisoning, RAG poisoning, model-version regression) rather than varying independently. This is the load-bearing primitive for catching Threat Class 4 (model-version-degradation) and the slow-poisoning sub-case of Class 2 (APT campaigns).
Inter-agent incident response
The literature is heavy on “LLM agents doing IR” (CyberSleuth, MASIR, AIR) and light on “IR for compromised LLM-agent meshes.” This is a real research gap; the wiki documents the doctrine that follows from first principles plus the available primitives.
Containment doctrine — isolate vs stop-mesh
A first-principles decision tree for the on-call IR engineer:
    flowchart TD
        Start([Cascade alert fires]) --> Q1{Is the propagation rate<br/>doubling per minute?}
        Q1 -- yes --> Stop[STOP-MESH:<br/>halt all agent inter-comm via gateway]
        Q1 -- no --> Q2{Can blast radius<br/>be bounded by isolation<br/>of N agents?}
        Q2 -- yes --> Iso[ISOLATE:<br/>quarantine N agents via ACL deny;<br/>traffic shadow-recorded for forensics]
        Q2 -- no --> Stop
        Iso --> Q3{After 5 min, is propagation<br/>still active outside the quarantined set?}
        Q3 -- yes --> Stop
        Q3 -- no --> Forensics[Begin cross-agent forensics]
        Stop --> Forensics
        Forensics --> Recovery[Selective rollback /<br/>rolling restart /<br/>mesh-wide quarantine]
The default fail-mode is stop-mesh. A stopped mesh costs hours of degraded service; an unbounded cascade costs data exfiltration, regulatory exposure, and a headline incident. CSA MAESTRO’s Layer-7 framing supports this default; OWASP ASI08 mitigations name “kill switches” without specifying granularity.
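The decision tree above is small enough to encode directly, which makes the stop-mesh default explicit and testable. This is a sketch: the enum and function names are hypothetical, and how the two boolean inputs are measured (for example, what counts as "doubling per minute") is left to the operator.

```python
from enum import Enum


class Action(Enum):
    STOP_MESH = "stop-mesh"
    ISOLATE = "isolate"
    FORENSICS = "forensics"


def is_doubling(prev_rate, curr_rate):
    """True if the propagation rate at least doubled between two 1-minute samples."""
    return prev_rate > 0 and curr_rate >= 2 * prev_rate


def initial_action(doubling_per_minute, isolation_bounds_blast_radius):
    """First decision when the cascade alert fires (Q1 and Q2 in the flowchart)."""
    if doubling_per_minute:
        return Action.STOP_MESH
    if isolation_bounds_blast_radius:
        return Action.ISOLATE
    return Action.STOP_MESH  # default fail-mode


def after_isolation(still_propagating_outside_quarantine):
    """Re-check after the 5-minute isolation window (Q3 in the flowchart)."""
    if still_propagating_outside_quarantine:
        return Action.STOP_MESH
    return Action.FORENSICS


first = initial_action(is_doubling(40, 55), isolation_bounds_blast_radius=True)
second = after_isolation(still_propagating_outside_quarantine=True)
```

Every branch that is not a confident "isolate" resolves to stop-mesh, so uncertainty in either input fails toward the cheaper outcome.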
Cross-agent forensics
Single-agent forensics traces one tool-call chain through one agent’s hooks and OTel spans. Cross-agent forensics requires:
- Causal correlation across agent boundaries. OTel `gen_ai.*` spans must propagate trace IDs across A2A messages. Today this is implementation-specific; A2A v1.0 does not mandate it.
- Tamper-evident audit chain across agents. Oktsec’s audit chain v2 is a reference primitive (single-vendor; no cross-vendor standard).
- Reconstruction of the message-graph at incident time. Graph state must be queryable retroactively; LangSmith and Langtrace cover this for their stacks; cross-stack correlation is unsolved.
- Provenance for content (RAG documents, prompts, tool definitions) referenced in the incident — so the IR can determine whether the cascade started from poisoned content vs adversarial input vs operator error.
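The first two requirements can be illustrated together with a generic hash-chained log whose entries carry a shared trace ID. This is a sketch of the general hash-chain technique, not Oktsec's audit chain v2 format and not an OTel API; the entry fields, `GENESIS` value, and function names are assumptions.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(body):
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()


def append_entry(chain, agent_id, trace_id, event):
    """Append an entry whose hash covers its content and the previous hash."""
    prev = chain[-1]["hash"] if chain else GENESIS
    body = {"agent_id": agent_id, "trace_id": trace_id, "event": event, "prev": prev}
    chain.append({**body, "hash": _digest(body)})


def verify(chain):
    """Recompute every hash; any edited or reordered entry breaks the chain."""
    prev = GENESIS
    for entry in chain:
        body = {k: entry[k] for k in ("agent_id", "trace_id", "event", "prev")}
        if entry["prev"] != prev or entry["hash"] != _digest(body):
            return False
        prev = entry["hash"]
    return True


def events_for_trace(chain, trace_id):
    """Cross-agent view of one causal chain, in log order."""
    return [(e["agent_id"], e["event"]) for e in chain if e["trace_id"] == trace_id]


log = []
append_entry(log, "planner", "t-1", "delegated task")
append_entry(log, "retriever", "t-1", "fetched document")
append_entry(log, "writer", "t-2", "unrelated call")
intact = verify(log)
timeline = events_for_trace(log, "t-1")
```

The trace ID only helps if every agent copies it from the inbound message to its own entries, which is exactly the propagation step that is currently implementation-specific.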
Recovery
Three recovery shapes, in increasing severity:
| Shape | When | Mechanism |
|---|---|---|
| Selective rollback | One agent compromised; others healthy | Rotate that agent’s identity; rebuild from clean state; preserve audit |
| Rolling restart | Cascade contained; healthy fraction known | Restart agents in dependency order; verify per-agent invariants between batches |
| Mesh-wide quarantine | Cascade exceeded containment | All agents offline; full forensic snapshot; clean rebuild from version-pinned artifacts; release to canary subset before full mesh |
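For the rolling-restart row, "dependency order" can be computed with the standard library's `graphlib.TopologicalSorter` (Python 3.9+). The sketch below assumes a hypothetical `depends_on` map and caller-supplied `restart` and `invariants_hold` callbacks; it halts as soon as a batch fails its invariant check.

```python
from graphlib import TopologicalSorter


def restart_batches(depends_on):
    """Group agents into batches; each batch depends only on earlier batches.

    depends_on: agent_id -> set of agent_ids it depends on.
    """
    sorter = TopologicalSorter(depends_on)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        batches.append(ready)
        sorter.done(*ready)
    return batches


def rolling_restart(depends_on, restart, invariants_hold):
    """Restart batch by batch, verifying invariants between batches."""
    restarted = []
    for batch in restart_batches(depends_on):
        for agent in batch:
            restart(agent)
            restarted.append(agent)
        if not invariants_hold(batch):
            raise RuntimeError(f"invariants failed after batch {batch}; halting restart")
    return restarted


deps = {
    "planner": {"retriever", "store"},
    "retriever": {"store"},
    "store": set(),
}
plan = restart_batches(deps)
```

A circular dependency between agents surfaces as a `CycleError` before anything is restarted, which is itself useful information during recovery.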
Maturity ladder for multi-agent runtime security
Honest reflection of the gap between today’s primitives and full coverage:
| Tier | What’s in place |
|---|---|
| L1 | Per-agent rate limits + immutable per-agent logs (Oktsec-class today). Mesh has rate primitives but no aggregate awareness |
| L2 | Pairwise/triadic traffic baselines + ACL default-deny + cross-agent OTel tracing. Mesh has pair-level visibility |
| L3 | Graph-walk anomaly detection (SentinelAgent-class, prototype) + cross-agent drift correlation + documented stop-mesh-vs-isolate doctrine + ID-tagged ASI07/ASI08 evidence per detection |
| L4 | Activation-level collusion probes (NARCBench-class, research) + automated containment doctrine + multi-agent forensics with causal correlation + multi-tool red-team eval covering ASI07/ASI08/ASI10 |
| L5 | Formally specified mesh invariants + automated cross-agent IR with selective recovery + standards contributions (OWASP, CoSAI, A2A spec). Does not exist in production as of mid-2026 — this is the research target |
This ladder integrates with the wiki’s CMM D7 L3+ evidence requirements. L3 is where pairwise baselines + stop-mesh doctrine become assessable; L4 is where multi-tool red-team eval covering the multi-agent surface becomes assessable; L5 remains aspirational.
Open issues
Where the field is thin
- No vendor cascade-detection rules with concrete numeric thresholds. Adversa and OWASP describe categories; rule SQL/YAML is not public.
- No published post-mortems on production multi-agent cascade incidents as of mid-2026 — the empirical anchor is missing.
- Containment doctrine (isolate-one vs stop-mesh) is unresolved in literature. The IR papers describe specialist agent teams doing IR, not IR doctrine for meshes.
- Cross-agent log correlation lacks a standard schema. OpenTelemetry `gen_ai.*` is the closest; cross-vendor correlation is implementation-specific.
- MITRE ATLAS multi-agent TTPs are thin compared to the prompt-injection coverage. ASI07/08/10 anchors exist; technique-level inventory does not.
Mapping to RA + CMM
| Concern | RA plane | CMM domain |
|---|---|---|
| Inter-agent message security | Egress (PEP) | D5 Egress & Network |
| Cascade detection | Observability (PIP feeding PDP) | D7 Observability & Detection |
| Multi-agent behavioral baselines | Observability | D7 |
| Containment doctrine | Control (PDP + PAP) | D3 Control & Least-Agency |
| Inter-agent IR runbook | Cross-cutting | D9 Operations & Human Factors |
See Also
- A2A Protocol — the inter-agent transport this page assumes
- Agent Observability — single-agent counterpart; this page is the multi-agent extension
- Agentic AI Threat Classes 2026 §Class 3 (Collusion) — primary threat anchor
- CSA MAESTRO — Layer-7 multi-agent threat model
- Guardian Agent Metagovernance — when the supervisor is itself an agent in the mesh
- Apollo Research — single-model scheming detection (the detection counterparts to multi-agent collusion are by other academic groups; see Class 3 of the threat-classes page for the citation chain)