Agentic AI Security Reference Architecture (2026)
A vendor-neutral, layered reference architecture (RA) for securing agentic AI applications. Designed to cover all common deployment shapes — web/desktop chatbots, generative coding tools, data-science copilots, RAG systems, MCP servers, agent skills, and multi-agent meshes — with a single shared trust model.
block-beta columns 2 User(["Human user"]):2 Identity["IDENTITY PLANE"]:2 Control["CONTROL PLANE"]:2 Runtime["RUNTIME PLANE"]:2 Egress["EGRESS PLANE"] Data["DATA PLANE"] Obs["OBSERVABILITY PLANE"]:2 classDef pip fill:#cfe2ff,stroke:#0d6efd,color:#000 classDef pdp fill:#fff3cd,stroke:#fd7e14,color:#000 classDef pep fill:#f8d7da,stroke:#dc3545,color:#000 classDef mixed fill:#e2d5f3,stroke:#6f42c1,color:#000 classDef user fill:#d1e7dd,stroke:#198754,color:#000 class User user class Identity pip class Control pdp class Runtime pep class Egress pep class Data mixed class Obs pip
What this RA delivers
The architecture below is the implementation surface of an oversight layer — the system that monitors, evaluates, and intervenes on AI agent behavior in production. The six planes decompose the layer into the four roles codified in NIST SP 800-162 §2.2: PDPs (Policy Decision Points, concentrated in Control), PEPs (Policy Enforcement Points, distributed across Runtime / Egress / Data), PIPs (Policy Information Points, spanning Identity / Data / Observability), and PAP (Policy Administration Point, cross-cutting policy lifecycle). The Sentinels and Operatives split (PIPs feed PDP+PEP) is the runtime decomposition; the goal is verified accountable autonomy — agents that can act on their own, but where every action is verifiable, auditable, and bounded by enforced policy.
In procurement language, this layer is what Gartner calls a guardian agent. The wiki uses the architectural primary (oversight layer / PDP+PEP) when discussing components and the procurement synonym (guardian agent) when discussing vendor categories. Both describe the same role at different levels of abstraction; see Oversight Layer (PDP + PEP for Agentic AI) §Cross-walk for the full term comparison.
Design principles
Five principles anchor the architecture, drawn from Q1 2026 incident data and practitioner consensus.
- Platform-level enforcement, not prompt-level. Every control that matters runs in the runtime/platform, below the model. Prompt-level guardrails are bypassable by definition. (Consensus across AI Security Standards in Q1 2026: Agentic Threats Outpace Frameworks, Emerging Cybersecurity Practices for Agentic AI Applications, Securing the Autonomous Future: Trust, Safety, and Reliability of Agentic AI.)
- Least agency, not just least privilege. Autonomy is a governable dimension distinct from agency — agency is the scope of permitted actions; autonomy is the degree of independent decision-making within that scope (definitions per the AWS Agentic AI Security Scoping Matrix). An agent’s allowed tier (auto / notify / confirm / block) is decided per-action, not per-agent. (OWASP, Least Agency Principle.)
- Verifiable identity for every actor. Every agent has a cryptographic identity that traces back to a human. Action-to-identity binding is the foundation for audit, revocation, and compliance. (AI Agent Identity Architecture, NIST CAISI Concept Paper, Feb 2026.)
- No agent owns credentials. The credential proxy pattern is load-bearing: 5+ tools converged on it independently. Even successful prompt injection cannot extract credentials that never enter context. (Credential Proxy Pattern for AI Agents.)
- The “Lethal Trifecta” is a structural test. Any deployment combining private data + untrusted content + external comms is unconditionally vulnerable. The architecture must break the trifecta at the platform layer. (Lethal Trifecta, Simon Willison.)
Competing-view callouts on principles 1 and 5
See Wiki Novelty and Counter-Arguments §Thesis 1 (platform vs prompt) and §Thesis 3 (Lethal Trifecta).
Principle 1: the framing is hierarchy, not exclusivity. Prompt-layer guardrails reduce ASR materially (LlamaFirewall on AgentDojo: 17.6%→7.5%; Constitutional Classifiers: 86%→4.4% jailbreak success). Platform-layer is primary because not bypassable by injection; prompt-layer is residual-risk reduction. RAG hardening + system-prompt architecture pages already carry residual-risk callouts in this spirit.
Principle 5: “unconditionally vulnerable” is design-time pedagogy. In production, Stripe’s containment architecture runs trifecta agents and reports 1.5–6.7% ASR depending on model — probabilistically exploitable, not unconditional. The Lethal Trifecta is necessary for natural-language exfil at scale and sufficient given current defense maturity to require platform-layer containment. Containment can drive ASR very low but not zero — and very-low-but-not-zero is unacceptable for high-risk-tier actions.
The six planes
The architecture decomposes into six logical planes. A plane is a logical separation of concerns; multiple planes may be implemented by a single product, but the controls must be addressable independently.
Plane order reflects action flow (User → Identity → Control → Runtime → Egress / Data). Each plane is annotated with its XACML role (PIP / PDP / PEP / PAP). Observability spans the bottom as a cross-cutting plane consuming signals from all five above.
Implementation type legend
Each plane table includes a Type column classifying each reference implementation. Abbreviations: OSS = open-source software; COTS = commercial off-the-shelf (vendor product / SaaS); Std = formally governed standard or specification (IETF, CNCF, OWASP, NIST, etc.); Infra = generic infrastructure primitive (cloud VPC, Docker networking); Research = academic prototype with no shipped production implementation; Concept = architectural concept without a specific canonical implementation yet; Exploratory = forward-looking prototype or ecosystem project (e.g., OpenClaw) whose purpose is to project where agentic security is heading — not validated for production use; treat as emerging indicators, not foundational controls. Many rows combine types (e.g., “OSS + COTS”) because the capability has both free and commercial implementations in common use.
block-beta columns 2 User(["Human user"]):2 Identity["IDENTITY PLANE · PIP-side<br/>Workload identity · Agent lifecycle · NHI governance · Credential proxy"]:2 Control["CONTROL PLANE · PDP + PAP<br/>Policy evaluation · Capability tokens · Least-agency tiers · HITL"]:2 Runtime["RUNTIME PLANE · PEP (in-process)<br/>Lifecycle hooks · Input filtering · CoT auditing · Code scanning · Sandboxing"]:2 Egress["EGRESS PLANE · PEP (broker)<br/>Agent/MCP proxy · Tool authorization · Tool integrity · Egress filtering"] Data["DATA PLANE · PIP + PEP<br/>AI-BOM · RAG provenance · Memory integrity · State rollback · Supply-chain scanning"] Obs["OBSERVABILITY PLANE · PIP (cross-cutting)<br/>Distributed tracing · Behavioral monitoring · AI-SPM · Red-team integration"]:2 classDef pip fill:#cfe2ff,stroke:#0d6efd,color:#000 classDef pdp fill:#fff3cd,stroke:#fd7e14,color:#000 classDef pep fill:#f8d7da,stroke:#dc3545,color:#000 classDef mixed fill:#e2d5f3,stroke:#6f42c1,color:#000 classDef user fill:#d1e7dd,stroke:#198754,color:#000 class User user class Identity pip class Control pdp class Runtime pep class Egress pep class Data mixed class Obs pip
1. Identity plane
Verifiable per-agent identity bound to a human principal — covering workload identity, agent and Non-Human Identity (NHI) lifecycle governance, the credential-proxy primitive, action-to-identity tracing, and OAuth 2.1 / OIDC delegation extensions for agents.
| Capability | Reference implementation | Type | Status |
|---|---|---|---|
| Workload identity | SPIRE | OSS + Std | Mature |
| Agent identity & lifecycle | Okta for AI Agents (GA Apr 30, 2026), Microsoft Entra Agent ID + Agent 365 Registry (GA May 1, 2026) | COTS | Mature (commercial) |
| Non-Human Identity governance | Aembit, Astrix Security, CyberArk Conjur, Okta NHI, Oasis Security | COTS | Developing |
| Credential proxy | AgentKeys, Keychains.dev, Aegis (local), OneCLI, AgentSecrets | OSS / COTS | Developing — 5-tool convergence |
| Action-to-identity tracing | Microsoft Agent 365, Anthropic Compliance API (Mar 24, 2026) | COTS | Mature (vendor-stack-locked) |
| OAuth 2.1 / OIDC for agents | Standard OIDC + NIST CAISI Concept Paper extensions (Feb 2026) | Std | Developing |
Key insight: the credential proxy (Credential Proxy Pattern for AI Agents) is the single most important control here. Five different OSS/commercial tools converged on the same architecture independently — a strong signal that it is load-bearing.
2. Control plane
Policy adjudication and capability-token issuance — Cedar / OPA Policy Decision Point evaluation, task-scoped Warrants, the OWASP four-tier least-agency engine, HITL gating on high-impact actions, and CSA ATF risk-based step-up — adjudicated before an agent’s tool call reaches the runtime.
| Capability | Reference implementation | Type | Status |
|---|---|---|---|
| Policy language | Cedar (AWS, March 2026 release for AI governance), Rego | OSS | Mature |
| Capability tokens / Warrants | Tenuo Warrants — task-scoped, signed, ephemeral, holder-bound capability authorizations | OSS | Developing |
| Least-agency tier engine | OWASP four-tier (auto / notify / confirm / block) | Std | Conceptual; needs reference implementation |
| Tool annotation enforcement | Anthropic tool annotations, OpenAI function-call schemas | COTS | Mature |
| HITL primitive | Human-in-the-loop confirmation gate before high-impact tool calls | Concept | Developing |
| Risk-based step-up | CSA Agentic Trust Framework (Feb 2026) — 5 progressive autonomy gates | Std | Developing |
The control plane is where the Lethal Trifecta is broken. If a tool call combines private-data scope + untrusted-content provenance + external-comms reach, the PDP either downgrades to a safer tier (notify or confirm) or denies the call.
3. Runtime plane
Runtime defenses around the agent process — lifecycle hooks (Google ADK, Anthropic), input filtering and prompt-injection detection, chain-of-thought alignment auditing, code-output static analysis, content-safety classification, per-task sandboxing, and (research-stage) compartmentalized privileged / quarantined LLM split.
| Capability | Reference implementation | Type | Status |
|---|---|---|---|
| Lifecycle hooks | Google ADK before_model_callback / before_tool_callback; Anthropic tool-use hooks | OSS + COTS | Mature |
| Input filtering / prompt-injection detection | LlamaFirewall PromptGuard 2 (97.5% recall, 1% FPR), NVIDIA NeMo Jailbreak Detection NIM, Palo Alto Prisma AIRS, Onyx Platform runtime protection | OSS + COTS | Mature |
| Chain-of-thought alignment auditing | LlamaFirewall AlignmentCheck | OSS | Developing — novel primitive |
| Code-output static analysis | LlamaFirewall CodeShield, GitHub Copilot for Security | OSS + COTS | Developing |
| Topic / content safety | NVIDIA NeMo Content Safety NIM, Lakera Guard, Lasso, Microsoft Prompt Shields | COTS | Mature |
| Sandbox / containment | Firecracker (per-task VM), gVisor container, WebAssembly sandbox | OSS | Mature |
| Compartmentalized LLMs (CaMeL pattern) | Privileged + Quarantined LLM split (Google DeepMind) | Research | Research-stage |
| Proof-of-Guardrail attestation | AWS Nitro Enclaves + Miggo Security | COTS | Research-stage |
Note on bypass risk. The Trendyol Tech LlamaFirewall bypass case (non-English / leetspeak prompt injections) confirms that no single guardrail is sufficient. Defense-in-depth across all six planes is required.
4. Egress plane
Network-layer mediation of agent reach to tools, MCP servers, peer agents, and the open internet — gateway-based MCP / A2A / LLM proxying, MCP runtime authorization, tool-poisoning and rug-pull defense, agent-to-agent cryptographic identity, egress destination filtering, and per-agent network segmentation.
| Capability | Reference implementation | Type | Status |
|---|---|---|---|
| MCP / A2A / LLM proxy | AgentGateway (Linux Foundation, July 2025), Solo Enterprise for agentgateway | OSS + COTS | Mature (OSS) |
| MCP runtime authorization | Operant MCP Gateway, Natoma | COTS | Developing |
| rug-pull defense | Solo Enterprise tool-server fingerprinting, versioning, runtime policy | COTS | Developing |
| Agent-to-agent cryptographic identity | Oktsec (Ed25519 sigs), 175 detection rules, content scanning | COTS | Developing |
| Egress destination filtering | Credential proxy destination allowlists, Docker DOCKER-USER chain | OSS | Mature |
| Network segmentation | Per-agent subnet, agent VPC isolation | Infra | Mature |
The egress plane is where the MCP attack surface is contained: 30+ MCP CVEs in 60 days (Q1 2026); 82% of MCP servers vulnerable to path traversal; 66% to code injection. AgentGateway’s automatic, secure token exchange limits per-tool permissions to exactly what’s needed.
5. Data plane
Trust attribution and integrity for training data, retrieval corpora, knowledge bases, per-session memory, and the AI Bill of Materials — covering AI/ML-BOM generation and runtime reconciliation, RAG provenance and attestation, memory-poisoning defense, cognitive file integrity over agent identity files, state rollback, and supply-chain scanning.
OpenClaw ecosystem tools in this plane
Several rows below (RAGShield, TrustRAG, SecureClaw / cognitive file integrity, Brain Git, Aguara Watch) originate from the OpenClaw ecosystem — a body of work whose purpose was to project where agentic data-plane security is heading, not to provide battle-tested controls. They are retained because they correctly identify real capability gaps and likely future directions. However, they are classified Exploratory and should not be treated as foundational evidence on par with OWASP AIBOM, sigstore, or Miggo Security. Treat them as emerging indicators and validate independently before production deployment.
| Capability | Reference implementation | Type | Status |
|---|---|---|---|
| AI/ML-BOM generation | OWASP AIBOM Generator (CycloneDX 1.6), SPDX 3.0 AI extensions, IBM Granite 4.0 disclosures | OSS + Std | Developing |
| Runtime AI-BOM | Miggo Security DeepTracing (behavioral baseline) | COTS | Developing — novel |
| RAG provenance / attestation | RAGShield (cryptographic doc attestation), TrustRAG | Exploratory | Exploratory — OpenClaw ecosystem; not production-validated |
| Memory poisoning defense | Microsoft Defender for Cloud Apps memory-injection detector (50+ examples found, March 2026) | COTS | Developing |
| Cognitive file integrity | SHA-256 monitoring of SOUL.md, IDENTITY.md; SecureClaw | Exploratory | Exploratory — OpenClaw ecosystem; emerging indicator, not foundational control |
| State rollback | Brain Git (SlowMist) | Exploratory | Exploratory — OpenClaw ecosystem; no production adoption evidence |
| Skill / model registry signing | sigstore / cosign, OWASP AIBOM | OSS | Developing |
| Supply-chain scanning | JFrog ML scan, ReversingLabs | COTS | Developing |
| Supply-chain scanning (emerging) | Aguara Watch (5 registries daily, SlowMist) | Exploratory | Exploratory — OpenClaw ecosystem; forward indicator for registry hygiene direction |
The data plane is the layer that took the most damage in Q1 2026: ClawHavoc (1,184 malicious skills), SANDWORM_MODE (npm worm into MCP), LiteLLM compromise. Defense requires registry → pre-install → checksum → cognitive-file integrity layered together.
6. Observability plane
Glass-box visibility across the five upstream planes — OpenTelemetry gen_ai.* semantic conventions, agent-aware tracing, AI-SPM, agent behavioral monitoring, identity multiplexing in logs, SIEM / SOAR with agent playbooks, and AI red-teaming integration as a continuous feed rather than a point-in-time exercise.
| Capability | Reference implementation | Type | Status |
|---|---|---|---|
OpenTelemetry gen_ai.* semantic conventions | OTel SemConv v1.37+ (CNCF standard; SIG contributors: Amazon, Elastic, Google, IBM, Langtrace, Microsoft, OpenLIT, Scorecard, Traceloop) | Std | Mature (experimental status, broad adoption) |
| Agent-aware tracing | LangSmith, Langtrace, Traceloop, Helicone | OSS + COTS | Mature |
| AI-SPM (Security Posture Management) | Wiz AI-SPM, Palo Alto Prisma AIRS, Orca AI-SPM, Reco, Onyx Platform | COTS | Mature |
| Agent behavioral monitoring (anomaly detection on agent activity) | Vectra AI, Miggo behavioral drift | COTS | Developing |
| Agent behavioral monitoring (emerging) | SecureClaw nightly audits | Exploratory | Exploratory — OpenClaw ecosystem; forward indicator for agent audit direction |
| Identity multiplexing in logs | Agent Observability §3 | Concept | Developing |
| SIEM / SOAR with agent playbooks | Splunk + agent IOCs, Sentinel + Defender for Cloud, Falcon AIDR + NeMo Guardrails | COTS | Developing |
| AI red-teaming integration | Promptfoo, Mindgard CART, PyRIT (Microsoft), Garak (NVIDIA), Palo Alto Prisma AIRS (continuous CART) | OSS + COTS | Mature |
Key stat: agents generate 10–20× the log volume of humans over the same time window. Observability scaling is its own problem; pre-aggregation at the runtime hook is necessary, not optional.
Mapping to deployment shapes
The architecture is invariant across deployment shapes; the populated controls differ. The table below specifies which planes are exercised, and which controls are load-bearing, for each common shape.
| Deployment shape | Identity | Control | Runtime | Egress | Data | Observability |
|---|---|---|---|---|---|---|
| Web/desktop chatbot (no tools) | OIDC for human; agent runs as bot identity | Topic-control, content-safety policies | NeMo Content Safety, Lakera Guard | None (no tools) | RAG corpus only | OTel gen_ai.* |
| Generative coding tool (Copilot, Cursor, Claude Code) | Workspace identity + per-repo scope; agent rules files (.cursorrules / IDENTITY.md) baselined | Pre-commit hooks, Cedar policy on file mutation; destructive-action classification routes force-push / mass refactor / prod-config writes to confirm or block | LlamaFirewall CodeShield + AlignmentCheck; per-task sandbox; rogue IDE extension detection (Kirin / equivalent) | Source-control + LSP only; typosquat-aware dependency install gate | Repo content + skill registry; cognitive file integrity for rules files AND IDENTITY.md | LangSmith / Langtrace; tool-call audit; MCP usage + rule changes + policy-violations dashboard (Kirin-class) |
| Data-science copilot (Jupyter, notebooks) | User identity passthrough | Sandbox-resource policy; HITL on dataset writes | Per-notebook sandbox (Firecracker) | Internal data warehouse via credential proxy | Dataset lineage + AI-BOM | OTel + behavioral drift |
| RAG application | Per-tenant agent identity | Source-trust attribution; lethal-trifecta breaker | Input filter on prompts; output classifier | Vector store + per-source allowlist | RAGShield/TrustRAG; document attestation; PoisonedRAG defense | Retrieval-pattern behavioral monitoring |
| MCP server (consumed by agents) | mTLS + workload identity (SPIFFE) | OAuth 2.1 token exchange (CoSAI / NIST CAISI) | Tool-fingerprinting, rug-pull detection (Solo Enterprise) | n/a (server-side) | Skill/server signing, version pinning | MCP CVE feed integration |
| Agent skill (e.g., Claude skill, Anthropic plugin) | Skill author identity (signed) | Skill-scoped Cedar policy | Skill manifest validation; runtime sandbox | Per-skill egress allowlist | Skill registry signing (sigstore); cognitive file integrity | Skill execution telemetry |
| Multi-agent mesh (A2A v1.0) | Per-agent Ed25519 identity (Oktsec-side; not in spec); SPIFFE/SPIRE workload identity | Cross-agent ACL (Oktsec default-deny); A2A opacity principle; per-skill authorization advertised in Agent Card; stop-mesh-vs-isolate containment doctrine (Multi-Agent Runtime Security) | Per-agent runtime sandbox; AgentGateway broker between | A2A v1.0 over HTTPS / TLS 1.3 + signed Agent Cards (§8.4) + content scanning (Oktsec 268 rules); replay protection layered by impl (timestamps + nonces) | Shared memory / blackboard with provenance; cross-agent OTel gen_ai.* trace propagation | Pairwise/triadic traffic baselines; graph-walk anomaly detection (SentinelAgent / TraceAegis-class, prototype); cross-agent drift correlation; cascade detection per ASI08 doctrine |
Recommended stacks by org profile
Opinionated tool selections per plane for two common org profiles. Neither stack is exhaustive — treat each as a starting point extensible per deployment shape and risk profile.
Enterprise stack (COTS-heavy)
Suited for large organizations with existing vendor relationships, centralized IAM, and SOC/SIEM infrastructure. Prioritizes vendor SLAs, support contracts, and integrations with Microsoft 365 / AWS / GCP environments.
| Plane | Primary choices | Notes |
|---|---|---|
| Identity | Okta for AI Agents (GA Apr 30, 2026) or Microsoft Entra Agent ID + Microsoft Agent 365 Registry; CyberArk Conjur or Aembit for NHI governance | Choose based on existing IAM vendor; both support OAuth 2.1 delegation |
| Control | AWS Cedar managed policy service (March 2026 AI governance release); Anthropic Compliance API or Microsoft Agent Governance Toolkit (Apr 2026); Permit.io for RBAC UI | Cedar is the enterprise-grade choice; COTS wrappers add audit + workflow tooling |
| Runtime | LlamaFirewall (Meta OSS — no license cost) + NVIDIA NeMo NIMs (commercial inference); Microsoft Prompt Shields for content safety; per-task Firecracker VM or Hyper-V sandbox | Mix OSS guardrail (LlamaFirewall) with COTS NIM delivery for SLA coverage |
| Egress | Solo Enterprise for AgentGateway, Kong AI Gateway, or Cloudflare AI Gateway; Operant MCP Gateway for MCP-specific authorization; mTLS via Istio / Linkerd | Enterprise distributions of OSS gateways; prefer one with A2A v1.0 support |
| Data | Microsoft Purview AI (M365 environments); Wiz AI-SPM or Palo Alto Prisma AIRS; JFrog ML Catalog for AI-BOM; ReversingLabs for supply-chain scanning | Stack assumes M365 + cloud environment; swap Purview for CASB equivalent if GCP/AWS-native |
| Observability | DataDog AI Monitoring or New Relic AI Monitoring (OTel-native); LangSmith for agent-specific tracing; Mindgard CART for continuous red-teaming; Vectra AI or Palo Alto Cortex XSIAM for behavioral monitoring | OTel gen_ai.* spans feed into existing SIEM; Mindgard replaces point-in-time red-team for CARTS programs |
FOSS / small-team stack
Suited for research teams, security teams running internal agent experiments, startups, or orgs with open-source mandates. Prioritizes zero licensing cost, community support, and composability. Requires more operational ownership.
| Plane | Primary choices | Notes |
|---|---|---|
| Identity | SPIRE for workload identity; standard OAuth 2.1 + OIDC (any provider) for delegation; Aegis or AgentKeys for credential proxy | SPIFFE is the vendor-neutral workload-identity standard; pairs with any OIDC-compatible IdP |
| Control | Rego or Cedar (open-source distribution); Tenuo Warrants (Rust OSS) for task-scoped capability tokens; OWASP least-agency four-tier guidance for tier design | Both Cedar and OPA are fully OSS; Tenuo adds cryptographic delegation without a license |
| Runtime | LlamaFirewall PromptGuard 2 + AlignmentCheck + CodeShield (Meta OSS); NVIDIA NeMo Guardrails (OSS portion); Firecracker or gVisor for per-task sandboxing | Full LlamaFirewall stack is zero-cost; Firecracker is AWS OSS (Apache 2.0); gVisor is Google OSS |
| Egress | AgentGateway (Linux Foundation, Apache 2.0); mTLS via Istio (CNCF OSS) or Linkerd (CNCF OSS); Docker DOCKER-USER iptables chain for egress filtering | AgentGateway is the canonical OSS agent proxy; Istio adds mTLS with near-zero operational overhead vs DIY |
| Data | OWASP AIBOM Generator (CycloneDX 1.6, OSS); sigstore / cosign for artifact signing; RAGShield or TrustRAG for RAG attestation; Brain Git (SlowMist) for state rollback | All zero-cost; RAGShield/TrustRAG are research-grade — treat as experimental until production-validated |
| Observability | OpenTelemetry gen_ai.* SemConv (CNCF standard, v1.37+); Langtrace or Traceloop (OSS) for agent-aware tracing; PyRIT (Microsoft OSS) + Garak (NVIDIA OSS) + Promptfoo (OSS) for red-team coverage | OTel is the zero-cost observability foundation; the three red-team tools cover orchestration / probe / regression — all open-source |
Stack evolution
The FOSS stack is production-viable at D3–D4 levels of the CMM for most deployment shapes. The Research-type items (CaMeL, RAGShield at scale) are not yet production-ready for either stack. The enterprise stack exceeds the FOSS stack primarily in operational overhead reduction (vendor support, pre-built integrations) rather than raw security capability — at L4 CMM, a well-operated FOSS stack achieves comparable controls.
Threat-control matrix (OWASP Agentic AI Top 10 → planes)
Mapping of OWASP Agentic AI Top 10 (ASI01–ASI10) risk categories to the planes that primarily mitigate them. Most categories have controls in multiple planes; the table that follows the diagram identifies the primary control surface and lists the reference controls for each.
flowchart LR subgraph Threats[Threats] ASI01[ASI01: Goal Hijack] ASI02[ASI02: Tool Misuse] ASI03[ASI03: Identity & Privilege] ASI04[ASI04: Supply Chain] ASI05[ASI05: Data Disclosure] ASI06[ASI06: Memory Poisoning] ASI07[ASI07: Inter-Agent Comms] ASI08[ASI08: Cascading Failures] ASI09[ASI09: Missing Guardrails] ASI10[ASI10: Rogue Agents] end subgraph Planes[Planes] ID[Identity] CTL[Control] RT[Runtime] EG[Egress] DT[Data] OBS[Observability] end ASI01 --> RT & CTL ASI02 --> CTL & EG ASI03 --> ID ASI04 --> DT ASI05 --> ID & EG ASI06 --> DT ASI07 --> EG ASI08 --> CTL & OBS ASI09 --> RT & CTL ASI10 --> ID & OBS
| OWASP ASI | Primary plane | Reference controls |
|---|---|---|
| ASI01 Goal Hijack | Runtime + Control | LlamaFirewall AlignmentCheck; HITL on goal-changing actions |
| ASI02 Tool Misuse | Control + Egress | Cedar/OPA tool-call policy; AgentGateway runtime authz |
| ASI03 Identity & Privilege | Identity | Okta for AI Agents; Microsoft Entra Agent ID; credential proxy |
| ASI04 Supply Chain | Data | OWASP AIBOM Generator; sigstore; Aguara Watch |
| ASI05 Data Disclosure | Identity + Egress | Credential proxy; egress filtering; Microsoft Purview |
| ASI06 Memory Poisoning | Data | RAGShield; cognitive file integrity; M365 memory-injection detector |
| ASI07 Insecure Inter-Agent | Egress | A2A v1.0 over HTTPS + signed Agent Cards (§8.4); Oktsec Ed25519 message signing + content scanning (268 rules) |
| ASI08 Cascading Failures | Control + Observability | Step-up gates (CSA ATF); pairwise/triadic baselines; graph-walk anomaly detection; stop-mesh-vs-isolate doctrine (Multi-Agent Runtime Security) |
| ASI09 Missing Guardrails | Runtime + Control | LlamaFirewall + NeMo Guardrails; HITL primitive |
| ASI10 Rogue Agents | Identity + Observability | Behavioral drift detection; Okta Agent Discovery; kill switch |
Trade-offs
Architectural trade-offs that vary with deployment scale, latency tolerance, and risk profile. Each row names the decision axis and the recommended default; deviations should be documented with a strategic-rationale field per the CMM reporting convention.
- Single broker vs mesh. AgentGateway-as-broker is simpler and gives a chokepoint for policy. Mesh (per-service AgentGateway sidecar) scales better but multiplies the policy surface. Default to broker for ≤50 agents; move to mesh above that.
- PDP location. Inline (in-process with the agent) gives the best latency but couples policy with runtime. Sidecar gives clean separation. External service is the standard zero-trust answer but adds 5–20ms per call. Default to sidecar.
- Sandbox grain. Per-call sandbox is safest but expensive; per-task sandbox is the practical default; per-agent sandbox is too coarse for high-risk-tier actions.
- Fail-closed vs fail-open. Default to fail-closed for high-risk-tier actions, fail-open for read-only / informational tier. CSA ATF Promotion Gates encode this directly.
Gaps in the architecture
Known unfilled spots
- Compartmentalized LLM (CaMeL) reference pattern. Privileged-LLM-coordinates-quarantined-LLM is theoretically sound but lacks a vendor-neutral reference implementation. (Google DeepMind research-stage.)
- Cross-tenant MCP server signing. MCP CVE rate (30+ in Q1 2026) suggests the ecosystem is pre-supply-chain-hardening. sigstore-for-MCP-servers is needed but not standardized.
- Multi-agent failure containment. ASI08 (Cascading Failures) and ASI10 (Rogue Agents) have no traditional cybersecurity equivalent. Partially addressed 2026-05-02: see Multi-Agent Runtime Security for the cascade-detection / behavioral-baseline / inter-agent IR depth. Honest read: 2026 is still the academic-prototype era — graph-walk monitors (SentinelAgent, TraceAegis) ship as papers, vendor primitives exist (Oktsec rate limits + ACLs) but no integrated cascade-detection product ships with documented thresholds.
- AI-BOM operationalization gap. CycloneDX 1.6 ML-BOM is the format; the operational workflow (CI/CD integration, vendor disclosure norm, AI-VEX equivalent) is thin.
- Identity binding when humans are decommissioned. When the human owner of an agent leaves, the agent must be rotated or revoked. Okta and Microsoft Agent 365 cover this for managed agents; orphaned shadow agents are still discoverable but not always governable.
Prior work and comparison
No vendor-neutral, control-architecture-focused reference architecture for agentic AI security existed in the public domain as of Q1 2026. Prior work falls into two categories: cloud-specific operational guidance (hyperscaler “how to secure AI on our platform” documents) and vendor-neutral threat taxonomies (threat enumeration without control specification). The table below summarizes each and the gap each leaves unaddressed.
| Framework | Published | Vendor neutral? | Agentic-specific? | What it covers | Key gap |
|---|---|---|---|---|---|
| Microsoft ZT4AI | March 2026 | No (Azure / Microsoft stack) | Strong yes — agent identity, MCP governance, multi-agent verification | 700+ controls across 116 groups; extends ZT to a 7th “AI” pillar; Entra Agent ID, Agent 365 registry, MCP three-tier governance | Azure-tooling-locked; MCP control specification and supply-chain guidance still maturing; tooling in preview/development as of May 2026 |
| Azure OpenAI Reference Architecture | Ongoing (Feb 2026 update) | No (Azure) | Partial — adds Entra Agent ID, scoped tokens, HITL via Logic Apps | 5-layer: Network isolation · Identity/access · Content/prompt · Data protection · Monitoring | Assumes static LLM API invocations; no multi-agent orchestration architecture; no MCP, A2A, or memory/state security |
| Microsoft MCRA | April 2025 | No (Microsoft ecosystem) | Minimal | Enterprise security architecture diagrams across ZT pillars; AI section = Microsoft Security Copilot as a security tool | Not a defense-of-AI architecture; treats AI as defender-assistant, not as a class of workloads to secure |
| AWS Well-Architected Generative AI Lens | November 2025 | No (AWS) | Partial — names “excessive agency” as a risk; adds an agentic AI preamble | 6 WAF pillars applied to GenAI lifecycle; security pillar covers: endpoint protection, output risk, prompt security, monitoring, model integrity | ”Excessive agency” named but mitigations unspecified; no agent identity, tool sandboxing, agent-to-agent trust, or MCP coverage |
| Google Cloud AI Security Foundations | 2025 (ongoing) | No (GCP) | Partial — 6-layer model includes “Agents and Applications” layer | 6 architectural layers: foundation → infrastructure → models → data → tools → agents; Model Armor for runtime guardrails; VPC/IAM/CMEK controls | ”Agents and Applications” layer exists as a named category but is underspecified; no agent identity governance, tool execution sandboxing, inter-agent trust, or memory security |
| CSA MAESTRO | February 2025 | Yes | Full — built exclusively for agentic AI systems | 7-layer threat taxonomy: Foundation Models → Data Operations → Agent Frameworks → Deployment/Infra → Evaluation/Observability → Security/Compliance → Agent Ecosystem | Taxonomy only — identifies threats but specifies no controls, no implementation guidance, and no compliance mapping |
What this RA adds
The six-plane model occupies the gap between cloud-specific operational guides and vendor-neutral threat taxonomies. Four contributions distinguish it:
Vendor-neutral control architecture. Unlike ZT4AI, the Azure RA, the AWS GenAI Lens, and the Google Cloud guidance, this RA specifies how to implement controls without mandating a particular hyperscaler. Reference implementations are labeled OSS / COTS / Std / Exploratory so organizations can substitute equivalent tools per plane.
Agentic-throughout. Unlike MCRA (AI = security tool) and the AWS / Google guidance (agentic = a paragraph), all six planes are designed assuming autonomous multi-step agent loops — credential proxy in the identity plane, per-task sandboxing in the runtime plane, A2A v1.0 in the egress plane, Warrant-based delegation in the control plane, memory-poisoning defense in the data plane, and behavioral-drift detection in the observability plane.
MCP-specific surface. MCP received 30+ disclosed CVEs in Q1 2026 and is absent or underspecified in the five comparison frameworks above. The egress plane explicitly addresses MCP runtime authorization, tool fingerprinting, and rug-pull defense — controls not specified in any reviewed prior work.
Tool supply chain. The data plane covers AI-BOM generation, skill / model registry signing, and supply-chain scanning. This surface is absent in the cloud-specific guides and only named (not addressed) in MAESTRO’s Agent Frameworks layer.
Inherited inputs. The RA synthesizes rather than originates; it inherits from:
- ZT4AI’s principle of per-agent scoped identity with cryptographic binding
- CSA MAESTRO’s 7-layer threat taxonomy as the threat-model input
- CSA ATF’s promotion gates as the basis for the least-agency tier model
- OWASP ASI Top 10 as the explicit control-to-threat mapping
- NIST CAISI Concept Paper’s OAuth 2.1 extensions for agent delegation
The primary contribution is the synthesis: a single vendor-neutral document connecting a threat taxonomy (MAESTRO + OWASP ASI Top 10) to a plane-by-plane control library (this RA) to a maturity model (Agentic AI Security CMM 2026) — a chain that none of the comparison frameworks provides end-to-end.
- Implemented by: Agentic AI Security Capability Maturity Model — A 2026 Practical Proposal — the CMM measures organizational maturity in operating this architecture.
- Built on: Security Controls for AI Stacks (six-layer inventory), AI Agent Identity Architecture.
- Validated against: OWASP Top 10 for Agentic Applications (ASI Top 10), MITRE ATLAS, CSA Agentic Trust Framework, CoSAI — Coalition for Secure AI.
- Compared with (§Prior Work): Microsoft ZT4AI (March 2026), Azure OpenAI Reference Architecture, Microsoft MCRA, AWS Well-Architected Generative AI Lens (Nov 2025), Google Cloud AI Security Foundations, CSA MAESTRO 7-layer model.
- Threat-anchored to: ClawHavoc — Agentic Skill Marketplace Supply Chain Attack, SANDWORM_MODE npm worm — AI Toolchain Poisoning, Meta Sev 1 AI Agent Breach, MCP CVEs Q1 2026, GTG-1002 — First Reported AI-Orchestrated Cyber Espionage Campaign.
- Stress-tested against: Agentic AI Threat Classes — 2026 Expansion — the five threat classes a peer reviewer would surface beyond the OWASP ASI / MITRE ATLAS / Lethal Trifecta baseline (insider, APT campaign, collusion, model-version regression, jurisdictional adversary).