Agentic AI Security Reference Architecture (2026)

A vendor-neutral, layered reference architecture (RA) for securing agentic AI applications. Designed to cover all common deployment shapes — web/desktop chatbots, generative coding tools, data-science copilots, RAG systems, MCP servers, agent skills, and multi-agent meshes — with a single shared trust model.

block-beta
  columns 2
  
  User(["Human user"]):2
  
  Identity["IDENTITY PLANE"]:2
  
  Control["CONTROL PLANE"]:2
  
  Runtime["RUNTIME PLANE"]:2
  
  Egress["EGRESS PLANE"]
  Data["DATA PLANE"]
  
  Obs["OBSERVABILITY PLANE"]:2

  classDef pip fill:#cfe2ff,stroke:#0d6efd,color:#000
  classDef pdp fill:#fff3cd,stroke:#fd7e14,color:#000
  classDef pep fill:#f8d7da,stroke:#dc3545,color:#000
  classDef mixed fill:#e2d5f3,stroke:#6f42c1,color:#000
  classDef user fill:#d1e7dd,stroke:#198754,color:#000
  
  class User user
  class Identity pip
  class Control pdp
  class Runtime pep
  class Egress pep
  class Data mixed
  class Obs pip

What this RA delivers

The architecture below is the implementation surface of an oversight layer — the system that monitors, evaluates, and intervenes on AI agent behavior in production. The six planes decompose the layer into the four roles codified in NIST SP 800-162 §2.2: PDPs (Policy Decision Points, concentrated in Control), PEPs (Policy Enforcement Points, distributed across Runtime / Egress / Data), PIPs (Policy Information Points, spanning Identity / Data / Observability), and PAP (Policy Administration Point, cross-cutting policy lifecycle). The Sentinels and Operatives split (PIPs feed PDP+PEP) is the runtime decomposition; the goal is verified accountable autonomy — agents that can act on their own, but where every action is verifiable, auditable, and bounded by enforced policy.

In procurement language, this layer is what Gartner calls a guardian agent. The wiki uses the architectural primary (oversight layer / PDP+PEP) when discussing components and the procurement synonym (guardian agent) when discussing vendor categories. Both describe the same role at different levels of abstraction; see Oversight Layer (PDP + PEP for Agentic AI) §Cross-walk for the full term comparison.

Design principles

Five principles anchor the architecture, drawn from Q1 2026 incident data and practitioner consensus.

Platform-level enforcement, not prompt-level. Every control that matters runs in the runtime/platform, below the model. Prompt-level guardrails are bypassable by definition. (Consensus across AI Security Standards in Q1 2026: Agentic Threats Outpace Frameworks, Emerging Cybersecurity Practices for Agentic AI Applications, Securing the Autonomous Future: Trust, Safety, and Reliability of Agentic AI.)
Least agency, not just least privilege. Autonomy is a governable dimension distinct from agency — agency is the scope of permitted actions; autonomy is the degree of independent decision-making within that scope (definitions per the AWS Agentic AI Security Scoping Matrix). An agent’s allowed tier (auto / notify / confirm / block) is decided per-action, not per-agent. (OWASP, Least Agency Principle.)
Verifiable identity for every actor. Every agent has a cryptographic identity that traces back to a human. Action-to-identity binding is the foundation for audit, revocation, and compliance. (AI Agent Identity Architecture, NIST CAISI Concept Paper, Feb 2026.)
No agent owns credentials. The credential proxy pattern is load-bearing: 5+ tools converged on it independently. Even successful prompt injection cannot extract credentials that never enter context. (Credential Proxy Pattern for AI Agents.)
The “Lethal Trifecta” is a structural test. Any deployment combining private data + untrusted content + external comms is unconditionally vulnerable. The architecture must break the trifecta at the platform layer. (Lethal Trifecta, Simon Willison.)

Competing-view callouts on principles 1 and 5

See Wiki Novelty and Counter-Arguments §Thesis 1 (platform vs prompt) and §Thesis 3 (Lethal Trifecta).

Principle 1: the framing is hierarchy, not exclusivity. Prompt-layer guardrails reduce ASR materially (LlamaFirewall on AgentDojo: 17.6%→7.5%; Constitutional Classifiers: 86%→4.4% jailbreak success). Platform-layer is primary because not bypassable by injection; prompt-layer is residual-risk reduction. RAG hardening + system-prompt architecture pages already carry residual-risk callouts in this spirit.

Principle 5: “unconditionally vulnerable” is design-time pedagogy. In production, Stripe’s containment architecture runs trifecta agents and reports 1.5–6.7% ASR depending on model — probabilistically exploitable, not unconditional. The Lethal Trifecta is necessary for natural-language exfil at scale and sufficient given current defense maturity to require platform-layer containment. Containment can drive ASR very low but not zero — and very-low-but-not-zero is unacceptable for high-risk-tier actions.

The six planes

The architecture decomposes into six logical planes. A plane is a logical separation of concerns; multiple planes may be implemented by a single product, but the controls must be addressable independently.

Plane order reflects action flow (User → Identity → Control → Runtime → Egress / Data). Each plane is annotated with its XACML role (PIP / PDP / PEP / PAP). Observability spans the bottom as a cross-cutting plane consuming signals from all five above.

Implementation type legend

Each plane table includes a Type column classifying each reference implementation. Abbreviations: OSS = open-source software; COTS = commercial off-the-shelf (vendor product / SaaS); Std = formally governed standard or specification (IETF, CNCF, OWASP, NIST, etc.); Infra = generic infrastructure primitive (cloud VPC, Docker networking); Research = academic prototype with no shipped production implementation; Concept = architectural concept without a specific canonical implementation yet; Exploratory = forward-looking prototype or ecosystem project (e.g., OpenClaw) whose purpose is to project where agentic security is heading — not validated for production use; treat as emerging indicators, not foundational controls. Many rows combine types (e.g., “OSS + COTS”) because the capability has both free and commercial implementations in common use.

block-beta
  columns 2
  
  User(["Human user"]):2
  
  Identity["IDENTITY PLANE · PIP-side<br/>Workload identity · Agent lifecycle · NHI governance · Credential proxy"]:2
  
  Control["CONTROL PLANE · PDP + PAP<br/>Policy evaluation · Capability tokens · Least-agency tiers · HITL"]:2
  
  Runtime["RUNTIME PLANE · PEP (in-process)<br/>Lifecycle hooks · Input filtering · CoT auditing · Code scanning · Sandboxing"]:2
  
  Egress["EGRESS PLANE · PEP (broker)<br/>Agent/MCP proxy · Tool authorization · Tool integrity · Egress filtering"]
  Data["DATA PLANE · PIP + PEP<br/>AI-BOM · RAG provenance · Memory integrity · State rollback · Supply-chain scanning"]
  
  Obs["OBSERVABILITY PLANE · PIP (cross-cutting)<br/>Distributed tracing · Behavioral monitoring · AI-SPM · Red-team integration"]:2

  classDef pip fill:#cfe2ff,stroke:#0d6efd,color:#000
  classDef pdp fill:#fff3cd,stroke:#fd7e14,color:#000
  classDef pep fill:#f8d7da,stroke:#dc3545,color:#000
  classDef mixed fill:#e2d5f3,stroke:#6f42c1,color:#000
  classDef user fill:#d1e7dd,stroke:#198754,color:#000
  
  class User user
  class Identity pip
  class Control pdp
  class Runtime pep
  class Egress pep
  class Data mixed
  class Obs pip

1. Identity plane

Verifiable per-agent identity bound to a human principal — covering workload identity, agent and Non-Human Identity (NHI) lifecycle governance, the credential-proxy primitive, action-to-identity tracing, and OAuth 2.1 / OIDC delegation extensions for agents.

Capability	Reference implementation	Type	Status
Workload identity	SPIRE	OSS + Std	Mature
Agent identity & lifecycle	Okta for AI Agents (GA Apr 30, 2026), Microsoft Entra Agent ID + Agent 365 Registry (GA May 1, 2026)	COTS	Mature (commercial)
Non-Human Identity governance	Aembit, Astrix Security, CyberArk Conjur, Okta NHI, Oasis Security	COTS	Developing
Credential proxy	AgentKeys, Keychains.dev, Aegis (local), OneCLI, AgentSecrets	OSS / COTS	Developing — 5-tool convergence
Action-to-identity tracing	Microsoft Agent 365, Anthropic Compliance API (Mar 24, 2026)	COTS	Mature (vendor-stack-locked)
OAuth 2.1 / OIDC for agents	Standard OIDC + NIST CAISI Concept Paper extensions (Feb 2026)	Std	Developing

Key insight: the credential proxy (Credential Proxy Pattern for AI Agents) is the single most important control here. Five different OSS/commercial tools converged on the same architecture independently — a strong signal that it is load-bearing.

2. Control plane

Policy adjudication and capability-token issuance — Cedar / OPA Policy Decision Point evaluation, task-scoped Warrants, the OWASP four-tier least-agency engine, HITL gating on high-impact actions, and CSA ATF risk-based step-up — adjudicated before an agent’s tool call reaches the runtime.

Capability	Reference implementation	Type	Status
Policy language	Cedar (AWS, March 2026 release for AI governance), Rego	OSS	Mature
Capability tokens / Warrants	Tenuo Warrants — task-scoped, signed, ephemeral, holder-bound capability authorizations	OSS	Developing
Least-agency tier engine	OWASP four-tier (auto / notify / confirm / block)	Std	Conceptual; needs reference implementation
Tool annotation enforcement	Anthropic tool annotations, OpenAI function-call schemas	COTS	Mature
HITL primitive	Human-in-the-loop confirmation gate before high-impact tool calls	Concept	Developing
Risk-based step-up	CSA Agentic Trust Framework (Feb 2026) — 5 progressive autonomy gates	Std	Developing

The control plane is where the Lethal Trifecta is broken. If a tool call combines private-data scope + untrusted-content provenance + external-comms reach, the PDP either downgrades to a safer tier (notify or confirm) or denies the call.

3. Runtime plane

Runtime defenses around the agent process — lifecycle hooks (Google ADK, Anthropic), input filtering and prompt-injection detection, chain-of-thought alignment auditing, code-output static analysis, content-safety classification, per-task sandboxing, and (research-stage) compartmentalized privileged / quarantined LLM split.

Capability	Reference implementation	Type	Status
Lifecycle hooks	Google ADK `before_model_callback` / `before_tool_callback`; Anthropic tool-use hooks	OSS + COTS	Mature
Input filtering / prompt-injection detection	LlamaFirewall PromptGuard 2 (97.5% recall, 1% FPR), NVIDIA NeMo Jailbreak Detection NIM, Palo Alto Prisma AIRS, Onyx Platform runtime protection	OSS + COTS	Mature
Chain-of-thought alignment auditing	LlamaFirewall AlignmentCheck	OSS	Developing — novel primitive
Code-output static analysis	LlamaFirewall CodeShield, GitHub Copilot for Security	OSS + COTS	Developing
Topic / content safety	NVIDIA NeMo Content Safety NIM, Lakera Guard, Lasso, Microsoft Prompt Shields	COTS	Mature
Sandbox / containment	Firecracker (per-task VM), gVisor container, WebAssembly sandbox	OSS	Mature
Compartmentalized LLMs (CaMeL pattern)	Privileged + Quarantined LLM split (Google DeepMind)	Research	Research-stage
Proof-of-Guardrail attestation	AWS Nitro Enclaves + Miggo Security	COTS	Research-stage

Note on bypass risk. The Trendyol Tech LlamaFirewall bypass case (non-English / leetspeak prompt injections) confirms that no single guardrail is sufficient. Defense-in-depth across all six planes is required.

4. Egress plane

Network-layer mediation of agent reach to tools, MCP servers, peer agents, and the open internet — gateway-based MCP / A2A / LLM proxying, MCP runtime authorization, tool-poisoning and rug-pull defense, agent-to-agent cryptographic identity, egress destination filtering, and per-agent network segmentation.

Capability	Reference implementation	Type	Status
MCP / A2A / LLM proxy	AgentGateway (Linux Foundation, July 2025), Solo Enterprise for agentgateway	OSS + COTS	Mature (OSS)
MCP runtime authorization	Operant MCP Gateway, Natoma	COTS	Developing
rug-pull defense	Solo Enterprise tool-server fingerprinting, versioning, runtime policy	COTS	Developing
Agent-to-agent cryptographic identity	Oktsec (Ed25519 sigs), 175 detection rules, content scanning	COTS	Developing
Egress destination filtering	Credential proxy destination allowlists, Docker DOCKER-USER chain	OSS	Mature
Network segmentation	Per-agent subnet, agent VPC isolation	Infra	Mature

The egress plane is where the MCP attack surface is contained: 30+ MCP CVEs in 60 days (Q1 2026); 82% of MCP servers vulnerable to path traversal; 66% to code injection. AgentGateway’s automatic, secure token exchange limits per-tool permissions to exactly what’s needed.

5. Data plane

Trust attribution and integrity for training data, retrieval corpora, knowledge bases, per-session memory, and the AI Bill of Materials — covering AI/ML-BOM generation and runtime reconciliation, RAG provenance and attestation, memory-poisoning defense, cognitive file integrity over agent identity files, state rollback, and supply-chain scanning.

OpenClaw ecosystem tools in this plane

Several rows below (RAGShield, TrustRAG, SecureClaw / cognitive file integrity, Brain Git, Aguara Watch) originate from the OpenClaw ecosystem — a body of work whose purpose was to project where agentic data-plane security is heading, not to provide battle-tested controls. They are retained because they correctly identify real capability gaps and likely future directions. However, they are classified Exploratory and should not be treated as foundational evidence on par with OWASP AIBOM, sigstore, or Miggo Security. Treat them as emerging indicators and validate independently before production deployment.

Capability	Reference implementation	Type	Status
AI/ML-BOM generation	OWASP AIBOM Generator (CycloneDX 1.6), SPDX 3.0 AI extensions, IBM Granite 4.0 disclosures	OSS + Std	Developing
Runtime AI-BOM	Miggo Security DeepTracing (behavioral baseline)	COTS	Developing — novel
RAG provenance / attestation	RAGShield (cryptographic doc attestation), TrustRAG	Exploratory	Exploratory — OpenClaw ecosystem; not production-validated
Memory poisoning defense	Microsoft Defender for Cloud Apps memory-injection detector (50+ examples found, March 2026)	COTS	Developing
Cognitive file integrity	SHA-256 monitoring of SOUL.md, IDENTITY.md; SecureClaw	Exploratory	Exploratory — OpenClaw ecosystem; emerging indicator, not foundational control
State rollback	Brain Git (SlowMist)	Exploratory	Exploratory — OpenClaw ecosystem; no production adoption evidence
Skill / model registry signing	sigstore / cosign, OWASP AIBOM	OSS	Developing
Supply-chain scanning	JFrog ML scan, ReversingLabs	COTS	Developing
Supply-chain scanning (emerging)	Aguara Watch (5 registries daily, SlowMist)	Exploratory	Exploratory — OpenClaw ecosystem; forward indicator for registry hygiene direction

The data plane is the layer that took the most damage in Q1 2026: ClawHavoc (1,184 malicious skills), SANDWORM_MODE (npm worm into MCP), LiteLLM compromise. Defense requires registry → pre-install → checksum → cognitive-file integrity layered together.

6. Observability plane

Glass-box visibility across the five upstream planes — OpenTelemetry gen_ai.* semantic conventions, agent-aware tracing, AI-SPM, agent behavioral monitoring, identity multiplexing in logs, SIEM / SOAR with agent playbooks, and AI red-teaming integration as a continuous feed rather than a point-in-time exercise.

Capability	Reference implementation	Type	Status
OpenTelemetry `gen_ai.*` semantic conventions	OTel SemConv v1.37+ (CNCF standard; SIG contributors: Amazon, Elastic, Google, IBM, Langtrace, Microsoft, OpenLIT, Scorecard, Traceloop)	Std	Mature (experimental status, broad adoption)
Agent-aware tracing	LangSmith, Langtrace, Traceloop, Helicone	OSS + COTS	Mature
AI-SPM (Security Posture Management)	Wiz AI-SPM, Palo Alto Prisma AIRS, Orca AI-SPM, Reco, Onyx Platform	COTS	Mature
Agent behavioral monitoring (anomaly detection on agent activity)	Vectra AI, Miggo behavioral drift	COTS	Developing
Agent behavioral monitoring (emerging)	SecureClaw nightly audits	Exploratory	Exploratory — OpenClaw ecosystem; forward indicator for agent audit direction
Identity multiplexing in logs	Agent Observability §3	Concept	Developing
SIEM / SOAR with agent playbooks	Splunk + agent IOCs, Sentinel + Defender for Cloud, Falcon AIDR + NeMo Guardrails	COTS	Developing
AI red-teaming integration	Promptfoo, Mindgard CART, PyRIT (Microsoft), Garak (NVIDIA), Palo Alto Prisma AIRS (continuous CART)	OSS + COTS	Mature

Key stat: agents generate 10–20× the log volume of humans over the same time window. Observability scaling is its own problem; pre-aggregation at the runtime hook is necessary, not optional.

Mapping to deployment shapes

The architecture is invariant across deployment shapes; the populated controls differ. The table below specifies which planes are exercised, and which controls are load-bearing, for each common shape.

Deployment shape	Identity	Control	Runtime	Egress	Data	Observability
Web/desktop chatbot (no tools)	OIDC for human; agent runs as bot identity	Topic-control, content-safety policies	NeMo Content Safety, Lakera Guard	None (no tools)	RAG corpus only	OTel `gen_ai.*`
Generative coding tool (Copilot, Cursor, Claude Code)	Workspace identity + per-repo scope; agent rules files (`.cursorrules` / `IDENTITY.md`) baselined	Pre-commit hooks, Cedar policy on file mutation; destructive-action classification routes force-push / mass refactor / prod-config writes to `confirm` or `block`	LlamaFirewall CodeShield + AlignmentCheck; per-task sandbox; rogue IDE extension detection (Kirin / equivalent)	Source-control + LSP only; typosquat-aware dependency install gate	Repo content + skill registry; cognitive file integrity for rules files AND `IDENTITY.md`	LangSmith / Langtrace; tool-call audit; MCP usage + rule changes + policy-violations dashboard (Kirin-class)
Data-science copilot (Jupyter, notebooks)	User identity passthrough	Sandbox-resource policy; HITL on dataset writes	Per-notebook sandbox (Firecracker)	Internal data warehouse via credential proxy	Dataset lineage + AI-BOM	OTel + behavioral drift
RAG application	Per-tenant agent identity	Source-trust attribution; lethal-trifecta breaker	Input filter on prompts; output classifier	Vector store + per-source allowlist	RAGShield/TrustRAG; document attestation; PoisonedRAG defense	Retrieval-pattern behavioral monitoring
MCP server (consumed by agents)	mTLS + workload identity (SPIFFE)	OAuth 2.1 token exchange (CoSAI / NIST CAISI)	Tool-fingerprinting, rug-pull detection (Solo Enterprise)	n/a (server-side)	Skill/server signing, version pinning	MCP CVE feed integration
Agent skill (e.g., Claude skill, Anthropic plugin)	Skill author identity (signed)	Skill-scoped Cedar policy	Skill manifest validation; runtime sandbox	Per-skill egress allowlist	Skill registry signing (sigstore); cognitive file integrity	Skill execution telemetry
Multi-agent mesh (A2A v1.0)	Per-agent Ed25519 identity (Oktsec-side; not in spec); SPIFFE/SPIRE workload identity	Cross-agent ACL (Oktsec default-deny); A2A opacity principle; per-skill authorization advertised in Agent Card; stop-mesh-vs-isolate containment doctrine (Multi-Agent Runtime Security)	Per-agent runtime sandbox; AgentGateway broker between	A2A v1.0 over HTTPS / TLS 1.3 + signed Agent Cards (§8.4) + content scanning (Oktsec 268 rules); replay protection layered by impl (timestamps + nonces)	Shared memory / blackboard with provenance; cross-agent OTel `gen_ai.*` trace propagation	Pairwise/triadic traffic baselines; graph-walk anomaly detection (SentinelAgent / TraceAegis-class, prototype); cross-agent drift correlation; cascade detection per ASI08 doctrine

Recommended stacks by org profile

Opinionated tool selections per plane for two common org profiles. Neither stack is exhaustive — treat each as a starting point extensible per deployment shape and risk profile.

Enterprise stack (COTS-heavy)

Suited for large organizations with existing vendor relationships, centralized IAM, and SOC/SIEM infrastructure. Prioritizes vendor SLAs, support contracts, and integrations with Microsoft 365 / AWS / GCP environments.

Plane	Primary choices	Notes
Identity	Okta for AI Agents (GA Apr 30, 2026) or Microsoft Entra Agent ID + Microsoft Agent 365 Registry; CyberArk Conjur or Aembit for NHI governance	Choose based on existing IAM vendor; both support OAuth 2.1 delegation
Control	AWS Cedar managed policy service (March 2026 AI governance release); Anthropic Compliance API or Microsoft Agent Governance Toolkit (Apr 2026); Permit.io for RBAC UI	Cedar is the enterprise-grade choice; COTS wrappers add audit + workflow tooling
Runtime	LlamaFirewall (Meta OSS — no license cost) + NVIDIA NeMo NIMs (commercial inference); Microsoft Prompt Shields for content safety; per-task Firecracker VM or Hyper-V sandbox	Mix OSS guardrail (LlamaFirewall) with COTS NIM delivery for SLA coverage
Egress	Solo Enterprise for AgentGateway, Kong AI Gateway, or Cloudflare AI Gateway; Operant MCP Gateway for MCP-specific authorization; mTLS via Istio / Linkerd	Enterprise distributions of OSS gateways; prefer one with A2A v1.0 support
Data	Microsoft Purview AI (M365 environments); Wiz AI-SPM or Palo Alto Prisma AIRS; JFrog ML Catalog for AI-BOM; ReversingLabs for supply-chain scanning	Stack assumes M365 + cloud environment; swap Purview for CASB equivalent if GCP/AWS-native
Observability	DataDog AI Monitoring or New Relic AI Monitoring (OTel-native); LangSmith for agent-specific tracing; Mindgard CART for continuous red-teaming; Vectra AI or Palo Alto Cortex XSIAM for behavioral monitoring	OTel gen_ai.* spans feed into existing SIEM; Mindgard replaces point-in-time red-team for CARTS programs

FOSS / small-team stack

Suited for research teams, security teams running internal agent experiments, startups, or orgs with open-source mandates. Prioritizes zero licensing cost, community support, and composability. Requires more operational ownership.

Plane	Primary choices	Notes
Identity	SPIRE for workload identity; standard OAuth 2.1 + OIDC (any provider) for delegation; Aegis or AgentKeys for credential proxy	SPIFFE is the vendor-neutral workload-identity standard; pairs with any OIDC-compatible IdP
Control	Rego or Cedar (open-source distribution); Tenuo Warrants (Rust OSS) for task-scoped capability tokens; OWASP least-agency four-tier guidance for tier design	Both Cedar and OPA are fully OSS; Tenuo adds cryptographic delegation without a license
Runtime	LlamaFirewall PromptGuard 2 + AlignmentCheck + CodeShield (Meta OSS); NVIDIA NeMo Guardrails (OSS portion); Firecracker or gVisor for per-task sandboxing	Full LlamaFirewall stack is zero-cost; Firecracker is AWS OSS (Apache 2.0); gVisor is Google OSS
Egress	AgentGateway (Linux Foundation, Apache 2.0); mTLS via Istio (CNCF OSS) or Linkerd (CNCF OSS); Docker `DOCKER-USER` iptables chain for egress filtering	AgentGateway is the canonical OSS agent proxy; Istio adds mTLS with near-zero operational overhead vs DIY
Data	OWASP AIBOM Generator (CycloneDX 1.6, OSS); sigstore / cosign for artifact signing; RAGShield or TrustRAG for RAG attestation; Brain Git (SlowMist) for state rollback	All zero-cost; RAGShield/TrustRAG are research-grade — treat as experimental until production-validated
Observability	OpenTelemetry gen_ai.* SemConv (CNCF standard, v1.37+); Langtrace or Traceloop (OSS) for agent-aware tracing; PyRIT (Microsoft OSS) + Garak (NVIDIA OSS) + Promptfoo (OSS) for red-team coverage	OTel is the zero-cost observability foundation; the three red-team tools cover orchestration / probe / regression — all open-source

Stack evolution

The FOSS stack is production-viable at D3–D4 levels of the CMM for most deployment shapes. The Research-type items (CaMeL, RAGShield at scale) are not yet production-ready for either stack. The enterprise stack exceeds the FOSS stack primarily in operational overhead reduction (vendor support, pre-built integrations) rather than raw security capability — at L4 CMM, a well-operated FOSS stack achieves comparable controls.

Threat-control matrix (OWASP Agentic AI Top 10 → planes)

Mapping of OWASP Agentic AI Top 10 (ASI01–ASI10) risk categories to the planes that primarily mitigate them. Most categories have controls in multiple planes; the table that follows the diagram identifies the primary control surface and lists the reference controls for each.

flowchart LR
    subgraph Threats[Threats]
        ASI01[ASI01: Goal Hijack]
        ASI02[ASI02: Tool Misuse]
        ASI03[ASI03: Identity & Privilege]
        ASI04[ASI04: Supply Chain]
        ASI05[ASI05: Data Disclosure]
        ASI06[ASI06: Memory Poisoning]
        ASI07[ASI07: Inter-Agent Comms]
        ASI08[ASI08: Cascading Failures]
        ASI09[ASI09: Missing Guardrails]
        ASI10[ASI10: Rogue Agents]
    end
    subgraph Planes[Planes]
        ID[Identity]
        CTL[Control]
        RT[Runtime]
        EG[Egress]
        DT[Data]
        OBS[Observability]
    end
    ASI01 --> RT & CTL
    ASI02 --> CTL & EG
    ASI03 --> ID
    ASI04 --> DT
    ASI05 --> ID & EG
    ASI06 --> DT
    ASI07 --> EG
    ASI08 --> CTL & OBS
    ASI09 --> RT & CTL
    ASI10 --> ID & OBS

OWASP ASI	Primary plane	Reference controls
ASI01 Goal Hijack	Runtime + Control	LlamaFirewall AlignmentCheck; HITL on goal-changing actions
ASI02 Tool Misuse	Control + Egress	Cedar/OPA tool-call policy; AgentGateway runtime authz
ASI03 Identity & Privilege	Identity	Okta for AI Agents; Microsoft Entra Agent ID; credential proxy
ASI04 Supply Chain	Data	OWASP AIBOM Generator; sigstore; Aguara Watch
ASI05 Data Disclosure	Identity + Egress	Credential proxy; egress filtering; Microsoft Purview
ASI06 Memory Poisoning	Data	RAGShield; cognitive file integrity; M365 memory-injection detector
ASI07 Insecure Inter-Agent	Egress	A2A v1.0 over HTTPS + signed Agent Cards (§8.4); Oktsec Ed25519 message signing + content scanning (268 rules)
ASI08 Cascading Failures	Control + Observability	Step-up gates (CSA ATF); pairwise/triadic baselines; graph-walk anomaly detection; stop-mesh-vs-isolate doctrine (Multi-Agent Runtime Security)
ASI09 Missing Guardrails	Runtime + Control	LlamaFirewall + NeMo Guardrails; HITL primitive
ASI10 Rogue Agents	Identity + Observability	Behavioral drift detection; Okta Agent Discovery; kill switch

Trade-offs

Architectural trade-offs that vary with deployment scale, latency tolerance, and risk profile. Each row names the decision axis and the recommended default; deviations should be documented with a strategic-rationale field per the CMM reporting convention.

Single broker vs mesh. AgentGateway-as-broker is simpler and gives a chokepoint for policy. Mesh (per-service AgentGateway sidecar) scales better but multiplies the policy surface. Default to broker for ≤50 agents; move to mesh above that.
PDP location. Inline (in-process with the agent) gives the best latency but couples policy with runtime. Sidecar gives clean separation. External service is the standard zero-trust answer but adds 5–20ms per call. Default to sidecar.
Sandbox grain. Per-call sandbox is safest but expensive; per-task sandbox is the practical default; per-agent sandbox is too coarse for high-risk-tier actions.
Fail-closed vs fail-open. Default to fail-closed for high-risk-tier actions, fail-open for read-only / informational tier. CSA ATF Promotion Gates encode this directly.

Gaps in the architecture

Known unfilled spots

Compartmentalized LLM (CaMeL) reference pattern. Privileged-LLM-coordinates-quarantined-LLM is theoretically sound but lacks a vendor-neutral reference implementation. (Google DeepMind research-stage.)

Cross-tenant MCP server signing. MCP CVE rate (30+ in Q1 2026) suggests the ecosystem is pre-supply-chain-hardening. sigstore-for-MCP-servers is needed but not standardized.

Multi-agent failure containment. ASI08 (Cascading Failures) and ASI10 (Rogue Agents) have no traditional cybersecurity equivalent. Partially addressed 2026-05-02: see Multi-Agent Runtime Security for the cascade-detection / behavioral-baseline / inter-agent IR depth. Honest read: 2026 is still the academic-prototype era — graph-walk monitors (SentinelAgent, TraceAegis) ship as papers, vendor primitives exist (Oktsec rate limits + ACLs) but no integrated cascade-detection product ships with documented thresholds.

AI-BOM operationalization gap. CycloneDX 1.6 ML-BOM is the format; the operational workflow (CI/CD integration, vendor disclosure norm, AI-VEX equivalent) is thin.

Identity binding when humans are decommissioned. When the human owner of an agent leaves, the agent must be rotated or revoked. Okta and Microsoft Agent 365 cover this for managed agents; orphaned shadow agents are still discoverable but not always governable.

Prior work and comparison

No vendor-neutral, control-architecture-focused reference architecture for agentic AI security existed in the public domain as of Q1 2026. Prior work falls into two categories: cloud-specific operational guidance (hyperscaler “how to secure AI on our platform” documents) and vendor-neutral threat taxonomies (threat enumeration without control specification). The table below summarizes each and the gap each leaves unaddressed.

Framework	Published	Vendor neutral?	Agentic-specific?	What it covers	Key gap
Microsoft ZT4AI	March 2026	No (Azure / Microsoft stack)	Strong yes — agent identity, MCP governance, multi-agent verification	700+ controls across 116 groups; extends ZT to a 7th “AI” pillar; Entra Agent ID, Agent 365 registry, MCP three-tier governance	Azure-tooling-locked; MCP control specification and supply-chain guidance still maturing; tooling in preview/development as of May 2026
Azure OpenAI Reference Architecture	Ongoing (Feb 2026 update)	No (Azure)	Partial — adds Entra Agent ID, scoped tokens, HITL via Logic Apps	5-layer: Network isolation · Identity/access · Content/prompt · Data protection · Monitoring	Assumes static LLM API invocations; no multi-agent orchestration architecture; no MCP, A2A, or memory/state security
Microsoft MCRA	April 2025	No (Microsoft ecosystem)	Minimal	Enterprise security architecture diagrams across ZT pillars; AI section = Microsoft Security Copilot as a security tool	Not a defense-of-AI architecture; treats AI as defender-assistant, not as a class of workloads to secure
AWS Well-Architected Generative AI Lens	November 2025	No (AWS)	Partial — names “excessive agency” as a risk; adds an agentic AI preamble	6 WAF pillars applied to GenAI lifecycle; security pillar covers: endpoint protection, output risk, prompt security, monitoring, model integrity	”Excessive agency” named but mitigations unspecified; no agent identity, tool sandboxing, agent-to-agent trust, or MCP coverage
Google Cloud AI Security Foundations	2025 (ongoing)	No (GCP)	Partial — 6-layer model includes “Agents and Applications” layer	6 architectural layers: foundation → infrastructure → models → data → tools → agents; Model Armor for runtime guardrails; VPC/IAM/CMEK controls	”Agents and Applications” layer exists as a named category but is underspecified; no agent identity governance, tool execution sandboxing, inter-agent trust, or memory security
CSA MAESTRO	February 2025	Yes	Full — built exclusively for agentic AI systems	7-layer threat taxonomy: Foundation Models → Data Operations → Agent Frameworks → Deployment/Infra → Evaluation/Observability → Security/Compliance → Agent Ecosystem	Taxonomy only — identifies threats but specifies no controls, no implementation guidance, and no compliance mapping

What this RA adds

The six-plane model occupies the gap between cloud-specific operational guides and vendor-neutral threat taxonomies. Four contributions distinguish it:

Vendor-neutral control architecture. Unlike ZT4AI, the Azure RA, the AWS GenAI Lens, and the Google Cloud guidance, this RA specifies how to implement controls without mandating a particular hyperscaler. Reference implementations are labeled OSS / COTS / Std / Exploratory so organizations can substitute equivalent tools per plane.

Agentic-throughout. Unlike MCRA (AI = security tool) and the AWS / Google guidance (agentic = a paragraph), all six planes are designed assuming autonomous multi-step agent loops — credential proxy in the identity plane, per-task sandboxing in the runtime plane, A2A v1.0 in the egress plane, Warrant-based delegation in the control plane, memory-poisoning defense in the data plane, and behavioral-drift detection in the observability plane.

MCP-specific surface. MCP received 30+ disclosed CVEs in Q1 2026 and is absent or underspecified in the five comparison frameworks above. The egress plane explicitly addresses MCP runtime authorization, tool fingerprinting, and rug-pull defense — controls not specified in any reviewed prior work.

Tool supply chain. The data plane covers AI-BOM generation, skill / model registry signing, and supply-chain scanning. This surface is absent in the cloud-specific guides and only named (not addressed) in MAESTRO’s Agent Frameworks layer.

Inherited inputs. The RA synthesizes rather than originates; it inherits from:

ZT4AI’s principle of per-agent scoped identity with cryptographic binding
CSA MAESTRO’s 7-layer threat taxonomy as the threat-model input
CSA ATF’s promotion gates as the basis for the least-agency tier model
OWASP ASI Top 10 as the explicit control-to-threat mapping
NIST CAISI Concept Paper’s OAuth 2.1 extensions for agent delegation

The primary contribution is the synthesis: a single vendor-neutral document connecting a threat taxonomy (MAESTRO + OWASP ASI Top 10) to a plane-by-plane control library (this RA) to a maturity model (Agentic AI Security CMM 2026) — a chain that none of the comparison frameworks provides end-to-end.

Implemented by: Agentic AI Security Capability Maturity Model — A 2026 Practical Proposal — the CMM measures organizational maturity in operating this architecture.
Built on: Security Controls for AI Stacks (six-layer inventory), AI Agent Identity Architecture.
Validated against: OWASP Top 10 for Agentic Applications (ASI Top 10), MITRE ATLAS, CSA Agentic Trust Framework, CoSAI — Coalition for Secure AI.
Compared with (§Prior Work): Microsoft ZT4AI (March 2026), Azure OpenAI Reference Architecture, Microsoft MCRA, AWS Well-Architected Generative AI Lens (Nov 2025), Google Cloud AI Security Foundations, CSA MAESTRO 7-layer model.
Threat-anchored to: ClawHavoc — Agentic Skill Marketplace Supply Chain Attack, SANDWORM_MODE npm worm — AI Toolchain Poisoning, Meta Sev 1 AI Agent Breach, MCP CVEs Q1 2026, GTG-1002 — First Reported AI-Orchestrated Cyber Espionage Campaign.
Stress-tested against: Agentic AI Threat Classes — 2026 Expansion — the five threat classes a peer reviewer would surface beyond the OWASP ASI / MITRE ATLAS / Lethal Trifecta baseline (insider, APT campaign, collusion, model-version regression, jurisdictional adversary).

Enterprise Security in the Agentic AI Era

Explorer

Agentic AI Security Reference Architecture (2026)

Agentic AI Security Reference Architecture (2026)

Design principles

The six planes

1. Identity plane

2. Control plane

3. Runtime plane

4. Egress plane

5. Data plane

6. Observability plane

Mapping to deployment shapes

Recommended stacks by org profile

Enterprise stack (COTS-heavy)

FOSS / small-team stack

Threat-control matrix (OWASP Agentic AI Top 10 → planes)

Trade-offs

Gaps in the architecture

Prior work and comparison

What this RA adds

Graph View

Table of Contents

Backlinks