Agentic AI Security Reference Architecture (2026)

A vendor-neutral, layered reference architecture (RA) for securing agentic AI applications. Designed to cover all common deployment shapes — web/desktop chatbots, generative coding tools, data-science copilots, RAG systems, MCP servers, agent skills, and multi-agent meshes — with a single shared trust model.

block-beta
  columns 2
  
  User(["Human user"]):2
  
  Identity["IDENTITY PLANE"]:2
  
  Control["CONTROL PLANE"]:2
  
  Runtime["RUNTIME PLANE"]:2
  
  Egress["EGRESS PLANE"]
  Data["DATA PLANE"]
  
  Obs["OBSERVABILITY PLANE"]:2

  classDef pip fill:#cfe2ff,stroke:#0d6efd,color:#000
  classDef pdp fill:#fff3cd,stroke:#fd7e14,color:#000
  classDef pep fill:#f8d7da,stroke:#dc3545,color:#000
  classDef mixed fill:#e2d5f3,stroke:#6f42c1,color:#000
  classDef user fill:#d1e7dd,stroke:#198754,color:#000
  
  class User user
  class Identity pip
  class Control pdp
  class Runtime pep
  class Egress pep
  class Data mixed
  class Obs pip

What this RA delivers

The architecture below is the implementation surface of an oversight layer — the system that monitors, evaluates, and intervenes on AI agent behavior in production. The six planes decompose the layer into the four roles codified in NIST SP 800-162 §2.2: PDPs (Policy Decision Points, concentrated in Control), PEPs (Policy Enforcement Points, distributed across Runtime / Egress / Data), PIPs (Policy Information Points, spanning Identity / Data / Observability), and PAP (Policy Administration Point, cross-cutting policy lifecycle). The Sentinels and Operatives split (PIPs feed PDP+PEP) is the runtime decomposition; the goal is verified accountable autonomy — agents that can act on their own, but where every action is verifiable, auditable, and bounded by enforced policy.

In procurement language, this layer is what Gartner calls a guardian agent. The wiki uses the architectural primary (oversight layer / PDP+PEP) when discussing components and the procurement synonym (guardian agent) when discussing vendor categories. Both describe the same role at different levels of abstraction; see Oversight Layer (PDP + PEP for Agentic AI) §Cross-walk for the full term comparison.

Design principles

Five principles anchor the architecture, drawn from Q1 2026 incident data and practitioner consensus.

  1. Platform-level enforcement, not prompt-level. Every control that matters runs in the runtime/platform, below the model. Prompt-level guardrails are bypassable by definition. (Consensus across AI Security Standards in Q1 2026: Agentic Threats Outpace Frameworks, Emerging Cybersecurity Practices for Agentic AI Applications, Securing the Autonomous Future: Trust, Safety, and Reliability of Agentic AI.)
  2. Least agency, not just least privilege. Autonomy is a governable dimension distinct from agency — agency is the scope of permitted actions; autonomy is the degree of independent decision-making within that scope (definitions per the AWS Agentic AI Security Scoping Matrix). An agent’s allowed tier (auto / notify / confirm / block) is decided per-action, not per-agent. (OWASP, Least Agency Principle.)
  3. Verifiable identity for every actor. Every agent has a cryptographic identity that traces back to a human. Action-to-identity binding is the foundation for audit, revocation, and compliance. (AI Agent Identity Architecture, NIST CAISI Concept Paper, Feb 2026.)
  4. No agent owns credentials. The credential proxy pattern is load-bearing: 5+ tools converged on it independently. Even successful prompt injection cannot extract credentials that never enter context. (Credential Proxy Pattern for AI Agents.)
  5. The “Lethal Trifecta” is a structural test. Any deployment combining private data + untrusted content + external comms is unconditionally vulnerable. The architecture must break the trifecta at the platform layer. (Lethal Trifecta, Simon Willison.)

Competing-view callouts on principles 1 and 5

See Wiki Novelty and Counter-Arguments §Thesis 1 (platform vs prompt) and §Thesis 3 (Lethal Trifecta).

Principle 1: the framing is hierarchy, not exclusivity. Prompt-layer guardrails reduce ASR materially (LlamaFirewall on AgentDojo: 17.6%→7.5%; Constitutional Classifiers: 86%→4.4% jailbreak success). Platform-layer is primary because not bypassable by injection; prompt-layer is residual-risk reduction. RAG hardening + system-prompt architecture pages already carry residual-risk callouts in this spirit.

Principle 5: “unconditionally vulnerable” is design-time pedagogy. In production, Stripe’s containment architecture runs trifecta agents and reports 1.5–6.7% ASR depending on model — probabilistically exploitable, not unconditional. The Lethal Trifecta is necessary for natural-language exfil at scale and sufficient given current defense maturity to require platform-layer containment. Containment can drive ASR very low but not zero — and very-low-but-not-zero is unacceptable for high-risk-tier actions.

The six planes

The architecture decomposes into six logical planes. A plane is a logical separation of concerns; multiple planes may be implemented by a single product, but the controls must be addressable independently.

Plane order reflects action flow (User → Identity → Control → Runtime → Egress / Data). Each plane is annotated with its XACML role (PIP / PDP / PEP / PAP). Observability spans the bottom as a cross-cutting plane consuming signals from all five above.

Implementation type legend

Each plane table includes a Type column classifying each reference implementation. Abbreviations: OSS = open-source software; COTS = commercial off-the-shelf (vendor product / SaaS); Std = formally governed standard or specification (IETF, CNCF, OWASP, NIST, etc.); Infra = generic infrastructure primitive (cloud VPC, Docker networking); Research = academic prototype with no shipped production implementation; Concept = architectural concept without a specific canonical implementation yet; Exploratory = forward-looking prototype or ecosystem project (e.g., OpenClaw) whose purpose is to project where agentic security is heading — not validated for production use; treat as emerging indicators, not foundational controls. Many rows combine types (e.g., “OSS + COTS”) because the capability has both free and commercial implementations in common use.

block-beta
  columns 2
  
  User(["Human user"]):2
  
  Identity["IDENTITY PLANE · PIP-side<br/>Workload identity · Agent lifecycle · NHI governance · Credential proxy"]:2
  
  Control["CONTROL PLANE · PDP + PAP<br/>Policy evaluation · Capability tokens · Least-agency tiers · HITL"]:2
  
  Runtime["RUNTIME PLANE · PEP (in-process)<br/>Lifecycle hooks · Input filtering · CoT auditing · Code scanning · Sandboxing"]:2
  
  Egress["EGRESS PLANE · PEP (broker)<br/>Agent/MCP proxy · Tool authorization · Tool integrity · Egress filtering"]
  Data["DATA PLANE · PIP + PEP<br/>AI-BOM · RAG provenance · Memory integrity · State rollback · Supply-chain scanning"]
  
  Obs["OBSERVABILITY PLANE · PIP (cross-cutting)<br/>Distributed tracing · Behavioral monitoring · AI-SPM · Red-team integration"]:2

  classDef pip fill:#cfe2ff,stroke:#0d6efd,color:#000
  classDef pdp fill:#fff3cd,stroke:#fd7e14,color:#000
  classDef pep fill:#f8d7da,stroke:#dc3545,color:#000
  classDef mixed fill:#e2d5f3,stroke:#6f42c1,color:#000
  classDef user fill:#d1e7dd,stroke:#198754,color:#000
  
  class User user
  class Identity pip
  class Control pdp
  class Runtime pep
  class Egress pep
  class Data mixed
  class Obs pip

1. Identity plane

Verifiable per-agent identity bound to a human principal — covering workload identity, agent and Non-Human Identity (NHI) lifecycle governance, the credential-proxy primitive, action-to-identity tracing, and OAuth 2.1 / OIDC delegation extensions for agents.

CapabilityReference implementationTypeStatus
Workload identity SPIREOSS + StdMature
Agent identity & lifecycleOkta for AI Agents (GA Apr 30, 2026), Microsoft Entra Agent ID + Agent 365 Registry (GA May 1, 2026)COTSMature (commercial)
Non-Human Identity governanceAembit, Astrix Security, CyberArk Conjur, Okta NHI, Oasis SecurityCOTSDeveloping
Credential proxyAgentKeys, Keychains.dev, Aegis (local), OneCLI, AgentSecretsOSS / COTSDeveloping — 5-tool convergence
Action-to-identity tracingMicrosoft Agent 365, Anthropic Compliance API (Mar 24, 2026)COTSMature (vendor-stack-locked)
OAuth 2.1 / OIDC for agentsStandard OIDC + NIST CAISI Concept Paper extensions (Feb 2026)StdDeveloping

Key insight: the credential proxy (Credential Proxy Pattern for AI Agents) is the single most important control here. Five different OSS/commercial tools converged on the same architecture independently — a strong signal that it is load-bearing.

2. Control plane

Policy adjudication and capability-token issuance — Cedar / OPA Policy Decision Point evaluation, task-scoped Warrants, the OWASP four-tier least-agency engine, HITL gating on high-impact actions, and CSA ATF risk-based step-up — adjudicated before an agent’s tool call reaches the runtime.

CapabilityReference implementationTypeStatus
Policy languageCedar (AWS, March 2026 release for AI governance), RegoOSSMature
Capability tokens / WarrantsTenuo Warrants — task-scoped, signed, ephemeral, holder-bound capability authorizationsOSSDeveloping
Least-agency tier engineOWASP four-tier (auto / notify / confirm / block)StdConceptual; needs reference implementation
Tool annotation enforcementAnthropic tool annotations, OpenAI function-call schemasCOTSMature
HITL primitiveHuman-in-the-loop confirmation gate before high-impact tool callsConceptDeveloping
Risk-based step-upCSA Agentic Trust Framework (Feb 2026) — 5 progressive autonomy gatesStdDeveloping

The control plane is where the Lethal Trifecta is broken. If a tool call combines private-data scope + untrusted-content provenance + external-comms reach, the PDP either downgrades to a safer tier (notify or confirm) or denies the call.

3. Runtime plane

Runtime defenses around the agent process — lifecycle hooks (Google ADK, Anthropic), input filtering and prompt-injection detection, chain-of-thought alignment auditing, code-output static analysis, content-safety classification, per-task sandboxing, and (research-stage) compartmentalized privileged / quarantined LLM split.

CapabilityReference implementationTypeStatus
Lifecycle hooksGoogle ADK before_model_callback / before_tool_callback; Anthropic tool-use hooksOSS + COTSMature
Input filtering / prompt-injection detectionLlamaFirewall PromptGuard 2 (97.5% recall, 1% FPR), NVIDIA NeMo Jailbreak Detection NIM, Palo Alto Prisma AIRS, Onyx Platform runtime protectionOSS + COTSMature
Chain-of-thought alignment auditingLlamaFirewall AlignmentCheckOSSDeveloping — novel primitive
Code-output static analysisLlamaFirewall CodeShield, GitHub Copilot for SecurityOSS + COTSDeveloping
Topic / content safetyNVIDIA NeMo Content Safety NIM, Lakera Guard, Lasso, Microsoft Prompt ShieldsCOTSMature
Sandbox / containmentFirecracker (per-task VM), gVisor container, WebAssembly sandboxOSSMature
Compartmentalized LLMs (CaMeL pattern)Privileged + Quarantined LLM split (Google DeepMind)ResearchResearch-stage
Proof-of-Guardrail attestationAWS Nitro Enclaves + Miggo SecurityCOTSResearch-stage

Note on bypass risk. The Trendyol Tech LlamaFirewall bypass case (non-English / leetspeak prompt injections) confirms that no single guardrail is sufficient. Defense-in-depth across all six planes is required.

4. Egress plane

Network-layer mediation of agent reach to tools, MCP servers, peer agents, and the open internet — gateway-based MCP / A2A / LLM proxying, MCP runtime authorization, tool-poisoning and rug-pull defense, agent-to-agent cryptographic identity, egress destination filtering, and per-agent network segmentation.

CapabilityReference implementationTypeStatus
MCP / A2A / LLM proxyAgentGateway (Linux Foundation, July 2025), Solo Enterprise for agentgatewayOSS + COTSMature (OSS)
MCP runtime authorizationOperant MCP Gateway, NatomaCOTSDeveloping
rug-pull defenseSolo Enterprise tool-server fingerprinting, versioning, runtime policyCOTSDeveloping
Agent-to-agent cryptographic identityOktsec (Ed25519 sigs), 175 detection rules, content scanningCOTSDeveloping
Egress destination filteringCredential proxy destination allowlists, Docker DOCKER-USER chainOSSMature
Network segmentationPer-agent subnet, agent VPC isolationInfraMature

The egress plane is where the MCP attack surface is contained: 30+ MCP CVEs in 60 days (Q1 2026); 82% of MCP servers vulnerable to path traversal; 66% to code injection. AgentGateway’s automatic, secure token exchange limits per-tool permissions to exactly what’s needed.

5. Data plane

Trust attribution and integrity for training data, retrieval corpora, knowledge bases, per-session memory, and the AI Bill of Materials — covering AI/ML-BOM generation and runtime reconciliation, RAG provenance and attestation, memory-poisoning defense, cognitive file integrity over agent identity files, state rollback, and supply-chain scanning.

OpenClaw ecosystem tools in this plane

Several rows below (RAGShield, TrustRAG, SecureClaw / cognitive file integrity, Brain Git, Aguara Watch) originate from the OpenClaw ecosystem — a body of work whose purpose was to project where agentic data-plane security is heading, not to provide battle-tested controls. They are retained because they correctly identify real capability gaps and likely future directions. However, they are classified Exploratory and should not be treated as foundational evidence on par with OWASP AIBOM, sigstore, or Miggo Security. Treat them as emerging indicators and validate independently before production deployment.

CapabilityReference implementationTypeStatus
AI/ML-BOM generationOWASP AIBOM Generator (CycloneDX 1.6), SPDX 3.0 AI extensions, IBM Granite 4.0 disclosuresOSS + StdDeveloping
Runtime AI-BOMMiggo Security DeepTracing (behavioral baseline)COTSDeveloping — novel
RAG provenance / attestationRAGShield (cryptographic doc attestation), TrustRAGExploratoryExploratory — OpenClaw ecosystem; not production-validated
Memory poisoning defenseMicrosoft Defender for Cloud Apps memory-injection detector (50+ examples found, March 2026)COTSDeveloping
Cognitive file integritySHA-256 monitoring of SOUL.md, IDENTITY.md; SecureClawExploratoryExploratory — OpenClaw ecosystem; emerging indicator, not foundational control
State rollbackBrain Git (SlowMist)ExploratoryExploratory — OpenClaw ecosystem; no production adoption evidence
Skill / model registry signingsigstore / cosign, OWASP AIBOMOSSDeveloping
Supply-chain scanningJFrog ML scan, ReversingLabsCOTSDeveloping
Supply-chain scanning (emerging)Aguara Watch (5 registries daily, SlowMist)ExploratoryExploratory — OpenClaw ecosystem; forward indicator for registry hygiene direction

The data plane is the layer that took the most damage in Q1 2026: ClawHavoc (1,184 malicious skills), SANDWORM_MODE (npm worm into MCP), LiteLLM compromise. Defense requires registry → pre-install → checksum → cognitive-file integrity layered together.

6. Observability plane

Glass-box visibility across the five upstream planes — OpenTelemetry gen_ai.* semantic conventions, agent-aware tracing, AI-SPM, agent behavioral monitoring, identity multiplexing in logs, SIEM / SOAR with agent playbooks, and AI red-teaming integration as a continuous feed rather than a point-in-time exercise.

CapabilityReference implementationTypeStatus
OpenTelemetry gen_ai.* semantic conventionsOTel SemConv v1.37+ (CNCF standard; SIG contributors: Amazon, Elastic, Google, IBM, Langtrace, Microsoft, OpenLIT, Scorecard, Traceloop)StdMature (experimental status, broad adoption)
Agent-aware tracingLangSmith, Langtrace, Traceloop, HeliconeOSS + COTSMature
AI-SPM (Security Posture Management)Wiz AI-SPM, Palo Alto Prisma AIRS, Orca AI-SPM, Reco, Onyx PlatformCOTSMature
Agent behavioral monitoring (anomaly detection on agent activity)Vectra AI, Miggo behavioral driftCOTSDeveloping
Agent behavioral monitoring (emerging)SecureClaw nightly auditsExploratoryExploratory — OpenClaw ecosystem; forward indicator for agent audit direction
Identity multiplexing in logsAgent Observability §3ConceptDeveloping
SIEM / SOAR with agent playbooksSplunk + agent IOCs, Sentinel + Defender for Cloud, Falcon AIDR + NeMo GuardrailsCOTSDeveloping
AI red-teaming integrationPromptfoo, Mindgard CART, PyRIT (Microsoft), Garak (NVIDIA), Palo Alto Prisma AIRS (continuous CART)OSS + COTSMature

Key stat: agents generate 10–20× the log volume of humans over the same time window. Observability scaling is its own problem; pre-aggregation at the runtime hook is necessary, not optional.

Mapping to deployment shapes

The architecture is invariant across deployment shapes; the populated controls differ. The table below specifies which planes are exercised, and which controls are load-bearing, for each common shape.

Deployment shapeIdentityControlRuntimeEgressDataObservability
Web/desktop chatbot (no tools)OIDC for human; agent runs as bot identityTopic-control, content-safety policiesNeMo Content Safety, Lakera GuardNone (no tools)RAG corpus onlyOTel gen_ai.*
Generative coding tool (Copilot, Cursor, Claude Code)Workspace identity + per-repo scope; agent rules files (.cursorrules / IDENTITY.md) baselinedPre-commit hooks, Cedar policy on file mutation; destructive-action classification routes force-push / mass refactor / prod-config writes to confirm or blockLlamaFirewall CodeShield + AlignmentCheck; per-task sandbox; rogue IDE extension detection (Kirin / equivalent)Source-control + LSP only; typosquat-aware dependency install gateRepo content + skill registry; cognitive file integrity for rules files AND IDENTITY.mdLangSmith / Langtrace; tool-call audit; MCP usage + rule changes + policy-violations dashboard (Kirin-class)
Data-science copilot (Jupyter, notebooks)User identity passthroughSandbox-resource policy; HITL on dataset writesPer-notebook sandbox (Firecracker)Internal data warehouse via credential proxyDataset lineage + AI-BOMOTel + behavioral drift
RAG applicationPer-tenant agent identitySource-trust attribution; lethal-trifecta breakerInput filter on prompts; output classifierVector store + per-source allowlistRAGShield/TrustRAG; document attestation; PoisonedRAG defenseRetrieval-pattern behavioral monitoring
MCP server (consumed by agents)mTLS + workload identity (SPIFFE)OAuth 2.1 token exchange (CoSAI / NIST CAISI)Tool-fingerprinting, rug-pull detection (Solo Enterprise)n/a (server-side)Skill/server signing, version pinningMCP CVE feed integration
Agent skill (e.g., Claude skill, Anthropic plugin)Skill author identity (signed)Skill-scoped Cedar policySkill manifest validation; runtime sandboxPer-skill egress allowlistSkill registry signing (sigstore); cognitive file integritySkill execution telemetry
Multi-agent mesh (A2A v1.0)Per-agent Ed25519 identity (Oktsec-side; not in spec); SPIFFE/SPIRE workload identityCross-agent ACL (Oktsec default-deny); A2A opacity principle; per-skill authorization advertised in Agent Card; stop-mesh-vs-isolate containment doctrine (Multi-Agent Runtime Security)Per-agent runtime sandbox; AgentGateway broker betweenA2A v1.0 over HTTPS / TLS 1.3 + signed Agent Cards (§8.4) + content scanning (Oktsec 268 rules); replay protection layered by impl (timestamps + nonces)Shared memory / blackboard with provenance; cross-agent OTel gen_ai.* trace propagationPairwise/triadic traffic baselines; graph-walk anomaly detection (SentinelAgent / TraceAegis-class, prototype); cross-agent drift correlation; cascade detection per ASI08 doctrine

Opinionated tool selections per plane for two common org profiles. Neither stack is exhaustive — treat each as a starting point extensible per deployment shape and risk profile.

Enterprise stack (COTS-heavy)

Suited for large organizations with existing vendor relationships, centralized IAM, and SOC/SIEM infrastructure. Prioritizes vendor SLAs, support contracts, and integrations with Microsoft 365 / AWS / GCP environments.

PlanePrimary choicesNotes
IdentityOkta for AI Agents (GA Apr 30, 2026) or Microsoft Entra Agent ID + Microsoft Agent 365 Registry; CyberArk Conjur or Aembit for NHI governanceChoose based on existing IAM vendor; both support OAuth 2.1 delegation
ControlAWS Cedar managed policy service (March 2026 AI governance release); Anthropic Compliance API or Microsoft Agent Governance Toolkit (Apr 2026); Permit.io for RBAC UICedar is the enterprise-grade choice; COTS wrappers add audit + workflow tooling
RuntimeLlamaFirewall (Meta OSS — no license cost) + NVIDIA NeMo NIMs (commercial inference); Microsoft Prompt Shields for content safety; per-task Firecracker VM or Hyper-V sandboxMix OSS guardrail (LlamaFirewall) with COTS NIM delivery for SLA coverage
EgressSolo Enterprise for AgentGateway, Kong AI Gateway, or Cloudflare AI Gateway; Operant MCP Gateway for MCP-specific authorization; mTLS via Istio / LinkerdEnterprise distributions of OSS gateways; prefer one with A2A v1.0 support
DataMicrosoft Purview AI (M365 environments); Wiz AI-SPM or Palo Alto Prisma AIRS; JFrog ML Catalog for AI-BOM; ReversingLabs for supply-chain scanningStack assumes M365 + cloud environment; swap Purview for CASB equivalent if GCP/AWS-native
ObservabilityDataDog AI Monitoring or New Relic AI Monitoring (OTel-native); LangSmith for agent-specific tracing; Mindgard CART for continuous red-teaming; Vectra AI or Palo Alto Cortex XSIAM for behavioral monitoringOTel gen_ai.* spans feed into existing SIEM; Mindgard replaces point-in-time red-team for CARTS programs

FOSS / small-team stack

Suited for research teams, security teams running internal agent experiments, startups, or orgs with open-source mandates. Prioritizes zero licensing cost, community support, and composability. Requires more operational ownership.

PlanePrimary choicesNotes
Identity SPIRE for workload identity; standard OAuth 2.1 + OIDC (any provider) for delegation; Aegis or AgentKeys for credential proxySPIFFE is the vendor-neutral workload-identity standard; pairs with any OIDC-compatible IdP
ControlRego or Cedar (open-source distribution); Tenuo Warrants (Rust OSS) for task-scoped capability tokens; OWASP least-agency four-tier guidance for tier designBoth Cedar and OPA are fully OSS; Tenuo adds cryptographic delegation without a license
RuntimeLlamaFirewall PromptGuard 2 + AlignmentCheck + CodeShield (Meta OSS); NVIDIA NeMo Guardrails (OSS portion); Firecracker or gVisor for per-task sandboxingFull LlamaFirewall stack is zero-cost; Firecracker is AWS OSS (Apache 2.0); gVisor is Google OSS
EgressAgentGateway (Linux Foundation, Apache 2.0); mTLS via Istio (CNCF OSS) or Linkerd (CNCF OSS); Docker DOCKER-USER iptables chain for egress filteringAgentGateway is the canonical OSS agent proxy; Istio adds mTLS with near-zero operational overhead vs DIY
DataOWASP AIBOM Generator (CycloneDX 1.6, OSS); sigstore / cosign for artifact signing; RAGShield or TrustRAG for RAG attestation; Brain Git (SlowMist) for state rollbackAll zero-cost; RAGShield/TrustRAG are research-grade — treat as experimental until production-validated
ObservabilityOpenTelemetry gen_ai.* SemConv (CNCF standard, v1.37+); Langtrace or Traceloop (OSS) for agent-aware tracing; PyRIT (Microsoft OSS) + Garak (NVIDIA OSS) + Promptfoo (OSS) for red-team coverageOTel is the zero-cost observability foundation; the three red-team tools cover orchestration / probe / regression — all open-source

Stack evolution

The FOSS stack is production-viable at D3–D4 levels of the CMM for most deployment shapes. The Research-type items (CaMeL, RAGShield at scale) are not yet production-ready for either stack. The enterprise stack exceeds the FOSS stack primarily in operational overhead reduction (vendor support, pre-built integrations) rather than raw security capability — at L4 CMM, a well-operated FOSS stack achieves comparable controls.

Threat-control matrix (OWASP Agentic AI Top 10 → planes)

Mapping of OWASP Agentic AI Top 10 (ASI01ASI10) risk categories to the planes that primarily mitigate them. Most categories have controls in multiple planes; the table that follows the diagram identifies the primary control surface and lists the reference controls for each.

flowchart LR
    subgraph Threats[Threats]
        ASI01[ASI01: Goal Hijack]
        ASI02[ASI02: Tool Misuse]
        ASI03[ASI03: Identity & Privilege]
        ASI04[ASI04: Supply Chain]
        ASI05[ASI05: Data Disclosure]
        ASI06[ASI06: Memory Poisoning]
        ASI07[ASI07: Inter-Agent Comms]
        ASI08[ASI08: Cascading Failures]
        ASI09[ASI09: Missing Guardrails]
        ASI10[ASI10: Rogue Agents]
    end
    subgraph Planes[Planes]
        ID[Identity]
        CTL[Control]
        RT[Runtime]
        EG[Egress]
        DT[Data]
        OBS[Observability]
    end
    ASI01 --> RT & CTL
    ASI02 --> CTL & EG
    ASI03 --> ID
    ASI04 --> DT
    ASI05 --> ID & EG
    ASI06 --> DT
    ASI07 --> EG
    ASI08 --> CTL & OBS
    ASI09 --> RT & CTL
    ASI10 --> ID & OBS
OWASP ASIPrimary planeReference controls
ASI01 Goal HijackRuntime + ControlLlamaFirewall AlignmentCheck; HITL on goal-changing actions
ASI02 Tool MisuseControl + EgressCedar/OPA tool-call policy; AgentGateway runtime authz
ASI03 Identity & PrivilegeIdentityOkta for AI Agents; Microsoft Entra Agent ID; credential proxy
ASI04 Supply ChainDataOWASP AIBOM Generator; sigstore; Aguara Watch
ASI05 Data DisclosureIdentity + EgressCredential proxy; egress filtering; Microsoft Purview
ASI06 Memory PoisoningDataRAGShield; cognitive file integrity; M365 memory-injection detector
ASI07 Insecure Inter-AgentEgressA2A v1.0 over HTTPS + signed Agent Cards (§8.4); Oktsec Ed25519 message signing + content scanning (268 rules)
ASI08 Cascading FailuresControl + ObservabilityStep-up gates (CSA ATF); pairwise/triadic baselines; graph-walk anomaly detection; stop-mesh-vs-isolate doctrine (Multi-Agent Runtime Security)
ASI09 Missing GuardrailsRuntime + ControlLlamaFirewall + NeMo Guardrails; HITL primitive
ASI10 Rogue AgentsIdentity + ObservabilityBehavioral drift detection; Okta Agent Discovery; kill switch

Trade-offs

Architectural trade-offs that vary with deployment scale, latency tolerance, and risk profile. Each row names the decision axis and the recommended default; deviations should be documented with a strategic-rationale field per the CMM reporting convention.

  • Single broker vs mesh. AgentGateway-as-broker is simpler and gives a chokepoint for policy. Mesh (per-service AgentGateway sidecar) scales better but multiplies the policy surface. Default to broker for ≤50 agents; move to mesh above that.
  • PDP location. Inline (in-process with the agent) gives the best latency but couples policy with runtime. Sidecar gives clean separation. External service is the standard zero-trust answer but adds 5–20ms per call. Default to sidecar.
  • Sandbox grain. Per-call sandbox is safest but expensive; per-task sandbox is the practical default; per-agent sandbox is too coarse for high-risk-tier actions.
  • Fail-closed vs fail-open. Default to fail-closed for high-risk-tier actions, fail-open for read-only / informational tier. CSA ATF Promotion Gates encode this directly.

Gaps in the architecture

Known unfilled spots

  1. Compartmentalized LLM (CaMeL) reference pattern. Privileged-LLM-coordinates-quarantined-LLM is theoretically sound but lacks a vendor-neutral reference implementation. (Google DeepMind research-stage.)
  2. Cross-tenant MCP server signing. MCP CVE rate (30+ in Q1 2026) suggests the ecosystem is pre-supply-chain-hardening. sigstore-for-MCP-servers is needed but not standardized.
  3. Multi-agent failure containment. ASI08 (Cascading Failures) and ASI10 (Rogue Agents) have no traditional cybersecurity equivalent. Partially addressed 2026-05-02: see Multi-Agent Runtime Security for the cascade-detection / behavioral-baseline / inter-agent IR depth. Honest read: 2026 is still the academic-prototype era — graph-walk monitors (SentinelAgent, TraceAegis) ship as papers, vendor primitives exist (Oktsec rate limits + ACLs) but no integrated cascade-detection product ships with documented thresholds.
  4. AI-BOM operationalization gap. CycloneDX 1.6 ML-BOM is the format; the operational workflow (CI/CD integration, vendor disclosure norm, AI-VEX equivalent) is thin.
  5. Identity binding when humans are decommissioned. When the human owner of an agent leaves, the agent must be rotated or revoked. Okta and Microsoft Agent 365 cover this for managed agents; orphaned shadow agents are still discoverable but not always governable.

Prior work and comparison

No vendor-neutral, control-architecture-focused reference architecture for agentic AI security existed in the public domain as of Q1 2026. Prior work falls into two categories: cloud-specific operational guidance (hyperscaler “how to secure AI on our platform” documents) and vendor-neutral threat taxonomies (threat enumeration without control specification). The table below summarizes each and the gap each leaves unaddressed.

FrameworkPublishedVendor neutral?Agentic-specific?What it coversKey gap
Microsoft ZT4AIMarch 2026No (Azure / Microsoft stack)Strong yes — agent identity, MCP governance, multi-agent verification700+ controls across 116 groups; extends ZT to a 7th “AI” pillar; Entra Agent ID, Agent 365 registry, MCP three-tier governanceAzure-tooling-locked; MCP control specification and supply-chain guidance still maturing; tooling in preview/development as of May 2026
Azure OpenAI Reference ArchitectureOngoing (Feb 2026 update)No (Azure)Partial — adds Entra Agent ID, scoped tokens, HITL via Logic Apps5-layer: Network isolation · Identity/access · Content/prompt · Data protection · MonitoringAssumes static LLM API invocations; no multi-agent orchestration architecture; no MCP, A2A, or memory/state security
Microsoft MCRAApril 2025No (Microsoft ecosystem)MinimalEnterprise security architecture diagrams across ZT pillars; AI section = Microsoft Security Copilot as a security toolNot a defense-of-AI architecture; treats AI as defender-assistant, not as a class of workloads to secure
AWS Well-Architected Generative AI LensNovember 2025No (AWS)Partial — names “excessive agency” as a risk; adds an agentic AI preamble6 WAF pillars applied to GenAI lifecycle; security pillar covers: endpoint protection, output risk, prompt security, monitoring, model integrity”Excessive agency” named but mitigations unspecified; no agent identity, tool sandboxing, agent-to-agent trust, or MCP coverage
Google Cloud AI Security Foundations2025 (ongoing)No (GCP)Partial — 6-layer model includes “Agents and Applications” layer6 architectural layers: foundation → infrastructure → models → data → tools → agents; Model Armor for runtime guardrails; VPC/IAM/CMEK controls”Agents and Applications” layer exists as a named category but is underspecified; no agent identity governance, tool execution sandboxing, inter-agent trust, or memory security
CSA MAESTROFebruary 2025YesFull — built exclusively for agentic AI systems7-layer threat taxonomy: Foundation Models → Data Operations → Agent Frameworks → Deployment/Infra → Evaluation/Observability → Security/Compliance → Agent EcosystemTaxonomy only — identifies threats but specifies no controls, no implementation guidance, and no compliance mapping

What this RA adds

The six-plane model occupies the gap between cloud-specific operational guides and vendor-neutral threat taxonomies. Four contributions distinguish it:

Vendor-neutral control architecture. Unlike ZT4AI, the Azure RA, the AWS GenAI Lens, and the Google Cloud guidance, this RA specifies how to implement controls without mandating a particular hyperscaler. Reference implementations are labeled OSS / COTS / Std / Exploratory so organizations can substitute equivalent tools per plane.

Agentic-throughout. Unlike MCRA (AI = security tool) and the AWS / Google guidance (agentic = a paragraph), all six planes are designed assuming autonomous multi-step agent loops — credential proxy in the identity plane, per-task sandboxing in the runtime plane, A2A v1.0 in the egress plane, Warrant-based delegation in the control plane, memory-poisoning defense in the data plane, and behavioral-drift detection in the observability plane.

MCP-specific surface. MCP received 30+ disclosed CVEs in Q1 2026 and is absent or underspecified in the five comparison frameworks above. The egress plane explicitly addresses MCP runtime authorization, tool fingerprinting, and rug-pull defense — controls not specified in any reviewed prior work.

Tool supply chain. The data plane covers AI-BOM generation, skill / model registry signing, and supply-chain scanning. This surface is absent in the cloud-specific guides and only named (not addressed) in MAESTRO’s Agent Frameworks layer.

Inherited inputs. The RA synthesizes rather than originates; it inherits from:

  • ZT4AI’s principle of per-agent scoped identity with cryptographic binding
  • CSA MAESTRO’s 7-layer threat taxonomy as the threat-model input
  • CSA ATF’s promotion gates as the basis for the least-agency tier model
  • OWASP ASI Top 10 as the explicit control-to-threat mapping
  • NIST CAISI Concept Paper’s OAuth 2.1 extensions for agent delegation

The primary contribution is the synthesis: a single vendor-neutral document connecting a threat taxonomy (MAESTRO + OWASP ASI Top 10) to a plane-by-plane control library (this RA) to a maturity model (Agentic AI Security CMM 2026) — a chain that none of the comparison frameworks provides end-to-end.