Security Controls for AI Stacks
Question
What security controls exist for agentic AI stacks, where do they live in the stack, and what are the gaps?
Current Position
Working controls converge into six layers. Mature options exist at the identity and observability layers; emerging options at containment and network; material gaps remain at the model and data layers.
Platform-layer over prompt-layer
The strongest practitioner consensus across the ingested sources (AI Security Standards in Q1 2026: Agentic Threats Outpace Frameworks, Securing the Autonomous Future: Trust, Safety, and Reliability of Agentic AI, Emerging Cybersecurity Practices for Agentic AI Applications) is that security controls must now be enforced at the platform layer, not the prompt layer. Prompt-level guardrails are bypassable; platform-level enforcement (input filtering at the broker, egress control, capability-based authorization, sandboxing) is not.
Evidence from Emerging Cybersecurity Practices for Agentic AI Applications: the APort Agent Guardrail case study articulates this precisely — “pre-action authorization must run in the runtime/platform, so the platform invokes the guardrail for every tool call regardless of what the model outputs.” This mirrors a fundamental security principle: controls must be enforced at a layer below the layer they protect. Network firewalls enforce below the application; tool call interception must enforce below the LLM.
The Six Layers
1. Identity layer (mature)
What it does: assigns verifiable identities to AI agents and traces actions back to invoking humans.
| Control | Page | Status |
|---|---|---|
| Workload identity (SPIFFE/SPIRE) | SPIRE | Stub — adoption growing |
| Non-Human Identity (NHI) governance | Non-Human Identity, NHI Governance for Agents | Developing |
| Reference architecture | AI Agent Identity Architecture | Developing |
| Credential lifecycle (Credential Zero) | NHI Governance for Agents | Developing |
| Capability-based authorization (Warrants) | Task-scoped, signed, ephemeral capability authorizations | Developing |
| Credential proxy (proxy token, vault injection) | Credential Proxy Pattern | Developing — multi-tool convergence confirms this is load-bearing |
Verdict: mature options exist; integration discipline (action-to-identity tracing) is what distinguishes well-postured organizations from exposed ones. The Credential Proxy Pattern for AI Agents has independently converged across 5+ tools in the OpenClaw ecosystem — a strong signal that credential exposure (not just credential rotation) is a distinct control gap. See Emerging Cybersecurity Practices for Agentic AI Applications.
2. Observability layer (mature)
What it does: turns the agent from a black box into a glass box; traces reasoning, tool calls, and identities.
| Control | Page | Status |
|---|---|---|
| Lifecycle hooks + reference monitors | Agent Observability §1 | Developing |
OpenTelemetry gen_ai.* semantic conventions | Agent Observability §2 | Developing |
| Identity multiplexing in logs | Agent Observability §3 | Developing |
| Cedar policy for action mediation | Agent Observability §4A | Developing |
| Agent Cards (System of Record) | Agent Observability §4C | Developing |
| Context-aware trimming with pinned tags | Agent Observability §5 | Developing |
| Agent behavioral monitoring (anomaly detection) | Agent Observability §7 | Developing |
| AI-BOM runtime discovery (behavioral baseline) | AI-BOM — Miggo DeepTracing pattern | Developing | | Behavioral drift detection | Agent Observability §7, AI-BOM §Runtime | Developing | | Cognitive file integrity monitoring | Supply Chain Security | Developing — extends FIM to SOUL.md / IDENTITY.md |
Verdict: mature catalogue. Practitioner consensus is strong. Tooling exists (OTel gen_ai.* conventions are real). New from Emerging Cybersecurity Practices for Agentic AI Applications: Miggo’s AI-BOM runtime discovery with DeepTracing adds a continuous behavioral BOM approach. Cognitive file integrity (monitoring SHA-256 drift on identity files) is a genuinely new extension of traditional FIM to agentic-specific artifacts.
3. Containment layer (emerging)
What it does: prevents an agent from doing damage when other controls fail.
| Control | Page | Status |
|---|---|---|
| Agent sandboxing (OS-level isolation) | Agent Sandboxing | Developing — last-line-of-defense |
| Lethal Trifecta containment (egress control, human confirmation, tool annotation) | Stripe (stub — needs full architecture page) | Stub |
| Human-in-the-Loop primitive | Confirmation gate before high-impact tool calls | Developing |
| Reversible-actions-only constraint | Tight coupling between AI inference and consequential action is risky; constrain to reversible actions only, with circuit breakers | Developing |
| Least agency tiers (auto/notify/confirm/block) | Least Agency Principle | Developing — OWASP-sourced |
| Tool call interception at platform layer | Prompt Injection Containment | Developing |
| AlignmentCheck (chain-of-thought auditing) | Prompt Injection Containment, LlamaFirewall | Developing |
| Kill switches / instant revocation | Credential Proxy Pattern | Developing |
| State rollback (Brain Git) | Supply Chain Security | Developing — agentic-specific |
Containment layer gaps — PARTIALLY CLOSED
The Stripe Lethal Trifecta architecture is widely cited but not yet documented in the wiki as a standalone reference architecture. Worth elevating from the Stripe stub to a
wiki/architectures/lethal-trifecta-containment.mdpage.2026-04-30 update: Emerging Cybersecurity Practices for Agentic AI Applications provides substantial containment layer content. The Least Agency Principle page now documents the OWASP four-tier autonomy classification. The Prompt Injection Containment for Agentic Systems page covers tool call interception, AlignmentCheck, and platform-level vs. prompt-level enforcement. LlamaFirewall’s AlignmentCheck (chain-of-thought auditing) is a novel containment primitive not present in earlier sources. Kill switch / instant revocation and Brain Git rollback are documented in Supply Chain Security for Agentic AI. The Lethal Trifecta architecture page remains the one unfilled stub.
4. Network / protocol layer (emerging)
What it does: secures inter-agent and agent-to-tool communication.
| Control | Page | Status |
|---|---|---|
| MCP Security taxonomy (CoSAI WS4) | MCP Security, CoSAI | Developing |
| AgentGateway (open-source MCP gateway) | AgentGateway | Stub |
| A2A Protocol with Agent Cards / opacity principle | A2A Protocol | Stub |
| Egress control patterns | (no dedicated page yet) | Gap — see update note |
| Agent-to-agent cryptographic identity (Ed25519) | Agent Identity Architecture, Emerging Practices §2.4 | Developing — Oktsec implementation |
| Multi-rule content scanning on inter-agent messages (Oktsec ships 268 rules at v0.15.2 — see A2A page) | Emerging Practices §2.4 | Developing — Oktsec |
Egress control patterns — PARTIALLY ADDRESSED
Multiple sources reference “controlled egress” as a Lethal Trifecta containment primitive. A dedicated wiki page
wiki/practices/egress-control-patterns.mdis still warranted.2026-04-30 update: Emerging Cybersecurity Practices for Agentic AI Applications documents the credential proxy pattern as a practical egress control — the proxy validates outbound requests against allowed targets before injecting credentials, providing destination-based egress filtering. Docker DOCKER-USER chain rules for container networking and network segmentation for agent runtimes are also documented in Section 2.7. The dedicated egress-control-patterns page remains a candidate for a future working session.
5. Model layer (research-stage / gap)
What it does: detects malicious intent or deception inside the model itself.
| Control | Page | Status |
|---|---|---|
| LlamaFirewall (open-source guardrail) | LlamaFirewall | Developing — now substantive, not just a stub |
| Mechanistic interpretability for “internal EDR” | Agent Observability §6 | Developing — research-stage |
| Prompt-level guardrails | (no dedicated page; widely seen as bypassable) | Acknowledged-weak |
| Prompt injection detection / input filtering | Prompt Injection Containment §Layer 1 | Developing |
| Proof-of-Guardrail (TEE attestation) | Emerging Practices §2.5 | Research-stage — novel primitive |
Model-layer is the weakest link
Prompt-level guardrails are not robust controls. Platform-layer enforcement is the practical answer. Mechanistic interpretability is promising but research-stage. The honest read is: the model layer offers detection, not prevention — and detection itself is unreliable.
2026-04-30 update from Emerging Cybersecurity Practices for Agentic AI Applications: LlamaFirewall now has three substantive components — PromptGuard 2 (injection detection, 90% attack reduction), AlignmentCheck (chain-of-thought auditing for goal hijacking, a novel prospective control), and CodeShield (static analysis for generated code). Proof-of-Guardrail using AWS Nitro Enclaves to cryptographically attest that guardrails executed is a genuinely novel primitive that moves from “trust the vendor” to “verify the guardrail ran.” Still research-stage but a promising direction.
6. Data layer (gap)
What it does: protects training data, RAG sources, and model artifacts; provides supply-chain assurance.
| Control | Page | Status |
|---|---|---|
| AI-BOM / ML-BOM | AI-BOM | Developing — gap closed |
| Supply-chain multi-layer defense | Supply Chain Security for Agents | Developing — gap closed |
| Supply-chain attack disclosure | ClawHavoc, SANDWORM_MODE, LiteLLM | Incident pages |
| Cognitive file integrity (SOUL.md / IDENTITY.md) | Supply Chain Security §Layer 4 | Developing — new category |
| Data poisoning defenses | (no dedicated page) | Gap — still open |
| RAG poisoning defenses | (no dedicated page) | Gap — still open |
Data layer — PARTIALLY CLOSED
Three Q1 2026 incidents are pure supply-chain (ClawHavoc — Agentic Skill Marketplace Supply Chain Attack, SANDWORM_MODE npm worm — AI Toolchain Poisoning, LiteLLM Supply Chain Compromise (Google ADK Dependency)) — ML-BOM adoption lags 48% behind SBOM (per AI Security Standards in Q1 2026: Agentic Threats Outpace Frameworks).
2026-04-30 update: AI-BOM: AI Bill of Materials page now documents the AI Bill of Materials control (static + runtime, CycloneDX format, Miggo runtime discovery pattern). Supply Chain Security for Agentic AI covers the multi-layer supply chain defense model (registry scanning → pre-install scanning → checksum verification → cognitive file integrity → behavioral drift detection). Cognitive file integrity (SHA-256 monitoring of SOUL.md, IDENTITY.md, Brain Git rollback) is a new agentic-specific category unique to this source.
Still open: data poisoning defenses and RAG poisoning defenses have no dedicated pages. These require a separate source or working session. Candidate:
wiki/practices/data-poisoning-defenses.md.
What the Frameworks Cover (and Don’t)
| Framework | Layers covered well | Layers under-covered |
|---|---|---|
| NIST AI RMF | Governance overlay; risk-management process | Technical controls per layer |
| MITRE ATLAS | Threat taxonomy (attacker perspective) | Defender controls |
| OWASP LLM Top 10 | Model layer awareness | Identity, network, data |
| OWASP Agentic AI Top 10 | Agent orchestration risks | Implementation guidance |
| IEC 42001 | Management system / governance | Technical security controls |
| Google SAIF | Lifecycle conceptual model | Concrete operational controls |
| CoSAI (MCP white paper, secure-by-design principles) | Network/protocol layer (MCP); secure-by-design conceptual | Reference implementations |
| ZT4AI | 700+ controls — most comprehensive | Microsoft-stack-specific |
| ATF | Cross-layer threat model + autonomy promotion gates | Operational tooling |
How This Has Evolved
- 2026-04-30 (round 1) — initial synthesis. Layers identified from three ingested sources. Material gaps flagged. Reference architecture stub created.
- 2026-04-30 (round 2) — ingested Emerging Cybersecurity Practices for Agentic AI Applications. Significant gap closure:
- Identity layer: Credential Proxy Pattern for AI Agents documented with 5-tool convergence evidence.
- Observability layer: AI-BOM runtime discovery (Miggo DeepTracing), cognitive file integrity added.
- Containment layer: Least Agency Principle (OWASP four-tier autonomy model), Prompt Injection Containment for Agentic Systems with AlignmentCheck. LlamaFirewall stub elevated to substantive page content.
- Data layer: AI-BOM: AI Bill of Materials and Supply Chain Security for Agentic AI pages created. RAG poisoning + data poisoning defenses remain open gaps.
- Network/protocol layer: Ed25519 agent-to-agent signing (Oktsec; vendor-side, not in A2A v1.0 spec), Oktsec content scanning (268 rules at v0.15.2) added.
- (next) — Lethal Trifecta architecture page, egress control patterns, data poisoning defenses.
Open Sub-Questions
Move to gaps/
- Egress control patterns — what specific egress mechanisms work? OPA/Cedar at the broker? A separate egress proxy? Network-segment isolation? (Partially addressed: credential proxy provides destination-based filtering; Docker DOCKER-USER chain rules for containers documented in Emerging Cybersecurity Practices for Agentic AI Applications. Dedicated page still needed.)
- AI-BOM operationalization — beyond CycloneDX ML extension, what’s the actual production workflow? (Partially addressed: AI-BOM: AI Bill of Materials created; runtime AI-BOM via Miggo DeepTracing documented. Full enterprise integration workflow still thin.)
- Per-agent authorization at scale — how does Capability-based authorization (Warrants) interact with traditional RBAC/ABAC at enterprise scale?
- Detection vs. prevention split for prompt injection — Prompt Injection Containment for Agentic Systems now documents the two-layer model: input detection (PromptGuard 2, 90% reduction) + execution containment. The honest answer: detection reduces attack success but cannot eliminate it; containment limits blast radius when detection fails. Platform-layer controls are closing the prevention gap for high-risk tier actions via HITL confirmation.
- Data poisoning and RAG poisoning defenses — entirely undocumented in this wiki as a dedicated practice page. Three incidents + Agentic AI Security Capability Maturity Model — A 2026 Practical Proposal D6 Data, Memory & RAG reference these (and the canonical CMM at L4 calls for RAGShield/TrustRAG-class document attestation, memory-poisoning detector, PoisonedRAG defense via DRS or sentinel-strategist arch) but no practice page exists. Candidate:
wiki/practices/data-poisoning-defenses.md.- Emergent multi-agent behaviors — ASI07 (Insecure Inter-Agent Comms), ASI08 (Cascading Failures), ASI10 (Rogue Agents) have no traditional equivalent. Partially addressed 2026-05-02: see Multi-Agent Runtime Security for cascade-detection / behavioral-baseline / inter-agent IR depth, and A2A Protocol for the v1.0.0 spec analysis. Honest read: 2026 is still the academic-prototype era for cascade detection at scale.