Security Controls for AI Stacks

Question

What security controls exist for agentic AI stacks, where do they live in the stack, and what are the gaps?

Current Position

Working controls converge into six layers. Mature options exist at the identity and observability layers; emerging options at containment and network; material gaps remain at the model and data layers.

Platform-layer over prompt-layer

The strongest practitioner consensus across the ingested sources (AI Security Standards in Q1 2026: Agentic Threats Outpace Frameworks, Securing the Autonomous Future: Trust, Safety, and Reliability of Agentic AI, Emerging Cybersecurity Practices for Agentic AI Applications) is that security controls must now be enforced at the platform layer, not the prompt layer. Prompt-level guardrails are bypassable; platform-level enforcement (input filtering at the broker, egress control, capability-based authorization, sandboxing) is not.

Evidence from Emerging Cybersecurity Practices for Agentic AI Applications: the APort Agent Guardrail case study articulates this precisely — “pre-action authorization must run in the runtime/platform, so the platform invokes the guardrail for every tool call regardless of what the model outputs.” This mirrors a fundamental security principle: controls must be enforced at a layer below the layer they protect. Network firewalls enforce below the application; tool call interception must enforce below the LLM.

The Six Layers

1. Identity layer (mature)

What it does: assigns verifiable identities to AI agents and traces actions back to invoking humans.

ControlPageStatus
Workload identity (SPIFFE/SPIRE) SPIREStub — adoption growing
Non-Human Identity (NHI) governanceNon-Human Identity, NHI Governance for AgentsDeveloping
Reference architectureAI Agent Identity ArchitectureDeveloping
Credential lifecycle (Credential Zero)NHI Governance for AgentsDeveloping
Capability-based authorization (Warrants)Task-scoped, signed, ephemeral capability authorizationsDeveloping
Credential proxy (proxy token, vault injection)Credential Proxy PatternDeveloping — multi-tool convergence confirms this is load-bearing

Verdict: mature options exist; integration discipline (action-to-identity tracing) is what distinguishes well-postured organizations from exposed ones. The Credential Proxy Pattern for AI Agents has independently converged across 5+ tools in the OpenClaw ecosystem — a strong signal that credential exposure (not just credential rotation) is a distinct control gap. See Emerging Cybersecurity Practices for Agentic AI Applications.

2. Observability layer (mature)

What it does: turns the agent from a black box into a glass box; traces reasoning, tool calls, and identities.

ControlPageStatus
Lifecycle hooks + reference monitorsAgent Observability §1Developing
OpenTelemetry gen_ai.* semantic conventionsAgent Observability §2Developing
Identity multiplexing in logsAgent Observability §3Developing
Cedar policy for action mediationAgent Observability §4ADeveloping
Agent Cards (System of Record)Agent Observability §4CDeveloping
Context-aware trimming with pinned tagsAgent Observability §5Developing
Agent behavioral monitoring (anomaly detection)Agent Observability §7Developing

| AI-BOM runtime discovery (behavioral baseline) | AI-BOM — Miggo DeepTracing pattern | Developing | | Behavioral drift detection | Agent Observability §7, AI-BOM §Runtime | Developing | | Cognitive file integrity monitoring | Supply Chain Security | Developing — extends FIM to SOUL.md / IDENTITY.md |

Verdict: mature catalogue. Practitioner consensus is strong. Tooling exists (OTel gen_ai.* conventions are real). New from Emerging Cybersecurity Practices for Agentic AI Applications: Miggo’s AI-BOM runtime discovery with DeepTracing adds a continuous behavioral BOM approach. Cognitive file integrity (monitoring SHA-256 drift on identity files) is a genuinely new extension of traditional FIM to agentic-specific artifacts.

3. Containment layer (emerging)

What it does: prevents an agent from doing damage when other controls fail.

ControlPageStatus
Agent sandboxing (OS-level isolation)Agent SandboxingDeveloping — last-line-of-defense
Lethal Trifecta containment (egress control, human confirmation, tool annotation)Stripe (stub — needs full architecture page)Stub
Human-in-the-Loop primitiveConfirmation gate before high-impact tool callsDeveloping
Reversible-actions-only constraintTight coupling between AI inference and consequential action is risky; constrain to reversible actions only, with circuit breakersDeveloping
Least agency tiers (auto/notify/confirm/block)Least Agency PrincipleDeveloping — OWASP-sourced
Tool call interception at platform layerPrompt Injection ContainmentDeveloping
AlignmentCheck (chain-of-thought auditing)Prompt Injection Containment, LlamaFirewallDeveloping
Kill switches / instant revocationCredential Proxy PatternDeveloping
State rollback (Brain Git)Supply Chain SecurityDeveloping — agentic-specific

Containment layer gaps — PARTIALLY CLOSED

The Stripe Lethal Trifecta architecture is widely cited but not yet documented in the wiki as a standalone reference architecture. Worth elevating from the Stripe stub to a wiki/architectures/lethal-trifecta-containment.md page.

2026-04-30 update: Emerging Cybersecurity Practices for Agentic AI Applications provides substantial containment layer content. The Least Agency Principle page now documents the OWASP four-tier autonomy classification. The Prompt Injection Containment for Agentic Systems page covers tool call interception, AlignmentCheck, and platform-level vs. prompt-level enforcement. LlamaFirewall’s AlignmentCheck (chain-of-thought auditing) is a novel containment primitive not present in earlier sources. Kill switch / instant revocation and Brain Git rollback are documented in Supply Chain Security for Agentic AI. The Lethal Trifecta architecture page remains the one unfilled stub.

4. Network / protocol layer (emerging)

What it does: secures inter-agent and agent-to-tool communication.

ControlPageStatus
MCP Security taxonomy (CoSAI WS4)MCP Security, CoSAIDeveloping
AgentGateway (open-source MCP gateway)AgentGatewayStub
A2A Protocol with Agent Cards / opacity principleA2A ProtocolStub
Egress control patterns(no dedicated page yet)Gap — see update note
Agent-to-agent cryptographic identity (Ed25519)Agent Identity Architecture, Emerging Practices §2.4Developing — Oktsec implementation
Multi-rule content scanning on inter-agent messages (Oktsec ships 268 rules at v0.15.2 — see A2A page)Emerging Practices §2.4Developing — Oktsec

Egress control patterns — PARTIALLY ADDRESSED

Multiple sources reference “controlled egress” as a Lethal Trifecta containment primitive. A dedicated wiki page wiki/practices/egress-control-patterns.md is still warranted.

2026-04-30 update: Emerging Cybersecurity Practices for Agentic AI Applications documents the credential proxy pattern as a practical egress control — the proxy validates outbound requests against allowed targets before injecting credentials, providing destination-based egress filtering. Docker DOCKER-USER chain rules for container networking and network segmentation for agent runtimes are also documented in Section 2.7. The dedicated egress-control-patterns page remains a candidate for a future working session.

5. Model layer (research-stage / gap)

What it does: detects malicious intent or deception inside the model itself.

ControlPageStatus
LlamaFirewall (open-source guardrail)LlamaFirewallDeveloping — now substantive, not just a stub
Mechanistic interpretability for “internal EDR”Agent Observability §6Developing — research-stage
Prompt-level guardrails(no dedicated page; widely seen as bypassable)Acknowledged-weak
Prompt injection detection / input filteringPrompt Injection Containment §Layer 1Developing
Proof-of-Guardrail (TEE attestation)Emerging Practices §2.5Research-stage — novel primitive

Model-layer is the weakest link

Prompt-level guardrails are not robust controls. Platform-layer enforcement is the practical answer. Mechanistic interpretability is promising but research-stage. The honest read is: the model layer offers detection, not prevention — and detection itself is unreliable.

2026-04-30 update from Emerging Cybersecurity Practices for Agentic AI Applications: LlamaFirewall now has three substantive components — PromptGuard 2 (injection detection, 90% attack reduction), AlignmentCheck (chain-of-thought auditing for goal hijacking, a novel prospective control), and CodeShield (static analysis for generated code). Proof-of-Guardrail using AWS Nitro Enclaves to cryptographically attest that guardrails executed is a genuinely novel primitive that moves from “trust the vendor” to “verify the guardrail ran.” Still research-stage but a promising direction.

6. Data layer (gap)

What it does: protects training data, RAG sources, and model artifacts; provides supply-chain assurance.

ControlPageStatus
AI-BOM / ML-BOMAI-BOMDeveloping — gap closed
Supply-chain multi-layer defenseSupply Chain Security for AgentsDeveloping — gap closed
Supply-chain attack disclosureClawHavoc, SANDWORM_MODE, LiteLLMIncident pages
Cognitive file integrity (SOUL.md / IDENTITY.md)Supply Chain Security §Layer 4Developing — new category
Data poisoning defenses(no dedicated page)Gap — still open
RAG poisoning defenses(no dedicated page)Gap — still open

Data layer — PARTIALLY CLOSED

Three Q1 2026 incidents are pure supply-chain (ClawHavoc — Agentic Skill Marketplace Supply Chain Attack, SANDWORM_MODE npm worm — AI Toolchain Poisoning, LiteLLM Supply Chain Compromise (Google ADK Dependency)) — ML-BOM adoption lags 48% behind SBOM (per AI Security Standards in Q1 2026: Agentic Threats Outpace Frameworks).

2026-04-30 update: AI-BOM: AI Bill of Materials page now documents the AI Bill of Materials control (static + runtime, CycloneDX format, Miggo runtime discovery pattern). Supply Chain Security for Agentic AI covers the multi-layer supply chain defense model (registry scanning → pre-install scanning → checksum verification → cognitive file integrity → behavioral drift detection). Cognitive file integrity (SHA-256 monitoring of SOUL.md, IDENTITY.md, Brain Git rollback) is a new agentic-specific category unique to this source.

Still open: data poisoning defenses and RAG poisoning defenses have no dedicated pages. These require a separate source or working session. Candidate: wiki/practices/data-poisoning-defenses.md.

What the Frameworks Cover (and Don’t)

FrameworkLayers covered wellLayers under-covered
NIST AI RMFGovernance overlay; risk-management processTechnical controls per layer
MITRE ATLASThreat taxonomy (attacker perspective)Defender controls
OWASP LLM Top 10Model layer awarenessIdentity, network, data
OWASP Agentic AI Top 10Agent orchestration risksImplementation guidance
IEC 42001Management system / governanceTechnical security controls
Google SAIFLifecycle conceptual modelConcrete operational controls
CoSAI (MCP white paper, secure-by-design principles)Network/protocol layer (MCP); secure-by-design conceptualReference implementations
ZT4AI700+ controls — most comprehensiveMicrosoft-stack-specific
ATFCross-layer threat model + autonomy promotion gatesOperational tooling

How This Has Evolved

Open Sub-Questions

Move to gaps/

  1. Egress control patterns — what specific egress mechanisms work? OPA/Cedar at the broker? A separate egress proxy? Network-segment isolation? (Partially addressed: credential proxy provides destination-based filtering; Docker DOCKER-USER chain rules for containers documented in Emerging Cybersecurity Practices for Agentic AI Applications. Dedicated page still needed.)
  2. AI-BOM operationalization — beyond CycloneDX ML extension, what’s the actual production workflow? (Partially addressed: AI-BOM: AI Bill of Materials created; runtime AI-BOM via Miggo DeepTracing documented. Full enterprise integration workflow still thin.)
  3. Per-agent authorization at scale — how does Capability-based authorization (Warrants) interact with traditional RBAC/ABAC at enterprise scale?
  4. Detection vs. prevention split for prompt injectionPrompt Injection Containment for Agentic Systems now documents the two-layer model: input detection (PromptGuard 2, 90% reduction) + execution containment. The honest answer: detection reduces attack success but cannot eliminate it; containment limits blast radius when detection fails. Platform-layer controls are closing the prevention gap for high-risk tier actions via HITL confirmation.
  5. Data poisoning and RAG poisoning defenses — entirely undocumented in this wiki as a dedicated practice page. Three incidents + Agentic AI Security Capability Maturity Model — A 2026 Practical Proposal D6 Data, Memory & RAG reference these (and the canonical CMM at L4 calls for RAGShield/TrustRAG-class document attestation, memory-poisoning detector, PoisonedRAG defense via DRS or sentinel-strategist arch) but no practice page exists. Candidate: wiki/practices/data-poisoning-defenses.md.
  6. Emergent multi-agent behaviors — ASI07 (Insecure Inter-Agent Comms), ASI08 (Cascading Failures), ASI10 (Rogue Agents) have no traditional equivalent. Partially addressed 2026-05-02: see Multi-Agent Runtime Security for cascade-detection / behavioral-baseline / inter-agent IR depth, and A2A Protocol for the v1.0.0 spec analysis. Honest read: 2026 is still the academic-prototype era for cascade detection at scale.