Security Controls for AI Stacks

Question

What security controls exist for agentic AI stacks, where do they live in the stack, and what are the gaps?

Current Position

Working controls converge into six layers. Mature options exist at the identity and observability layers; emerging options at containment and network; material gaps remain at the model and data layers.

Platform-layer over prompt-layer

The strongest practitioner consensus across the ingested sources (AI Security Standards in Q1 2026: Agentic Threats Outpace Frameworks, Securing the Autonomous Future: Trust, Safety, and Reliability of Agentic AI, Emerging Cybersecurity Practices for Agentic AI Applications) is that security controls must now be enforced at the platform layer, not the prompt layer. Prompt-level guardrails are bypassable; platform-level enforcement (input filtering at the broker, egress control, capability-based authorization, sandboxing) is not.

Evidence from Emerging Cybersecurity Practices for Agentic AI Applications: the APort Agent Guardrail case study articulates this precisely — “pre-action authorization must run in the runtime/platform, so the platform invokes the guardrail for every tool call regardless of what the model outputs.” This mirrors a fundamental security principle: controls must be enforced at a layer below the layer they protect. Network firewalls enforce below the application; tool call interception must enforce below the LLM.

The Six Layers

1. Identity layer (mature)

What it does: assigns verifiable identities to AI agents and traces actions back to invoking humans.

Control	Page	Status
Workload identity (SPIFFE/SPIRE)	SPIRE	Stub — adoption growing
Non-Human Identity (NHI) governance	Non-Human Identity, NHI Governance for Agents	Developing
Reference architecture	AI Agent Identity Architecture	Developing
Credential lifecycle (Credential Zero)	NHI Governance for Agents	Developing
Capability-based authorization (Warrants)	Task-scoped, signed, ephemeral capability authorizations	Developing
Credential proxy (proxy token, vault injection)	Credential Proxy Pattern	Developing — multi-tool convergence confirms this is load-bearing

Verdict: mature options exist; integration discipline (action-to-identity tracing) is what distinguishes well-postured organizations from exposed ones. The Credential Proxy Pattern for AI Agents has independently converged across 5+ tools in the OpenClaw ecosystem — a strong signal that credential exposure (not just credential rotation) is a distinct control gap. See Emerging Cybersecurity Practices for Agentic AI Applications.

2. Observability layer (mature)

What it does: turns the agent from a black box into a glass box; traces reasoning, tool calls, and identities.

Control	Page	Status
Lifecycle hooks + reference monitors	Agent Observability §1	Developing
OpenTelemetry `gen_ai.*` semantic conventions	Agent Observability §2	Developing
Identity multiplexing in logs	Agent Observability §3	Developing
Cedar policy for action mediation	Agent Observability §4A	Developing
Agent Cards (System of Record)	Agent Observability §4C	Developing
Context-aware trimming with pinned tags	Agent Observability §5	Developing
Agent behavioral monitoring (anomaly detection)	Agent Observability §7	Developing

Verdict: mature catalogue. Practitioner consensus is strong. Tooling exists (OTel gen_ai.* conventions are real). New from Emerging Cybersecurity Practices for Agentic AI Applications: Miggo’s AI-BOM runtime discovery with DeepTracing adds a continuous behavioral BOM approach. Cognitive file integrity (monitoring SHA-256 drift on identity files) is a genuinely new extension of traditional FIM to agentic-specific artifacts.

3. Containment layer (emerging)

What it does: prevents an agent from doing damage when other controls fail.

Control	Page	Status
Agent sandboxing (OS-level isolation)	Agent Sandboxing	Developing — last-line-of-defense
Lethal Trifecta containment (egress control, human confirmation, tool annotation)	Stripe (stub — needs full architecture page)	Stub
Human-in-the-Loop primitive	Confirmation gate before high-impact tool calls	Developing
Reversible-actions-only constraint	Tight coupling between AI inference and consequential action is risky; constrain to reversible actions only, with circuit breakers	Developing
Least agency tiers (auto/notify/confirm/block)	Least Agency Principle	Developing — OWASP-sourced
Tool call interception at platform layer	Prompt Injection Containment	Developing
AlignmentCheck (chain-of-thought auditing)	Prompt Injection Containment, LlamaFirewall	Developing
Kill switches / instant revocation	Credential Proxy Pattern	Developing
State rollback (Brain Git)	Supply Chain Security	Developing — agentic-specific

Containment layer gaps — PARTIALLY CLOSED

The Stripe Lethal Trifecta architecture is widely cited but not yet documented in the wiki as a standalone reference architecture. Worth elevating from the Stripe stub to a wiki/architectures/lethal-trifecta-containment.md page.

2026-04-30 update: Emerging Cybersecurity Practices for Agentic AI Applications provides substantial containment layer content. The Least Agency Principle page now documents the OWASP four-tier autonomy classification. The Prompt Injection Containment for Agentic Systems page covers tool call interception, AlignmentCheck, and platform-level vs. prompt-level enforcement. LlamaFirewall’s AlignmentCheck (chain-of-thought auditing) is a novel containment primitive not present in earlier sources. Kill switch / instant revocation and Brain Git rollback are documented in Supply Chain Security for Agentic AI. The Lethal Trifecta architecture page remains the one unfilled stub.

4. Network / protocol layer (emerging)

What it does: secures inter-agent and agent-to-tool communication.

Control	Page	Status
MCP Security taxonomy (CoSAI WS4)	MCP Security, CoSAI	Developing
AgentGateway (open-source MCP gateway)	AgentGateway	Stub
A2A Protocol with Agent Cards / opacity principle	A2A Protocol	Stub
Egress control patterns	(no dedicated page yet)	Gap — see update note
Agent-to-agent cryptographic identity (Ed25519)	Agent Identity Architecture, Emerging Practices §2.4	Developing — Oktsec implementation
Multi-rule content scanning on inter-agent messages (Oktsec ships 268 rules at v0.15.2 — see A2A page)	Emerging Practices §2.4	Developing — Oktsec

Egress control patterns — PARTIALLY ADDRESSED

Multiple sources reference “controlled egress” as a Lethal Trifecta containment primitive. A dedicated wiki page wiki/practices/egress-control-patterns.md is still warranted.

2026-04-30 update: Emerging Cybersecurity Practices for Agentic AI Applications documents the credential proxy pattern as a practical egress control — the proxy validates outbound requests against allowed targets before injecting credentials, providing destination-based egress filtering. Docker DOCKER-USER chain rules for container networking and network segmentation for agent runtimes are also documented in Section 2.7. The dedicated egress-control-patterns page remains a candidate for a future working session.

5. Model layer (research-stage / gap)

What it does: detects malicious intent or deception inside the model itself.

Control	Page	Status
LlamaFirewall (open-source guardrail)	LlamaFirewall	Developing — now substantive, not just a stub
Mechanistic interpretability for “internal EDR”	Agent Observability §6	Developing — research-stage
Prompt-level guardrails	(no dedicated page; widely seen as bypassable)	Acknowledged-weak
Prompt injection detection / input filtering	Prompt Injection Containment §Layer 1	Developing
Proof-of-Guardrail (TEE attestation)	Emerging Practices §2.5	Research-stage — novel primitive

Model-layer is the weakest link

Prompt-level guardrails are not robust controls. Platform-layer enforcement is the practical answer. Mechanistic interpretability is promising but research-stage. The honest read is: the model layer offers detection, not prevention — and detection itself is unreliable.

2026-04-30 update from Emerging Cybersecurity Practices for Agentic AI Applications: LlamaFirewall now has three substantive components — PromptGuard 2 (injection detection, 90% attack reduction), AlignmentCheck (chain-of-thought auditing for goal hijacking, a novel prospective control), and CodeShield (static analysis for generated code). Proof-of-Guardrail using AWS Nitro Enclaves to cryptographically attest that guardrails executed is a genuinely novel primitive that moves from “trust the vendor” to “verify the guardrail ran.” Still research-stage but a promising direction.

6. Data layer (gap)

What it does: protects training data, RAG sources, and model artifacts; provides supply-chain assurance.

Control	Page	Status
AI-BOM / ML-BOM	AI-BOM	Developing — gap closed
Supply-chain multi-layer defense	Supply Chain Security for Agents	Developing — gap closed
Supply-chain attack disclosure	ClawHavoc, SANDWORM_MODE, LiteLLM	Incident pages
Cognitive file integrity (SOUL.md / IDENTITY.md)	Supply Chain Security §Layer 4	Developing — new category
Data poisoning defenses	(no dedicated page)	Gap — still open
RAG poisoning defenses	(no dedicated page)	Gap — still open

Data layer — PARTIALLY CLOSED

Three Q1 2026 incidents are pure supply-chain (ClawHavoc — Agentic Skill Marketplace Supply Chain Attack, SANDWORM_MODE npm worm — AI Toolchain Poisoning, LiteLLM Supply Chain Compromise (Google ADK Dependency)) — ML-BOM adoption lags 48% behind SBOM (per AI Security Standards in Q1 2026: Agentic Threats Outpace Frameworks).

2026-04-30 update: AI-BOM: AI Bill of Materials page now documents the AI Bill of Materials control (static + runtime, CycloneDX format, Miggo runtime discovery pattern). Supply Chain Security for Agentic AI covers the multi-layer supply chain defense model (registry scanning → pre-install scanning → checksum verification → cognitive file integrity → behavioral drift detection). Cognitive file integrity (SHA-256 monitoring of SOUL.md, IDENTITY.md, Brain Git rollback) is a new agentic-specific category unique to this source.

Still open: data poisoning defenses and RAG poisoning defenses have no dedicated pages. These require a separate source or working session. Candidate: wiki/practices/data-poisoning-defenses.md.

What the Frameworks Cover (and Don’t)

Framework	Layers covered well	Layers under-covered
NIST AI RMF	Governance overlay; risk-management process	Technical controls per layer
MITRE ATLAS	Threat taxonomy (attacker perspective)	Defender controls
OWASP LLM Top 10	Model layer awareness	Identity, network, data
OWASP Agentic AI Top 10	Agent orchestration risks	Implementation guidance
IEC 42001	Management system / governance	Technical security controls
Google SAIF	Lifecycle conceptual model	Concrete operational controls
CoSAI (MCP white paper, secure-by-design principles)	Network/protocol layer (MCP); secure-by-design conceptual	Reference implementations
ZT4AI	700+ controls — most comprehensive	Microsoft-stack-specific
ATF	Cross-layer threat model + autonomy promotion gates	Operational tooling

How This Has Evolved

2026-04-30 (round 1) — initial synthesis. Layers identified from three ingested sources. Material gaps flagged. Reference architecture stub created.
2026-04-30 (round 2) — ingested Emerging Cybersecurity Practices for Agentic AI Applications. Significant gap closure:
- Identity layer: Credential Proxy Pattern for AI Agents documented with 5-tool convergence evidence.
- Observability layer: AI-BOM runtime discovery (Miggo DeepTracing), cognitive file integrity added.
- Containment layer: Least Agency Principle (OWASP four-tier autonomy model), Prompt Injection Containment for Agentic Systems with AlignmentCheck. LlamaFirewall stub elevated to substantive page content.
- Data layer: AI-BOM: AI Bill of Materials and Supply Chain Security for Agentic AI pages created. RAG poisoning + data poisoning defenses remain open gaps.
- Network/protocol layer: Ed25519 agent-to-agent signing (Oktsec; vendor-side, not in A2A v1.0 spec), Oktsec content scanning (268 rules at v0.15.2) added.
(next) — Lethal Trifecta architecture page, egress control patterns, data poisoning defenses.

Open Sub-Questions

Move to gaps/

Egress control patterns — what specific egress mechanisms work? OPA/Cedar at the broker? A separate egress proxy? Network-segment isolation? (Partially addressed: credential proxy provides destination-based filtering; Docker DOCKER-USER chain rules for containers documented in Emerging Cybersecurity Practices for Agentic AI Applications. Dedicated page still needed.)

AI-BOM operationalization — beyond CycloneDX ML extension, what’s the actual production workflow? (Partially addressed: AI-BOM: AI Bill of Materials created; runtime AI-BOM via Miggo DeepTracing documented. Full enterprise integration workflow still thin.)

Per-agent authorization at scale — how does Capability-based authorization (Warrants) interact with traditional RBAC/ABAC at enterprise scale?

Detection vs. prevention split for prompt injection — Prompt Injection Containment for Agentic Systems now documents the two-layer model: input detection (PromptGuard 2, 90% reduction) + execution containment. The honest answer: detection reduces attack success but cannot eliminate it; containment limits blast radius when detection fails. Platform-layer controls are closing the prevention gap for high-risk tier actions via HITL confirmation.

Data poisoning and RAG poisoning defenses — entirely undocumented in this wiki as a dedicated practice page. Three incidents + Agentic AI Security Capability Maturity Model — A 2026 Practical Proposal D6 Data, Memory & RAG reference these (and the canonical CMM at L4 calls for RAGShield/TrustRAG-class document attestation, memory-poisoning detector, PoisonedRAG defense via DRS or sentinel-strategist arch) but no practice page exists. Candidate: wiki/practices/data-poisoning-defenses.md.

Emergent multi-agent behaviors — ASI07 (Insecure Inter-Agent Comms), ASI08 (Cascading Failures), ASI10 (Rogue Agents) have no traditional equivalent. Partially addressed 2026-05-02: see Multi-Agent Runtime Security for cascade-detection / behavioral-baseline / inter-agent IR depth, and A2A Protocol for the v1.0.0 spec analysis. Honest read: 2026 is still the academic-prototype era for cascade detection at scale.

Enterprise Security in the Agentic AI Era

Explorer

Security Controls for AI Stacks

Security Controls for AI Stacks

Question

Current Position

The Six Layers

1. Identity layer (mature)

2. Observability layer (mature)

3. Containment layer (emerging)

4. Network / protocol layer (emerging)

5. Model layer (research-stage / gap)

6. Data layer (gap)

What the Frameworks Cover (and Don’t)

How This Has Evolved

Open Sub-Questions

Graph View

Table of Contents

Backlinks