Inline Gateway vs Runtime Instrumentation

What

Two architectural strategies for enforcing security policy on agentic AI in production. Both are shipping from seed-stage vendors in the agentic-AI-security market as of Q4 2025 / Q1 2026: the funded category has effectively split into two camps with the same security goals and incompatible deployment models.

Dimension	Inline Gateway / Proxy	Runtime Instrumentation
Sits where	In the data path between agent and tool/MCP/network	Inside the agent runtime, hooked into tool-call or syscall surfaces
Visibility	Every request that goes through (full audit trail)	Every action the agent attempts (full action trail)
Enforcement	Block / rewrite / quarantine inline	Block / pause / require confirmation at action time
Deployment	Network change, DNS / routing, mTLS termination	Runtime hook, agent SDK, or platform integration (Cursor, Copilot, Agentforce, etc.)
Latency	Adds at least one network hop	Negligible (in-process)
Bypass risk	Low if all traffic is forced through the gateway; higher if agents can take direct paths	Lower if the runtime hook is enforced; higher if the agent can call OS-level paths
Vendor pattern (May 2026)	Runlayer (gateway), Helmet (discovery+monitoring), AgentGateway (OSS), AgentCordon (OSS, gateway+vault+IDP combined), Operant, Natoma, Cloudflare AI Gateway	Capsule (no proxy/gateway/SDK), Miggo (DeepTracing), Lakera Guard (content layer in-process)

Why this is now load-bearing

The seed-cohort startups in the 2025–2026 agentic-AI-security funding wave are explicitly architected against each other. Runlayer and Helmet pitch the MCP gateway as the way to get visibility and control over agent communications. Capsule pitches runtime instrumentation with the explicit anti-gateway claim that proxies, gateways, SDKs, and browser extensions are not required. More seed capital has gone to the gateway camp than to the instrumentation camp, but the architectural choice is being decided in production-deployment outcomes through 2026.

Production evidence: the ADR sensor

The first large-scale, production-proven data point for the instrumentation camp is Uber’s ADR (ten months, 7,200+ hosts, 10,000+ sessions/day). Its ADR Sensor is runtime instrumentation by another route: rather than hooking a live syscall or tool-call surface, it parses the local SQLite/JSONL caches that Cursor, Cline, and Claude Code already write, reconstructing the full prompt → reasoning → tool-call → outcome chain at ~0.182 s per run.¹ The paper makes explicit the architectural argument that this page frames, and it lands on instrumentation: an LLM/MCP gateway was evaluated and rejected for observability because it requires MCP-host changes, is incompatible with streaming responses, and captures only partial information, omitting environmental context.¹
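
As a rough illustration of the cache-parsing idea, the sketch below rebuilds a per-session event chain from JSONL records. The record schema (`session_id`, `ts`, `kind`, `content`) is an assumption made for this example; it is not the on-disk format of Cursor, Cline, or Claude Code, and it is not the ADR Sensor's parser.

```python
import json
from collections import defaultdict

# Hypothetical event kinds, mirroring the prompt -> reasoning -> tool-call -> outcome chain.
KINDS = ("prompt", "reasoning", "tool_call", "outcome")


def reconstruct_sessions(jsonl_text):
    """Group JSONL records by session and order each group by timestamp.

    Malformed lines and records with unknown kinds or missing fields are
    skipped rather than raising, since a cache file may be mid-write when read.
    """
    sessions = defaultdict(list)
    for line in jsonl_text.splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            rec = json.loads(line)
        except json.JSONDecodeError:
            continue  # e.g. a truncated trailing line
        if not isinstance(rec, dict) or rec.get("kind") not in KINDS:
            continue
        if "session_id" not in rec or "ts" not in rec:
            continue
        sessions[rec["session_id"]].append(rec)
    return {
        sid: sorted(events, key=lambda r: r["ts"])
        for sid, events in sessions.items()
    }


# Out-of-order records plus one truncated line (assumed sample data).
SAMPLE = "\n".join([
    '{"session_id": "s1", "ts": 2, "kind": "reasoning", "content": "need to read config"}',
    '{"session_id": "s1", "ts": 1, "kind": "prompt", "content": "fix the build"}',
    '{"session_id": "s1", "ts": 3, "kind": "tool_call", "content": "read_file config.yaml"}',
    '{"session_id": "s1", "ts": 4, "kind": "outcome", "content": "ok"}',
    '{"session_id": "s2", "ts": 1, "kind": "prompt", "content": "trunc',
])
```

Because the reconstruction reads files the agent has already written, it is detective by construction: nothing in this path can stop an action before it happens.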

The nuance worth recording is that ADR is hybrid for enforcement: the sensor does deep forensics (detective, after-the-fact), while inline Hooks in Cursor and Claude Code do real-time blocking of high-severity credential leakage (preventative, in-path).¹ That matches this page’s “defense-in-depth across the fork” recommendation rather than a pure-instrumentation stance — the cheap, certain control (regex+entropy secret-blocking) sits inline at the chokepoint, while the expensive, semantic control (reasoning over the reconstructed session) runs off the hot path.
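
A regex-plus-entropy check of the kind described above can be sketched in a few lines. The patterns, the token shape, and the 4.0 bits-per-character threshold are illustrative assumptions; they are not the rules ADR ships, and the hook APIs of Cursor and Claude Code are deliberately not modelled here.

```python
import math
import re
from collections import Counter

# Illustrative patterns only; a real rule set would be much larger.
SECRET_PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),                    # AWS access key ID shape
    re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),  # PEM private key header
]
# Long opaque-looking tokens that are candidates for the entropy test.
TOKEN = re.compile(r"[A-Za-z0-9+/=_\-]{24,}")
ENTROPY_THRESHOLD = 4.0  # bits per character; assumed value


def shannon_entropy(s):
    """Shannon entropy of the character distribution of s, in bits per character."""
    counts = Counter(s)
    n = len(s)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def should_block(text):
    """Return True if text matches a known secret shape or holds a high-entropy token."""
    if any(p.search(text) for p in SECRET_PATTERNS):
        return True
    return any(shannon_entropy(t) >= ENTROPY_THRESHOLD for t in TOKEN.findall(text))
```

The check is cheap and deterministic, which is what makes it suitable for the hot path; anything requiring reasoning over a whole session would not fit the same budget.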

Partial resolution of the enforceability gap

The gap below asked whether runtime instrumentation is viable as primary detection at scale for real agents. For local-process coding agents (Cursor / Cline / Claude Code), ADR answers yes in production: cache-parsing reconstruction sustains detection across 7,200+ hosts. It does not resolve the hosted-LLM-agent case, and the sensor is detective, not tamper-proof enforcement — a prompt-injected agent calling a path the sensor does not reconstruct remains the open risk.

Historical analogues

The same fork played out a decade ago between API gateways (Apigee, Kong, Tyk, which sit in the path) and runtime APM (Datadog, New Relic, AppDynamics, which instrument in-process). Both categories survived; they coexist in the same enterprise but do different jobs. Gateways won policy enforcement and contracts at the boundary. APM won deep latency-and-error visibility inside services. The agentic-AI-security analogue is likely to settle similarly: gateways for policy at the agent/tool boundary; instrumentation for behavioral observability inside the agent runtime.

Where each is the right primitive

Gateway is the right primitive when

The boundary is well-defined (MCP, A2A, tool registry)
The agent population is heterogeneous (the gateway becomes the only common chokepoint)
Policy must be enforced, not advisory (the gateway can deny)
The audit trail needs to live outside the agent runtime for compliance reasons
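
The gateway conditions above can be made concrete with a minimal sketch of an enforcement point that denies by policy and writes its audit trail to a sink outside the agent. The `Gateway` class, the `ALLOWED` table, and the agent and tool names are hypothetical; no vendor's product or the MCP wire format is represented.

```python
import json
import time

# Hypothetical policy: which (agent, tool) pairs may pass the gateway.
ALLOWED = {
    ("build-bot", "read_file"),
    ("build-bot", "run_tests"),
}


class Gateway:
    def __init__(self, upstream, audit_sink):
        self.upstream = upstream      # callable that actually reaches the tool server
        self.audit_sink = audit_sink  # list-like sink that lives outside the agent runtime

    def handle(self, agent_id, tool, args):
        """Record the decision first, then forward the call or deny it."""
        allowed = (agent_id, tool) in ALLOWED
        self.audit_sink.append(json.dumps({
            "ts": time.time(),
            "agent": agent_id,
            "tool": tool,
            "decision": "allow" if allowed else "deny",
        }))
        if not allowed:
            return {"error": "denied by gateway policy"}
        return {"result": self.upstream(tool, args)}
```

The same `handle` path serves any agent that is routed through it, which is why a heterogeneous population favours this shape; it sees nothing an agent does by another route.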

Runtime instrumentation is the right primitive when

The agent population is homogeneous (Cursor + Copilot + Agentforce — small set of supported runtimes)
Network interception is impractical (managed SaaS, mobile, hosted runtime)
Latency budget is tight
The interesting events happen inside the agent (planning steps, memory writes, code generation), not at the network boundary
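
For contrast, an in-process hook can be sketched as a decorator around each tool function. The names (`instrumented`, `ActionBlocked`, the workspace policy) are invented for illustration and do not correspond to any vendor SDK.

```python
import functools


class ActionBlocked(Exception):
    """Raised when the in-process policy refuses an action."""


ACTION_LOG = []  # in-process action trail: (tool name, verdict)


def instrumented(policy):
    """Wrap a tool function so the policy is consulted before every call."""
    def wrap(tool_fn):
        @functools.wraps(tool_fn)
        def guarded(*args, **kwargs):
            verdict = policy(tool_fn.__name__, args, kwargs)
            ACTION_LOG.append((tool_fn.__name__, verdict))
            if verdict != "allow":
                raise ActionBlocked(f"{tool_fn.__name__}: {verdict}")
            return tool_fn(*args, **kwargs)
        return guarded
    return wrap


def no_writes_outside_workspace(name, args, kwargs):
    # Naive prefix check, assumed for brevity; it does not normalise ".." segments.
    if name == "write_file" and not str(args[0]).startswith("/workspace/"):
        return "block"
    return "allow"


@instrumented(no_writes_outside_workspace)
def write_file(path, data):
    # Stand-in for a real tool; it only reports what it would have done.
    return f"wrote {len(data)} bytes to {path}"
```

There is no network hop, and the hook sees the action before it runs. The weakness is visible in the same code: `functools.wraps` exposes the unguarded function as `write_file.__wrapped__`, so anything able to reach that attribute, or the OS directly, skips the policy.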

Tradeoffs that don’t have a clean answer yet

Gap

Gateway bypass via the Lethal Trifecta’s third leg. Agents that exfiltrate via image rendering, markdown URLs, DNS, or direct browser fetch can route around an MCP-only gateway. Runlayer/Helmet/AgentGateway handle the MCP surface; the rest needs Smokescreen-shaped SSRF/egress control. A gateway-only architecture is necessary-but-not-sufficient.
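
A minimal sketch of the egress side, assuming a hostname allowlist and a private-range check on resolved addresses. The allowlist contents and function names are hypothetical, DNS resolution is left to the caller, and this is not Smokescreen's implementation.

```python
import ipaddress
import re
from urllib.parse import urlsplit

ALLOWED_HOSTS = {"docs.example.com"}  # hypothetical allowlist
URL_RE = re.compile(r"https?://[^\s)>\]]+")


def egress_allowed(url):
    """Allow only URLs whose hostname is explicitly allowlisted."""
    host = urlsplit(url).hostname
    return host is not None and host.lower() in ALLOWED_HOSTS


def resolved_ip_allowed(ip_text):
    """Reject internal destinations after DNS resolution (the SSRF check)."""
    ip = ipaddress.ip_address(ip_text)
    return not (
        ip.is_private or ip.is_loopback or ip.is_link_local
        or ip.is_reserved or ip.is_multicast
    )


def blocked_urls(agent_output):
    """URLs in rendered agent output (markdown images, links) that would be refused."""
    return [u for u in URL_RE.findall(agent_output) if not egress_allowed(u)]
```

For example, `blocked_urls` applied to a response containing a markdown image that points at an unlisted host returns that image URL, which is the rendering-based exfiltration path an MCP-only gateway never sees.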

Gap

Runtime-instrumentation enforceability under hostile model behavior. A misaligned or prompt-injected agent can in principle call APIs directly without going through the instrumented hook. The instrumentation camp’s claim — that the runtime hooks are tight enough to be unbypassable — is unproven in the public literature for hosted-LLM agents (vs. local-process agents). Miggo’s AWS Nitro Enclaves attestation is the closest production approach to making the hook tamper-resistant; this is not yet a category norm.

Gap

Where does identity coupling live? Both camps integrate with Okta / Entra / Keycard for the principal-and-permissions side. The split between policy decision point (PDP) and policy enforcement point (PEP), per Oversight Layer, lands differently: gateways are naturally PEPs; runtime instrumentation can be either.

Implication for the CMM

Current CMM D5 (Egress & Network) rows are written gateway-first. A revision should:

Add a parallel D5 evidence row for runtime-instrumentation enforcement — at L3 acceptable as primary enforcement when the agent runtime is homogeneous and tamper-evidence is provided.
At L4, require both gateway-style enforcement at platform boundaries and runtime instrumentation inside the agent runtime — defense-in-depth across the architectural fork rather than choosing one.
At L5, require enforcement attestation (e.g. AWS Nitro Enclaves per Miggo, TPM-backed runtime, or signed gateway logs) to harden against the bypass paths above.
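
Of the attestation options listed, signed gateway logs are the simplest to sketch. The example below chains HMAC-SHA256 tags so that editing or removing an earlier entry invalidates verification. It is an assumption-laden illustration: a shared symmetric key stands in for a real signing key, and hardware-backed attestation (Nitro Enclaves, TPM) is not modelled.

```python
import hashlib
import hmac
import json


def _tag(key, prev_mac, event):
    # Each tag covers the previous tag plus a canonical encoding of this event.
    body = json.dumps(event, sort_keys=True)
    return hmac.new(key, (prev_mac + body).encode(), hashlib.sha256).hexdigest()


def append_entry(log, key, event):
    """Append an event whose tag is chained to the previous entry's tag."""
    prev_mac = log[-1]["mac"] if log else ""
    log.append({"event": event, "mac": _tag(key, prev_mac, event)})


def verify(log, key):
    """Recompute the chain; any edited or dropped earlier entry breaks it."""
    prev_mac = ""
    for entry in log:
        expected = _tag(key, prev_mac, entry["event"])
        if not hmac.compare_digest(expected, entry["mac"]):
            return False
        prev_mac = entry["mac"]
    return True
```

A chain like this does not detect truncation of the most recent entries on its own; that needs an external anchor, such as periodically publishing the latest tag somewhere the gateway cannot rewrite.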

Deepest instrumentation: model forward-pass hooks

The Inline Gateway vs Runtime Instrumentation fork as described above treats the agent runtime as the instrumentation boundary. Carl Hurd’s Glass-Box Security talk at Unprompted March 2026 identifies a third, deeper instrumentation surface: the model’s forward pass itself — hooks into the residual stream at specific layers to capture activation vectors for intent and strength measurement.

This is not in the current seed-cohort product map — no funded startup as of May 2026 appears to ship production forward-pass activation monitoring. It is Starseer’s pre-launch direction and establishes a theoretical stack of:

[gateway/network layer] ← Runlayer, Helmet, AgentGateway
[agent runtime layer]   ← Capsule, Miggo, Lakera in-process
[model forward-pass]    ← Starseer (Glass-Box), future vendors

Each deeper layer provides higher-fidelity intent signals but requires more infrastructure control (self-hosted inference or canary-model instrumentation). See Mechanistic Interpretability for Defense and Glass-Box Security for the conceptual basis.
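
To make the idea of a forward-pass hook concrete, here is a toy in plain Python: a stand-in "model" whose layers add to a residual vector, a hook that captures the vector after a chosen layer, and a projection onto a direction as one possible strength measure. Everything here is assumed for illustration, including the `ToyModel` class and the use of a linear projection; this page does not describe how Glass-Box Security or Starseer actually measure intent, and a real system would hook a real inference stack.

```python
import math


class ToyModel:
    """Stand-in for a transformer: each layer adds a fixed update to the residual vector."""

    def __init__(self, layer_updates):
        self.layer_updates = layer_updates
        self.hooks = {}  # layer index -> callback(layer_index, activation)

    def register_hook(self, layer, fn):
        self.hooks[layer] = fn

    def forward(self, x):
        resid = list(x)
        for i, update in enumerate(self.layer_updates):
            resid = [r + u for r, u in zip(resid, update)]
            if i in self.hooks:
                # Pass a copy so the hook observes without altering the forward pass.
                self.hooks[i](i, list(resid))
        return resid


def projection_strength(activation, direction):
    """Scalar projection of an activation onto a (hypothetical) probe direction."""
    norm = math.sqrt(sum(d * d for d in direction))
    return sum(a * d for a, d in zip(activation, direction)) / norm
```

The infrastructure requirement follows directly from the shape of the code: registering the hook needs access to the model's layers, which a hosted inference API does not expose.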

Sources

arxiv.org

¹ §3.1 Observability: The ADR Sensor, arXiv:2605.17380: cache-parsing reconstruction at 0.182 s/run, the rejected LLM/MCP gateway alternative, and the hybrid sensor-plus-inline-Hooks prevention model.
