Agent Observability
Improving agent observability requires moving from a “black-box” model, where only final outputs are seen, to a “glass-box” security paradigm that monitors internal reasoning, intent, and tool-use trajectories.
1. Architectural Foundations: Hooks and Reference Monitors
Traditional EDR sees processes but cannot distinguish whether a shell command was typed by a human or spawned by an agent. To bridge this gap, implement Lifecycle Hooks and Reference Monitors.
- Reference Monitors: These sit outside the agent and model to mediate every event. They must be always invoked, tamper-proof, and verifiable.
- Lifecycle Hooks: Major coding tools now expose hooks (e.g., `PreToolUse`, `SessionStart`, `afterFileEdit`) which serve as a direct telemetry pipeline for what EDR cannot see.
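A minimal sketch of a hook handler used as a telemetry tap, assuming a `PreToolUse`-style hook that receives the event as JSON on stdin (as in Claude Code's hook interface); the field names and log path are illustrative, so check your tool's hook schema:

```python
#!/usr/bin/env python3
# PreToolUse hook as a telemetry tap: read the hook event from stdin,
# append it to a JSON-lines log, and exit 0 to allow the tool call.
# Field names (tool_name, tool_input, session_id) are assumptions.
import datetime
import json
import sys

event = json.load(sys.stdin)
record = {
    "ts": datetime.datetime.now(datetime.timezone.utc).isoformat(),
    "hook": "PreToolUse",
    "tool": event.get("tool_name"),
    "input": event.get("tool_input"),
    "session": event.get("session_id"),
}
with open("/var/log/agent-telemetry.jsonl", "a") as f:
    f.write(json.dumps(record) + "\n")

sys.exit(0)  # a non-zero exit (or a block decision) denies the call in some hook APIs
```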
2. Standardizing Telemetry with OpenTelemetry (OTel)
To avoid siloed logs, the sources advocate for OpenTelemetry (OTel) to create a standardized lexicon for AI behavior.
- Application Monitoring: Use OTel to connect process-level data with semantic intent.
- Semantic Conventions: Utilize the `gen_ai.*` conventions to tag spans with prompt data, model reasoning, and provider information.
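A hedged sketch of span tagging with `gen_ai.*` attributes, assuming the `opentelemetry` Python API with a TracerProvider configured elsewhere; the exact attribute keys vary across semantic-convention versions and are illustrative here:

```python
# Tag an agent tool-call span with gen_ai.* attributes so process-level
# data and semantic intent land in the same trace.
from opentelemetry import trace

tracer = trace.get_tracer("agent.observability")

def traced_tool_call(tool_name: str, command: str, model: str) -> None:
    with tracer.start_as_current_span(f"tool.{tool_name}") as span:
        span.set_attribute("gen_ai.system", "anthropic")    # provider information
        span.set_attribute("gen_ai.request.model", model)   # model identity
        span.set_attribute("gen_ai.tool.name", tool_name)   # semantic intent
        span.set_attribute("process.command", command)      # process-level view
        # ... execute the tool here and record the outcome on the span ...
```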
3. Identity Multiplexing (The Observability Fix)
Standard logs often separate “User Action” from “Agent Logic,” making it impossible to detect lateral movement or abuse of legitimate agency.
- The Fix: Inject `botId`, `sessionContext`, and `traceId` into every execution log.
- Goal: Ensure every autonomous action (like an Apex call or shell command) can be traced back to the invoking human user.
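A minimal sketch of the injection pattern, assuming a JSON-lines audit log; the field names follow the bullets above, everything else is illustrative:

```python
# Enrich every agent execution log with botId, sessionContext, and traceId
# so autonomous actions join back to the invoking human.
import json
import logging
import uuid

log = logging.getLogger("agent.audit")

def log_agent_action(action: str, human_user: str, bot_id: str,
                     session_context: dict, trace_id: str | None = None) -> str:
    trace_id = trace_id or uuid.uuid4().hex   # join key across systems
    log.info(json.dumps({
        "action": action,                     # e.g. "apex_call", "shell_command"
        "invokingUser": human_user,           # the human behind the agent
        "botId": bot_id,                      # which agent acted
        "sessionContext": session_context,    # tenant / task / conversation ids
        "traceId": trace_id,
    }))
    return trace_id                           # reuse for downstream actions in the same task
```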
4. Helpful Configurations and Code Snippets
A. Cedar Policy for Action Mediation
The Cedar Policy Language can be used to deterministically intercept and forbid dangerous commands based on the context captured by hooks.
// Generated Cedar policy to forbid destructive shell commands
forbid (
principal,
action == cursor::Action::"shell_execution",
resource
)
when {
context has parameters &&
(context.parameters.command like "*rm -rf*" ||
context.parameters.command like "*sudo*")
};
// Forbid access to sensitive files
forbid (
principal,
action in [cursor::Action::"file_edit"],
resource
)
when {
context has parameters &&
(context.parameters.file_path like "*.env" ||
context.parameters.file_path like "*.env.local")
};
[Source: 16]
B. Capability-Based Warrants
Instead of static permissions, use Warrants—cryptographic, task-scoped authorizations that narrow an agent’s blast radius.
# Example Warrant Primitive
warrant:
action: email.send
constraints:
recipients: "*@company.com"
attachments: 1
max_size_kb: 500
ttl: 15m
holder: agent-47
signature: 0x8f3a...
[Source: 62]
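A hedged enforcement sketch for the warrant above: before an agent action executes, a checker verifies scope, TTL, and constraints. The `issued_at`/`ttl_seconds` fields are assumptions (the YAML's `ttl: 15m` would be parsed into seconds), and signature verification is elided:

```python
# Check a warrant before executing an action: scope, TTL, and constraints
# must all hold. Constraint keys mirror the YAML example.
import fnmatch
import time

def warrant_allows(warrant: dict, action: str, recipients: list[str],
                   attachment_sizes_kb: list[int]) -> bool:
    if warrant["action"] != action:
        return False                                  # out of scope
    if time.time() > warrant["issued_at"] + warrant["ttl_seconds"]:
        return False                                  # warrant expired
    c = warrant["constraints"]
    if not all(fnmatch.fnmatch(r, c["recipients"]) for r in recipients):
        return False                                  # recipient outside "*@company.com"
    if len(attachment_sizes_kb) > c["attachments"]:
        return False                                  # too many attachments
    if any(kb > c["max_size_kb"] for kb in attachment_sizes_kb):
        return False                                  # attachment too large
    return True   # a real checker would also verify warrant["signature"]
```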
C. Agent Card Configuration
Define a “System of Record” for every agent using Agent Cards to log personas, allowed capabilities, and PII masking rules.
{
"agent_id": "agent_hunt_orchestrator",
"roles": ["Threat Intelligence Engineer", "Incident Responder"],
"pii_masking": true,
"allowed_capabilities": [
"agent_ioc_normalizer",
"agent_threat_hunt_executor"
],
"max_iterations": 10
}
[Source: 145]
5. Context-Aware Trimming
A common observability failure occurs when a long-running agent fills its context window and drops older, critical security logs.
- Implementation: Tag messages by type (e.g., `SSRF_BLOCKED`, `PERMISSION_DENIED`).
- Configuration: Ensure these specific tags are “pinned” and survive trimming, even as the general log volume grows, so the agent (and forensic investigators) maintains a full history of security events.
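A minimal trimming sketch that pins security-tagged messages; the tag names come from the list above, the policy itself is illustrative:

```python
# Context trimming that pins security events: oldest untagged messages are
# dropped first, pinned tags always survive.
PINNED_TAGS = {"SSRF_BLOCKED", "PERMISSION_DENIED"}

def trim_context(messages: list[dict], max_messages: int) -> list[dict]:
    if len(messages) <= max_messages:
        return messages
    pinned = [m for m in messages if m.get("tag") in PINNED_TAGS]
    others = [m for m in messages if m.get("tag") not in PINNED_TAGS]
    budget = max(max_messages - len(pinned), 0)
    kept = others[-budget:] if budget else []   # newest untagged messages
    survivors = {id(m) for m in pinned + kept}
    # Rebuild in original order so the transcript stays chronological.
    return [m for m in messages if id(m) in survivors]
```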
5b. Adversarial Prompt Injection Through Attack Data
A practitioner-flagged threat vector: prompt injection delivered through the attack payload itself. When an AI agent ingests attack data (e.g., a SOC pulling in suspicious network traffic, log entries, or malware samples for analysis) and that data contains injected instructions, an autonomous agent acting on its analysis can become an unwitting accomplice to the attacker.
The key risk: tight coupling between AI inference and automated action amplifies the blast radius of adversarial manipulation. Mitigations: reversible actions only, circuit breakers, and no auto-close without explicit human approval. See Indirect Prompt Injection for the broader attack class and Prompt Injection Containment for Agentic Systems for runtime controls.
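A hedged sketch of the circuit-breaker mitigation: irreversible actions (like auto-close) require explicit human approval, unknown actions are denied by default, and runaway action volume trips the breaker. The action names and window size are assumptions:

```python
# Circuit breaker between inference and action.
REVERSIBLE = {"quarantine_host", "add_firewall_rule"}
IRREVERSIBLE = {"close_incident", "delete_artifacts"}

class CircuitBreaker:
    def __init__(self, max_actions: int = 5):
        self.max_actions = max_actions    # per task / time window
        self.count = 0
        self.tripped = False

    def permit(self, action: str, human_approved: bool = False) -> bool:
        if self.tripped:
            return False                  # breaker stays open until a human resets it
        if action in IRREVERSIBLE and not human_approved:
            return False                  # no auto-close without explicit approval
        if action not in REVERSIBLE and not human_approved:
            return False                  # default-deny anything not known reversible
        self.count += 1
        if self.count > self.max_actions:
            self.tripped = True           # too many actions: stop the agent
            return False
        return True
```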
6. Building “Internal EDR” (Glass-Box Pillars)
For advanced threat response, practitioners are moving toward Glass-Box Security using Mechanistic Interpretability — introduced by Carl Hurd (Starseer) at [un]prompted March 2026. See Hurd — Glass-Box Security for the full technique description.
- Intent capture: Forward-pass hooks on the model’s residual stream, comparing activation vectors against stored concept-reference directions using cosine similarity. Fires when the model is semantically processing a dangerous concept — not when a dangerous word appears in the input.
- Strength measurement: Scalar projection (dot product normalized by total tensor magnitude) measures how dominant the dangerous concept is in the current activation — separating “touches on this topic” from “is overwhelmingly about this topic.” A minimal probe sketch follows this list.
- Sovereign Infrastructure: This level of observability often requires a return to self-hosted infrastructure to gain deep visibility into latent space geometry. For managed-API users, the canary model approach (instrument a smaller open-weight model in parallel) provides partial coverage subject to cross-model activation transfer assumptions.
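A minimal probe sketch of the intent-capture and strength-measurement pillars, assuming a PyTorch transformer whose residual stream is reachable via a forward hook; the layer index, hidden size, and threshold are illustrative, and a real reference direction would come from prior concept-extraction work rather than `randn`. This is a sketch of the idea, not Starseer's implementation:

```python
# Forward-pass probe on the residual stream: cosine similarity against a
# stored concept direction for intent capture, scalar projection for strength.
import torch
import torch.nn.functional as F

HIDDEN = 4096
ref = torch.randn(HIDDEN)            # placeholder for a learned concept direction
ref = ref / ref.norm()               # unit-normalize the reference

def residual_hook(module, inputs, output):
    act = output[0] if isinstance(output, tuple) else output
    vec = act[0, -1, :]              # last-token residual-stream vector
    cos = float(F.cosine_similarity(vec, ref, dim=0))           # semantic alignment
    strength = float(torch.dot(vec, ref)) / float(vec.norm())   # projection, normalized
    if cos > 0.6:                    # fires on meaning, not surface keywords
        print(f"concept alert: cos={cos:.2f} strength={strength:.2f}")

# model.model.layers[20].register_forward_hook(residual_hook)  # layer 20 is an assumption
```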
7. Agent Behavioral Monitoring — Insider-Threat Framing
Full-stack agent monitoring is best framed as an insider-threat problem: AI agents are inherently probabilistic, making it impossible to enumerate every permissible action sequence. A behavioral / anomaly-detection approach — borrowing from User and Entity Behavior Analytics (UEBA) for stable identities — is more effective than purely deterministic ruleset enforcement.
Production benchmark: Salesforce Agentforce
Matt Rittinghouse and Millie Rittinghouse (Salesforce CSOC) reported at [[unprompted-conference-march-2026|[un]prompted March 2026]] that a three-level ensemble behavioral model applied to ~1.8 million daily prompts across 55,000 tenant organizations and 12,000+ unique agents produced fewer than 30 actionable security alerts per day — a prompt-volume-to-alert ratio of approximately 60,000:1. See “1.8M Prompts, 30 Alerts” for the full methodology. This is the first published production-scale signal-to-noise benchmark for agentic AI SOC operations.
The model adds a structurally new detection axis beyond traditional UEBA: agent-level behavioral baseline (what does this specific agent normally do?), combined with user-level and organization-level baselines in an ensemble. See Behavioral Anomaly Detection for Agents for the concept page.
Term provenance
Enterprise CISOs interviewed by Insight Partners (Oct 2025) coined the colloquial label “UEBA for Agents” for this practice. The wiki uses the architecturally neutral terms “agent behavioral monitoring” or “behavioral baselines for agents” instead, because (a) agents are typically ephemeral and lack the persistent identities classical UEBA was built for, and (b) the original UEBA product category had largely merged into SIEM/XDR by 2020. The colloquial label is preserved in Securing the Autonomous Future: Trust, Safety, and Reliability of Agentic AI (the source) and Insight Partners’ entity page; everywhere else the wiki uses the neutral terms. See Peer-Review Readiness — Gaps in the RA + CMM for the attribution audit.
This framing aligns well with the glass-box pillars above and extends them to a production-monitoring posture:
- Establish behavioral baselines per agent type and role.
- Alert on deviations: unexpected tool calls, unusual data access volumes, calls to external services outside normal profile, unusual MCP server interactions.
- Combine with NHI action-to-identity tracing so every anomalous action can be attributed to a specific agent identity and the human (if any) who instructed it.
See Securing the Autonomous Future: Trust, Safety, and Reliability of Agentic AI and AI Agent Identity Architecture for the identity-attribution architecture that feeds this monitoring layer.
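A hedged sketch of the three-level ensemble idea described above: score each event against agent-, user-, and organization-level baselines and alert only on a high combined score. The feature distance, weights, and threshold here are assumptions, not Salesforce's production model:

```python
# Three-level ensemble scoring over behavioral baselines.
WEIGHTS = {"agent": 0.5, "user": 0.3, "org": 0.2}   # illustrative weights

def anomaly_score(features: dict[str, float],
                  baselines: dict[str, dict[str, float]]) -> float:
    score = 0.0
    for level, weight in WEIGHTS.items():
        baseline = baselines[level]           # e.g. normal tool-call rates per level
        dist = sum(abs(v - baseline.get(k, 0.0)) for k, v in features.items())
        score += weight * dist
    return score

def should_alert(features, baselines, threshold: float = 3.0) -> bool:
    return anomaly_score(features, baselines) >= threshold
```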
8. AI-BOM Runtime Discovery and Behavioral Baselines (New from Emerging Cybersecurity Practices for Agentic AI Applications)
Miggo Security’s Runtime Defense Platform introduces an AI-BOM-centric approach to observability:
- AI-BOM Discovery: continuously inventories all AI components running in production (models, frameworks, skills, MCP servers) — a live CMDB for AI artifacts.
- DeepTracing: a patented technique that traces tool calls, model loading, file access, and network behavior at the execution layer.
- Behavioral baselines: establishes per-agent and per-component normal behavior profiles; flags drift with security context (not just metric anomalies).
- MCP-aware monitoring: understands MCP protocol semantics, enabling protocol-level anomaly detection rather than generic network traffic analysis.
Key operational insight: agents generate 10–20x the log volume of humans over the same time window. This is not a metrics problem — it requires purpose-built behavioral analysis. Generic SIEM without agentic-aware normalization will be overwhelmed.
9. Nightly Audit Baselines and Memory Integrity
From SecureClaw’s approach: 13 core metrics reported every night, including healthy-state outputs (not just failure alerts). Memory integrity monitoring watches for unauthorized changes to persistent agent state — addressing the scenario where an agent’s behavioral state has been silently modified between sessions.
The SecureClaw design principle: run all detection logic as external bash processes consuming zero LLM tokens. This prevents the monitoring from expanding the attack surface it is protecting against (LLM token-consuming monitors can themselves be prompt-injected).
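A minimal sketch of that principle: the audit runs as an external process outside the LLM loop (consuming zero tokens) and reports healthy-state values every night rather than only emitting on failure. The metric names here are illustrative, not SecureClaw's actual 13:

```python
# Nightly audit as an external process, invoked by cron or a systemd timer.
import datetime
import json

def collect_metrics() -> dict:
    return {
        "memory_files_hash_ok": True,      # persistent-state integrity (see section 10)
        "pinned_log_tags_present": True,   # security events survived context trimming
        "alert_count_24h": 0,              # zero is still reported, proving liveness
    }

def nightly_report(path: str = "/var/log/agent-nightly.jsonl") -> None:
    report = {"ts": datetime.datetime.now(datetime.timezone.utc).isoformat(),
              **collect_metrics()}
    with open(path, "a") as f:
        f.write(json.dumps(report) + "\n")  # healthy output every night, not just failures

if __name__ == "__main__":
    nightly_report()
```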
10. Cognitive File Integrity Monitoring
Traditional FIM (OSSEC, Tripwire, Wazuh) monitors the filesystem for unauthorized changes to critical files. For AI agents, this extends to cognitive identity files: SOUL.md, IDENTITY.md, and similar files that define the agent’s behavioral rules, persona, and operational constraints.
- Establish SHA-256 baselines at deployment for all cognitive files (a drift-check sketch follows this list).
- Alert on drift: cognitive file changes without an authorized update event.
- Brain Git (SlowMist): version-control all cognitive state files in git, enabling rollback to a known-good behavioral configuration — the agent-equivalent of system restore.
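A minimal drift-check sketch, assuming local `SOUL.md` / `IDENTITY.md` files and a JSON baseline store; alerting and the authorized-update check are left as comments:

```python
# Cognitive FIM: SHA-256 baselines at deployment, drift check afterwards.
import hashlib
import json
import pathlib

COGNITIVE_FILES = ["SOUL.md", "IDENTITY.md"]
BASELINE = pathlib.Path("cognitive_baseline.json")

def sha256(path: pathlib.Path) -> str:
    return hashlib.sha256(path.read_bytes()).hexdigest()

def snapshot(root: pathlib.Path) -> None:
    digests = {f: sha256(root / f) for f in COGNITIVE_FILES if (root / f).exists()}
    BASELINE.write_text(json.dumps(digests, indent=2))

def check_drift(root: pathlib.Path) -> list[str]:
    digests = json.loads(BASELINE.read_text())
    return [f for f, d in digests.items()
            if not (root / f).exists() or sha256(root / f) != d]
    # A non-empty result without a matching authorized update event should alert;
    # with Brain Git, roll back via `git checkout <known-good-commit>`.
```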
This is the only category in the observability taxonomy with no traditional equivalent — it is genuinely new to agentic systems.