AI Agents Are Here. So Are the Threats. (Unit 42, 2025-05-01)

Source: Unit 42 — AI Agents Are Here. So Are the Threats. (2025-05-01) by Jay Chen and Royce Lu. Local copy: .raw/articles/agentic-ai-threats-unit42-2025-05-01.md. Reference implementation: PaloAltoNetworks/stock_advisory_assistant.

Key Claim

Most agentic-AI vulnerabilities are framework-agnostic. Nine attack scenarios — covering information leakage, credential theft, tool exploitation, and remote code execution — succeed against functionally identical agent applications built on two different popular frameworks (CrewAI and AutoGen). The vulnerabilities arise from insecure design patterns, misconfigurations, and unsafe tool integrations, not from flaws in either framework. Mitigation requires layered defense-in-depth across prompts, runtime, tool integrations, and code execution sandboxes.

Methodology

Built two functionally identical multi-agent investment-advisory applications:

  • Three cooperating agents per implementation: orchestration agent (user interface, task delegation), news agent (search engine + web reader tools), stock agent (database + stock-data + Python code interpreter tools)
  • Identical instructions, models, and tool functionality across both implementations — the only difference is the framework runtime (CrewAI vs AutoGen)
  • Same nine attack payloads executed against each, with framework-specific syntactic adjustments where the attack relies on framework-internal mechanisms (e.g. CrewAI’s delegate vs AutoGen’s transfer_to_*)
  • Reference implementation open-sourced at github.com/PaloAltoNetworks/stock_advisory_assistant for reproducibility, with both vanilla and “reinforced” prompt variants

This is the first published study to systematically isolate framework-agnostic agentic vulnerabilities from framework-specific ones.

The nine attack scenarios

Each scenario maps to one or more OWASP Agentic AI Threats categories. Reproducible via the open-source reference implementation.

#AttackOWASP threatsMitigations
1Identifying participant agents — list all agents in the systemPrompt injection, intent breakingPrompt hardening, content filtering
2Extracting agent instructions — pull system prompts from each agent in the multi-agent groupPrompt injection, intent breaking, agent communication poisoningPrompt hardening, content filtering
3Extracting agent tool schemas — extract tool input/output schemasPrompt injection, intent breaking, agent communication poisoningPrompt hardening, content filtering
4Internal network access via web reader — coerce the news agent’s web-reader tool into fetching internal-network URLsPrompt injection, tool misuse, communication poisoningPrompt hardening, content filtering, tool input sanitization
5Sensitive data exfiltration via mounted volume — read files from the code interpreter’s mounted volumePrompt injection, tool misuse, identity spoofing, RCE, communication poisoningPrompt hardening, code executor sandboxing, content filtering
6Service account access token exfiltration via metadata service — exfiltrate cloud service-account tokens via the metadata endpointPrompt injection, tool misuse, identity spoofing, RCE, communication poisoningPrompt hardening, code executor sandboxing, content filtering
7aSQL injection via database tool — exfiltrate full database tablesPrompt injection, tool misuse, communication poisoningPrompt hardening, tool input sanitization, tool vulnerability scanning, content filtering
7bBOLA (Broken Object-Level Authorization) via database tool — access another user’s data by manipulating object referencesPrompt injection, tool misuse, communication poisoningTool vulnerability scanning
8Indirect prompt injection via webpage — leak the conversation history to an attacker via a malicious webpage retrieved by the news agentPrompt injection, tool misuse, communication poisoningPrompt hardening, content filtering

The pattern is consistent: all nine scenarios work the same way on both CrewAI and AutoGen with only minor syntactic adjustments to the attack payloads.

The five mitigation strategies (defense-in-depth)

Unit 42 explicitly states: no single mitigation is sufficient.

MitigationWhat it coversWhat it doesn’t
Prompt hardening — narrowly-defined responsibilities; explicit prohibitions on disclosing instructions, coworkers, or tool schemas; constrained tool invocationsRaises the bar for prompt-only attacksBypassed by advanced injection techniques; insufficient on its own
Content filtering — inline inspection and blocking of agent inputs/outputs at runtime; vendor reference: Palo Alto Prisma AIRSTool schema extraction, tool misuse, memory manipulation, malicious code execution, sensitive data leakage, malicious URLsLatency / cost; false positives / negatives
Tool input sanitization — type, format, range, and special-character validation at the tool boundarySQL injection, command injection, SSRF via tool inputsDoesn’t catch logic flaws in the tool itself
Tool vulnerability scanning — SAST + DAST + SCA on every tool integrated into the agentic systemBOLA, dependency CVEs, insecure logicDoesn’t catch runtime misconfigurations
Code executor sandboxing — strict container runtime controls: restrict networking, limit mounted volumes (use tmpfs), drop Linux capabilities (CAP_NET_RAW, CAP_SYS_MODULE, CAP_SYS_ADMIN), block syscalls (kexec_load, mount, unmount, iopl, bpf), enforce resource quotasMounted-volume reads, metadata-service exfiltration, sandbox escape, cryptojackingDefault container configurations are insufficient — must be hardened

These five strategies map directly onto the wiki’s RA planes (Control / Runtime / Egress / Data / Observability) and the CMM’s D3 / D4 / D5 / D6 / D7 domains.

Notable Findings

  • Framework-agnostic vulnerabilities dominate. Nine identical attacks succeed across two architecturally distinct agent frameworks. The implication for the CMM: L4 multi-tool red-team coverage must include framework-portable test suites, not framework-specific ones. The reference implementation provides exactly such a suite.
  • Multi-agent communication poisoning is load-bearing. 8 of 9 scenarios involve agent-to-agent communication poisoning as a contributing threat — the orchestrator’s delegate (CrewAI) or transfer_to_* (AutoGen) mechanism is the weak link that lets attackers reach internal agents through the user-facing orchestrator. This empirically confirms ASI07 / ASI08 / ASI10 (multi-agent threats from OWASP Agentic AI Top 10) as production-relevant, not just theoretical.
  • Code executor is the highest-impact attack surface. Two of the nine scenarios (mounted volume, metadata service) become possible only because the stock agent has a code-interpreter tool. The code interpreter elevates impact from “information leakage” to “credential theft + cloud takeover.” This validates the wiki’s agent-sandboxing practice page as a load-bearing CMM D4 control.
  • Tool boundaries are real boundaries. SQL injection (#7a) and BOLA (#7b) are not new vulnerability classes — they’re classic application-security flaws inherited by agent systems through tool integration. Tools that were “safe” when called by traditional code become exploitable when called by an LLM that can be prompted to construct adversarial inputs. This justifies the supply-chain domain’s emphasis on scanning every tool integrated into the agentic system.
  • Indirect prompt injection works as advertised. Scenario #8 (conversation-history exfiltration via malicious webpage retrieved by the news agent) is a live demonstration of the indirect prompt injection attack class. Combined with the sister Unit 42 production-telemetry piece (March 2026), Unit 42 has now published both lab evidence (this article, May 2025) and production evidence (March 2026) for the same attack class.
  • Prisma AIRS is positioned as the runtime mitigation. The article names Palo Alto Prisma AIRS as the recommended runtime content-filter that catches tool schema extraction, tool misuse, memory manipulation, malicious code execution, sensitive data leakage, and malicious URLs. This is consistent with how Prisma AIRS is positioned in the wiki’s RA (Runtime + Egress + Data + Observability planes).

Strengths and Weaknesses

Strengths:

  • Reproducible. The open-source reference implementation lets any team reproduce all nine attacks and test their own defenses.
  • Framework-agnostic by construction. The two-framework methodology is the cleanest possible isolation of “framework flaw” vs “design flaw” — the two implementations share everything except the framework runtime.
  • Defense-in-depth honest about limits. Unit 42 explicitly says no single mitigation is sufficient and shows the per-attack mitigation table that documents which strategies cover which scenarios.
  • Cloud-specific attacks included. Scenarios #5 (mounted volume) and #6 (metadata service) are cloud-deployment-specific and reflect realistic attack paths against production agentic apps deployed on EC2 / GCE / Azure VM-class infrastructure.
  • Date. This is May 2025 work — among the earliest systematic empirical studies of agentic-AI vulnerabilities. Predates [un]prompted March 2026 by ~10 months.

Weaknesses:

  • Vendor positioning. The article concludes by recommending Palo Alto Networks’ Prisma AIRS as the canonical runtime mitigation. Substantively reasonable (Prisma AIRS does ship the relevant content-filter capabilities) but the reader should triangulate with vendor-neutral solutions (LlamaFirewall, Lakera Guard, NeMo Guardrails) before purchasing.
  • No multi-LLM-model evaluation. All nine attacks are demonstrated against a single (unstated) underlying LLM. The Salesforce Rittinghouse and Stripe Bullen talks both show that attack success rate (ASR) varies meaningfully across model generations; the article doesn’t discuss this.
  • No “reinforced prompts” empirical comparison. The reference implementation includes both vanilla and “reinforced” prompt variants but the article doesn’t quantify how much harder reinforced prompts make the attacks. This would be the empirical bound on how far “prompt hardening” alone can take you.
  • Scope limited to single-tenant. The investment-advisory app is a single-tenant prototype; multi-tenant attack surfaces (cross-tenant agent contamination, tenant-id confusion) are not in scope.

In the RA / CMM

  • CMM D7 L4 (multi-tool red-team coverage) — the open-source reference implementation is a candidate for the regression suite slot alongside Promptfoo, complementing PyRIT (orchestration) + Garak (probe library) + Mindgard CART (continuous CART). It’s framework-aware in a way the others aren’t.
  • CMM D4 (Runtime & Guardrails) — empirically validates content filtering (L4) and code executor sandboxing (L5 — cited explicitly in the article’s mitigation 5).
  • CMM D5 (Egress) — the metadata-service-token-exfiltration scenario (#6) is exactly the kind of cloud-instance-specific egress that Smokescreen-class controls were designed for; SSRF closure to internal IPs / metadata endpoints is a D5 baseline.
  • CMM D6 (Supply Chain) — tool vulnerability scanning (mitigation 4) is the article’s framing of the supply-chain control.
  • RA Control plane — agent-to-agent delegation requires explicit policy (per Sondera Cedar harness for coding agents); this article’s scenarios 3 demonstrate the consequence of not having such policy in the multi-agent investment app.

Relations