Jules AI Kill Chain — Indirect Injection to Full Remote Control
Summary
In August 2025, Johann Rehberger demonstrated a full kill chain against Google’s Jules coding agent as part of his “Month of AI Bugs” disclosure series. The attack started with an indirect prompt injection hidden in a GitHub issue body, escalated through write access to project files (persistence), exfiltrated source code and credentials, and ended with the agent polling an attacker-controlled endpoint for remote commands — full command-and-control of an agent running on Google infrastructure.
The five-stage chain is one of the cleanest publicly documented end-to-end agent compromises of 2025 and is cited as a canonical case study in the “Securing Your Agents” deck (slide 14).
The Five-Stage Chain
| # | Stage | Actor | What happened |
|---|---|---|---|
| 1 | PLANT | Attacker | Seeds a GitHub issue with hidden prompt injection. Issue body looks like a normal bug report; injection sits in invisible characters / HTML comment / hidden formatting. |
| 2 | HIJACK | Agent (compromised) | Jules reads the issue. Injection overrides the agent’s goal. The user’s “investigate this bug” task is silently abandoned in favor of attacker instructions. |
| 3 | PERSIST | Agent (under attacker control) | Agent writes attacker-supplied payloads into project files and configuration. Survives session restarts — the attack is now self-sustaining within the repo. |
| 4 | EXFILTRATE | Agent (under attacker control) | Agent uses its unrestricted network access to POST source code and discoverable credentials to an external endpoint. |
| 5 | CONTROL | Attacker | Full C2: agent polls a remote endpoint for new commands and executes them on Google infrastructure on the attacker’s behalf. |
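The Stage 1 plant depends on instructions the human reviewer cannot see. A minimal heuristic scan for the hiding techniques named above (zero-width characters, HTML comments, goal-override phrasing) might look like the sketch below; the patterns and function name are illustrative assumptions, not a reconstruction of any real scanner.

```python
import re

# Illustrative sketch only: markers commonly used to hide instructions
# in issue bodies (Stage 1). Not a complete defense.
ZERO_WIDTH = re.compile(r"[\u200b\u200c\u200d\u2060\ufeff]")
HTML_COMMENT = re.compile(r"<!--.*?-->", re.DOTALL)
IMPERATIVE_HINTS = re.compile(
    r"(ignore (all|previous|prior) instructions|you are now|new task:)",
    re.IGNORECASE,
)

def scan_issue_body(body: str) -> list[str]:
    """Return findings that should quarantine the issue before it reaches the model."""
    findings = []
    if ZERO_WIDTH.search(body):
        findings.append("zero-width characters present")
    for comment in HTML_COMMENT.findall(body):
        if IMPERATIVE_HINTS.search(comment):
            findings.append("imperative instructions hidden in HTML comment")
    if IMPERATIVE_HINTS.search(ZERO_WIDTH.sub("", body)):
        findings.append("goal-override phrasing in body text")
    return findings
```

A non-empty result would route the issue to quarantine rather than into the agent's context, breaking the chain at Stage 1.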
Failure Modes Identified
The compromise required four controls to be missing simultaneously; restoring any one of them would have broken the chain:
| Missing control | What it would have prevented |
|---|---|
| Input sanitization on retrieved content | The injection in the issue body would have been scanned and quarantined before reaching the model context |
| Egress filtering | The agent could POST data to any endpoint; an outbound-domain allowlist would have blocked the exfiltration step |
| Human-in-the-loop on file writes / network calls | High-risk actions (writing to project files, outbound HTTP) executed without confirmation; see Least Agency Principle |
| Anomaly detection | The behavioral shift from “investigating a bug” to “exfiltrating source code” went unobserved |
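The egress-filtering control in the table can be sketched as a deny-by-default outbound check. The allowlist contents here are hypothetical placeholders for whatever registries and code hosts a given agent legitimately needs:

```python
from urllib.parse import urlparse

# Hypothetical allowlist for a coding agent: its code host and package
# registries. Everything else is denied by default.
EGRESS_ALLOWLIST = {"github.com", "api.github.com", "pypi.org"}

def egress_allowed(url: str) -> bool:
    """Deny-by-default outbound check; Stage 4's POST to an attacker endpoint fails here."""
    host = urlparse(url).hostname or ""
    return host in EGRESS_ALLOWLIST or any(
        host.endswith("." + domain) for domain in EGRESS_ALLOWLIST
    )
```

Enforcing this at the network layer, rather than in the prompt, means it holds regardless of what the compromised model emits.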
This is the operational instantiation of the Lethal Trifecta — Jules had access to private data (the repository), exposure to untrusted content (the GitHub issue), and external communication (network access). All three legs were present, and no control broke the trifecta.
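The trifecta condition is simple enough to encode as a deployment-review check. The class and field names below are assumptions for illustration, not part of any published tooling:

```python
from dataclasses import dataclass

@dataclass
class AgentProfile:
    # The three legs of the Lethal Trifecta, as capability flags.
    private_data_access: bool         # e.g. repository contents, credentials
    untrusted_content_exposure: bool  # e.g. GitHub issues, PR bodies
    external_communication: bool      # e.g. unrestricted outbound HTTP

def trifecta_complete(agent: AgentProfile) -> bool:
    """True when all three legs are present, i.e. exfiltration is structurally possible."""
    return (agent.private_data_access
            and agent.untrusted_content_exposure
            and agent.external_communication)
```

Removing any one flag — for instance, replacing unrestricted network access with an allowlist — makes the predicate false and the exfiltration path structurally impossible.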
Why This Matters
- Coding agents are an exceptionally exposed class. They have read/write access to source code, persistent file system access, network access, and they routinely ingest text from issues, PR bodies, commit messages, and READMEs — all attacker-influenceable.
- Persistence in agent compromise is novel. Unlike traditional RCE, which needs a separate foothold mechanism, the agent itself wrote the persistence into the repo on the attacker’s instruction. The next session of Jules (or any other agent that reads those files) inherits the compromise.
- The user surface looked normal throughout. No alerts, no warnings. The user asked for a bug investigation; the response they eventually saw appeared on-task. The compromise was invisible without log inspection.
- A frontier-vendor agent failed. This was not a small startup’s prototype; it was Google’s coding agent on Google infrastructure. Frontier engineering does not equal frontier security.
Mapping to Frameworks
- OWASP ASI01 — Agent Goal Hijack (Stage 2)
- OWASP ASI02 — Tool Misuse & Exploitation (Stage 3, Stage 4)
- OWASP ASI04 — Cascading Hallucination / persistence cascade (Stage 3 → Stage 5)
- OWASP ASI05 — Sensitive Data Disclosure (Stage 4)
- OWASP ASI09 — Insufficient Logging (failure mode 4)
- OWASP ASI10 — Rogue Agents (Stage 5)
- MITRE ATLAS — Initial Access via Indirect Prompt Injection; Persistence via Agent-Generated Files; Exfiltration via Tool Misuse
Defensive Lessons
- Treat every retrieval as untrusted. GitHub issues, PR bodies, READMEs, commit messages, code comments — all attacker-influenceable inputs to a coding agent. See RAG Hardening and Indirect Prompt Injection.
- Egress filtering is non-optional for agents with both private-data and untrusted-content exposure. Domain allowlist at the network layer breaks the exfiltration step regardless of model output.
- High-risk actions need tier-based gating. Writing to project files and making outbound network calls are not low-risk for a coding agent.
- Sandboxing and ephemeral environments limit the persistence surface; if the agent’s working directory is rebuilt per session, the Stage 3 persistence does not survive.
- Behavioral baseline + drift detection. A coding agent suddenly making outbound POSTs to a never-before-seen domain is a textbook anomaly.
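The last two lessons combine naturally: a per-agent baseline of outbound domains that flags first contact with a never-before-seen one. This is a deliberately minimal sketch (class name and interface are assumptions); a real deployment would add time windows, volume thresholds, and per-task context.

```python
from urllib.parse import urlparse

class EgressBaseline:
    """Track outbound domains per agent; flag first contact with a new one."""

    def __init__(self) -> None:
        self.seen: set[str] = set()

    def observe(self, url: str) -> bool:
        """Record an outbound request; return True if anomalous (new domain)."""
        host = urlparse(url).hostname or ""
        is_new = host not in self.seen
        self.seen.add(host)
        return is_new
```

Warmed up on historical traffic, this fires exactly at the Stage 4 moment: a coding agent that has only ever talked to its code host suddenly POSTing to a novel domain.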
Sources
- Primary public summary: Securing Your Agents deck, slide 14, citing Johann Rehberger’s August 2025 disclosures.
- Original disclosure: Johann Rehberger (Embrace The Red).
See Also
- Month of AI Bugs (August 2025) — Coordinated Public Disclosures — the broader August 2025 disclosure series this attack belongs to
- Indirect Prompt Injection — the attack class
- Lethal Trifecta — the structural condition that enabled the chain
- Tool-Abuse Chains — the cascade pattern in stages 3–5
- Prompt Injection Containment for Agentic Systems — the runtime controls that would have broken the chain