Jules AI Kill Chain — Indirect Injection to Full Remote Control
Summary
In August 2025, Johann Rehberger demonstrated a full kill chain against Google’s Jules coding agent as part of his “Month of AI Bugs” disclosure series. The attack started with an indirect prompt injection hidden in a GitHub issue body, escalated through write access to project files (persistence), exfiltrated source code and credentials, and ended with the agent polling an attacker-controlled endpoint for remote commands — full command-and-control of an agent running on Google infrastructure.
The five-stage chain is one of the cleanest publicly documented end-to-end agent compromises of 2025 and is cited as a canonical case study in the “Securing Your Agents” deck (slide 14).
The Five-Stage Chain
| # | Stage | Actor | What happened |
|---|---|---|---|
| 1 | PLANT | Attacker | Seeds a GitHub issue with hidden prompt injection. Issue body looks like a normal bug report; injection sits in invisible characters / HTML comment / hidden formatting. |
| 2 | HIJACK | Agent (compromised) | Jules reads the issue. Injection overrides the agent’s goal. The user’s “investigate this bug” task is silently abandoned in favor of attacker instructions. |
| 3 | PERSIST | Agent (under attacker control) | Agent writes attacker-supplied payloads into project files and configuration. Survives session restarts — the attack is now self-sustaining within the repo. |
| 4 | EXFILTRATE | Agent (under attacker control) | Agent uses its unrestricted network access to POST source code and discoverable credentials to an external endpoint. |
| 5 | CONTROL | Attacker | Full C2: agent polls a remote endpoint for new commands and executes them on Google infrastructure on the attacker’s behalf. |
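The Stage 1 plant depends on instructions the human reviewer cannot see. A minimal heuristic scan for the hiding techniques named above (zero-width characters, HTML comments, goal-override phrasing) might look like the sketch below; the patterns and function name are illustrative assumptions, not a reconstruction of any real scanner.

```python
import re

# Illustrative sketch only: markers commonly used to hide instructions
# in issue bodies (Stage 1). Not a complete defense.
ZERO_WIDTH = re.compile(r"[\u200b\u200c\u200d\u2060\ufeff]")
HTML_COMMENT = re.compile(r"<!--.*?-->", re.DOTALL)
IMPERATIVE_HINTS = re.compile(
    r"(ignore (all|previous|prior) instructions|you are now|new task:)",
    re.IGNORECASE,
)

def scan_issue_body(body: str) -> list[str]:
    """Return findings that should quarantine the issue before it reaches the model."""
    findings = []
    if ZERO_WIDTH.search(body):
        findings.append("zero-width characters present")
    for comment in HTML_COMMENT.findall(body):
        if IMPERATIVE_HINTS.search(comment):
            findings.append("imperative instructions hidden in HTML comment")
    if IMPERATIVE_HINTS.search(ZERO_WIDTH.sub("", body)):
        findings.append("goal-override phrasing in body text")
    return findings
```

A non-empty result would route the issue to quarantine rather than into the agent's context, breaking the chain at Stage 1.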
Failure Modes Identified
The compromise required four controls to be missing simultaneously; restoring any one of them would have broken the chain:
| Missing control | What it would have prevented |
|---|---|
| Input sanitization on retrieved content | The injection in the issue body would have been scanned and quarantined before reaching the model context |
| Egress filtering | The agent could POST data to any endpoint; an outbound-domain allowlist would have blocked the exfiltration step |
| Human-in-the-loop on file writes / network calls | High-risk actions (writing to project files, outbound HTTP) executed without confirmation; see Least Agency Principle |
| Anomaly detection | The behavioral shift from “investigating a bug” to “exfiltrating source code” went unobserved |
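The egress-filtering control in the table can be sketched as a deny-by-default outbound check. The allowlist contents here are hypothetical placeholders for whatever registries and code hosts a given agent legitimately needs:

```python
from urllib.parse import urlparse

# Hypothetical allowlist for a coding agent: its code host and package
# registries. Everything else is denied by default.
EGRESS_ALLOWLIST = {"github.com", "api.github.com", "pypi.org"}

def egress_allowed(url: str) -> bool:
    """Deny-by-default outbound check; Stage 4's POST to an attacker endpoint fails here."""
    host = urlparse(url).hostname or ""
    return host in EGRESS_ALLOWLIST or any(
        host.endswith("." + domain) for domain in EGRESS_ALLOWLIST
    )
```

Enforcing this at the network layer, rather than in the prompt, means it holds regardless of what the compromised model emits.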
This is the operational instantiation of the Lethal Trifecta — Jules had access to private data (the repository), exposure to untrusted content (the GitHub issue), and external communication (network access). All three legs were present, and no control broke the trifecta.
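The trifecta condition is simple enough to encode as a deployment-review check. The class and field names below are assumptions for illustration, not part of any published tooling:

```python
from dataclasses import dataclass

@dataclass
class AgentProfile:
    # The three legs of the Lethal Trifecta, as capability flags.
    private_data_access: bool         # e.g. repository contents, credentials
    untrusted_content_exposure: bool  # e.g. GitHub issues, PR bodies
    external_communication: bool      # e.g. unrestricted outbound HTTP

def trifecta_complete(agent: AgentProfile) -> bool:
    """True when all three legs are present, i.e. exfiltration is structurally possible."""
    return (agent.private_data_access
            and agent.untrusted_content_exposure
            and agent.external_communication)
```

Removing any one flag — for instance, replacing unrestricted network access with an allowlist — makes the predicate false and the exfiltration path structurally impossible.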
Why This Matters
- Coding agents are an exceptionally exposed class. They have read/write access to source code, persistent file system access, network access, and they routinely ingest text from issues, PR bodies, commit messages, and READMEs — all attacker-influenceable.
- Persistence in agent compromise is novel. Unlike traditional RCE, which needs a separate foothold mechanism, the agent itself wrote the persistence into the repo on the attacker’s instruction. The next session of Jules (or any other agent that reads those files) inherits the compromise.
- The user surface looked normal throughout. No alerts, no warnings. The user asked for a bug investigation; the response they eventually saw appeared on-task. The compromise was invisible without log inspection.
- A frontier-vendor agent failed. This was not a small startup’s prototype; it was Google’s coding agent on Google infrastructure. Frontier engineering does not equal frontier security.
Mapping to Frameworks
- OWASP ASI01 — Agent Goal Hijack (Stage 2)
- OWASP ASI02 — Tool Misuse & Exploitation (Stage 3, Stage 4)
- OWASP ASI04 — Cascading Hallucination / persistence cascade (Stage 3 → Stage 5)
- OWASP ASI05 — Sensitive Data Disclosure (Stage 4)
- OWASP ASI09 — Insufficient Logging (failure mode 4)
- OWASP ASI10 — Rogue Agents (Stage 5)
- MITRE ATLAS — Initial Access via Indirect Prompt Injection; Persistence via Agent-Generated Files; Exfiltration via Tool Misuse
Defensive Lessons
- Treat every retrieval as untrusted. GitHub issues, PR bodies, READMEs, commit messages, code comments — all attacker-influenceable inputs to a coding agent. See RAG Hardening and Indirect Prompt Injection.
- Egress filtering is non-optional for agents with both private-data and untrusted-content exposure. Domain allowlist at the network layer breaks the exfiltration step regardless of model output.
- High-risk actions need tier-based gating. Writing to project files and making outbound network calls are not low-risk for a coding agent.
- Sandboxing and ephemeral environments limit the persistence surface; if the agent’s working directory is rebuilt per session, the Stage 3 persistence does not survive.
- Behavioral baseline + drift detection. A coding agent suddenly making outbound POSTs to a never-before-seen domain is a textbook anomaly.
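The last two lessons combine naturally: a per-agent baseline of outbound domains that flags first contact with a never-before-seen one. This is a deliberately minimal sketch (class name and interface are assumptions); a real deployment would add time windows, volume thresholds, and per-task context.

```python
from urllib.parse import urlparse

class EgressBaseline:
    """Track outbound domains per agent; flag first contact with a new one."""

    def __init__(self) -> None:
        self.seen: set[str] = set()

    def observe(self, url: str) -> bool:
        """Record an outbound request; return True if anomalous (new domain)."""
        host = urlparse(url).hostname or ""
        is_new = host not in self.seen
        self.seen.add(host)
        return is_new
```

Warmed up on historical traffic, this fires exactly at the Stage 4 moment: a coding agent that has only ever talked to its code host suddenly POSTing to a novel domain.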
Sources
- Primary public summary: Securing Your Agents deck, slide 14, citing Johann Rehberger’s August 2025 disclosures.
- Original disclosure: Johann Rehberger (Embrace The Red).
See Also
- Month of AI Bugs (August 2025) — Coordinated Public Disclosures — the broader August 2025 disclosure series this attack belongs to
- Indirect Prompt Injection — the attack class
- Lethal Trifecta — the structural condition that enabled the chain
- Tool-Abuse Chains — the cascade pattern in stages 3–5
- Prompt Injection Containment for Agentic Systems — the runtime controls that would have broken the chain