Orchestration Hijacking
A class of attack against agentic systems in which the orchestration layer — the LLM (or LLM-driven planner) responsible for sequencing tool calls — is manipulated by adversarial content to plan attacker-favorable actions. Named in this form by Nicolas Lidzborski (Google Workspace) at [[unprompted-conference-march-2026|[un]prompted March 2026]].
The orchestration layer is the natural target for a sophisticated indirect-injection attack because manipulating what the agent decides to do is more powerful than manipulating what a single tool call returns. A successful orchestration hijack converts the LLM from a useful planner into a tool-call generator under attacker control — without compromising any individual tool.
How it differs from primary-prompt injection
A typical indirect prompt injection targets the model’s response to the current prompt — for example, causing the model to leak data in its output, or to call a tool with attacker-chosen parameters in the same turn.
Orchestration hijacking is broader and often delayed:
- The injection enters the agent’s context (memory, retrieved document, tool output) but does not act immediately
- It plants instructions that influence future planning decisions
- The actual malicious action may occur many turns later, after the user has long forgotten the entry point
- Triggers can be time-based (act on the first call after midnight UTC), event-based (act when a specific user query is observed), or cascade-based (act when another agent invokes this one)
This is why Lidzborski warns: “You can have that injection inserted in the database, and a layer later, it’s being executed.” The temporal decoupling makes attribution and incident response materially harder.
Sub-patterns
Planner manipulation
The injected content alters the agent’s choice of tool, the sequence of calls, or the parameters passed. An agent that “should” call search → summarize is nudged into search → exfiltrate → summarize. The high-level task still appears to complete, masking the inserted step.
Inter-agent communication hijacking
In multi-agent systems, a compromised agent can send adversarial messages to peer agents. The receiving agent — now planning based on adversarially shaped peer output — may take actions its principal would not have authorized. (Cross-reference: Multi-Agent Runtime Security cascade detection.)
Dormant trigger insertion
The attacker plants instructions that lie inert until a specific condition is met. The classic example: a malicious row inserted into a vector store that is retrieved (and therefore interpreted) only when a specific class of query is asked. The agent at retrieval time has no signal that this content is older or more suspicious than other context.
Tool-call parameter coercion
The injection coerces the planner into calling an authorized tool with attacker-controlled parameters. The tool itself is uncompromised; the use of the tool is. This is the structural pattern behind many of the MCP CVE class exploits in Q1 2026 — the MCP server isn’t necessarily vulnerable, but the agent calling it has been subverted into calling it with adversarial inputs.
Relation to MCP and the Lethal Trifecta
MCP makes orchestration hijacking more dangerous:
- The MCP context window is much larger than a single LLM prompt — more places to plant a dormant trigger
- MCP servers are often discovered dynamically, increasing the attack surface
- Many MCP server implementations process tool descriptions as free text, creating a tool-poisoning surface that compounds with orchestration hijacking
In the Lethal Trifecta framing, orchestration hijacking is the specific mechanism by which the trifecta becomes catastrophic. Without orchestration hijacking, sensitive private data + untrusted content + external comms is capable of being exfiltrated; with orchestration hijacking, the agent is systematically directed to do so.
Defenses
The orchestration layer is not defendable from the inside — it shares the prompt-as-code vulnerability of any LLM. Defenses are structural:
| Layer | Defense |
|---|---|
| Input | Strip hidden content; scan for known-injection patterns; tag content with provenance (memory poisoning defense) |
| Memory | Provenance attestation on retrieved content; integrity monitoring for scratchpad / state |
| Planner | Deterministic policy enforcement at every step (not just on initial prompt); state-aware FSM tracking risk level of accumulated context |
| Tool call | Capability tokens (Tenuo Warrants) constrain what the agent can request, regardless of what the planner decided |
| Egress | AgentGateway / tool-policy enforcement at the broker level |
| Observability | Behavioral drift detection on planner decisions; per-agent baseline of tool-call patterns |
The deepest structural mitigation is channel separation (CaMeL) — the privileged LLM that does planning never sees untrusted content directly, eliminating the injection path into the planner.
Cross-references
- Indirect prompt injection — the input vector
- Memory poisoning — the durability mechanism that enables delayed/dormant triggers
- Tool poisoning — the supply-chain twin (compromise the tools rather than the planner that calls them)
- Agency gap — the non-adversarial counterpart (the planner picks a wrong action without external manipulation)
- Multi-Agent Runtime Security — cascade detection across hijacked planners
- Plan-Validate-Execute — the structural pattern that interposes validation between planning and execution