Orchestration Hijacking

A class of attack against agentic systems in which the orchestration layer — the LLM (or LLM-driven planner) responsible for sequencing tool calls — is manipulated by adversarial content to plan attacker-favorable actions. Named in this form by Nicolas Lidzborski (Google Workspace) at [[unprompted-conference-march-2026|[un]prompted March 2026]].

The orchestration layer is the natural target for a sophisticated indirect-injection attack because manipulating what the agent decides to do is more powerful than manipulating what a single tool call returns. A successful orchestration hijack converts the LLM from a useful planner into a tool-call generator under attacker control — without compromising any individual tool.

How it differs from primary-prompt injection

A typical indirect prompt injection targets the model’s response to the current prompt — for example, causing the model to leak data in its output, or to call a tool with attacker-chosen parameters in the same turn.

Orchestration hijacking is broader and often delayed:

  • The injection enters the agent’s context (memory, retrieved document, tool output) but does not act immediately
  • It plants instructions that influence future planning decisions
  • The actual malicious action may occur many turns later, after the user has long forgotten the entry point
  • Triggers can be time-based (act on the first call after midnight UTC), event-based (act when a specific user query is observed), or cascade-based (act when another agent invokes this one)

This is why Lidzborski warns: “You can have that injection inserted in the database, and a layer later, it’s being executed.” The temporal decoupling makes attribution and incident response materially harder.

Sub-patterns

Planner manipulation

The injected content alters the agent’s choice of tool, the sequence of calls, or the parameters passed. An agent that “should” call search → summarize is nudged into search → exfiltrate → summarize. The high-level task still appears to complete, masking the inserted step.

Inter-agent communication hijacking

In multi-agent systems, a compromised agent can send adversarial messages to peer agents. The receiving agent — now planning based on adversarially shaped peer output — may take actions its principal would not have authorized. (Cross-reference: Multi-Agent Runtime Security cascade detection.)

Dormant trigger insertion

The attacker plants instructions that lie inert until a specific condition is met. The classic example: a malicious row inserted into a vector store that is retrieved (and therefore interpreted) only when a specific class of query is asked. The agent at retrieval time has no signal that this content is older or more suspicious than other context.

Tool-call parameter coercion

The injection coerces the planner into calling an authorized tool with attacker-controlled parameters. The tool itself is uncompromised; the use of the tool is. This is the structural pattern behind many of the MCP CVE class exploits in Q1 2026 — the MCP server isn’t necessarily vulnerable, but the agent calling it has been subverted into calling it with adversarial inputs.

Relation to MCP and the Lethal Trifecta

MCP makes orchestration hijacking more dangerous:

  • The MCP context window is much larger than a single LLM prompt — more places to plant a dormant trigger
  • MCP servers are often discovered dynamically, increasing the attack surface
  • Many MCP server implementations process tool descriptions as free text, creating a tool-poisoning surface that compounds with orchestration hijacking

In the Lethal Trifecta framing, orchestration hijacking is the specific mechanism by which the trifecta becomes catastrophic. Without orchestration hijacking, sensitive private data + untrusted content + external comms is capable of being exfiltrated; with orchestration hijacking, the agent is systematically directed to do so.

Defenses

The orchestration layer is not defendable from the inside — it shares the prompt-as-code vulnerability of any LLM. Defenses are structural:

LayerDefense
InputStrip hidden content; scan for known-injection patterns; tag content with provenance (memory poisoning defense)
MemoryProvenance attestation on retrieved content; integrity monitoring for scratchpad / state
PlannerDeterministic policy enforcement at every step (not just on initial prompt); state-aware FSM tracking risk level of accumulated context
Tool callCapability tokens (Tenuo Warrants) constrain what the agent can request, regardless of what the planner decided
EgressAgentGateway / tool-policy enforcement at the broker level
ObservabilityBehavioral drift detection on planner decisions; per-agent baseline of tool-call patterns

The deepest structural mitigation is channel separation (CaMeL) — the privileged LLM that does planning never sees untrusted content directly, eliminating the injection path into the planner.

Cross-references