Orchestration Hijacking

A class of attack against agentic systems in which the orchestration layer — the LLM (or LLM-driven planner) responsible for sequencing tool calls — is manipulated by adversarial content to plan attacker-favorable actions. Named in this form by Nicolas Lidzborski (Google Workspace) at [[unprompted-conference-march-2026|[un]prompted March 2026]].

The orchestration layer is the natural target for a sophisticated indirect-injection attack because manipulating what the agent decides to do is more powerful than manipulating what a single tool call returns. A successful orchestration hijack converts the LLM from a useful planner into a tool-call generator under attacker control — without compromising any individual tool.

How it differs from primary-prompt injection

A typical indirect prompt injection targets the model’s response to the current prompt — for example, causing the model to leak data in its output, or to call a tool with attacker-chosen parameters in the same turn.

Orchestration hijacking is broader and often delayed:

The injection enters the agent’s context (memory, retrieved document, tool output) but does not act immediately
It plants instructions that influence future planning decisions
The actual malicious action may occur many turns later, after the user has long forgotten the entry point
Triggers can be time-based (act on the first call after midnight UTC), event-based (act when a specific user query is observed), or cascade-based (act when another agent invokes this one)

This is why Lidzborski warns: “You can have that injection inserted in the database, and a layer later, it’s being executed.” The temporal decoupling makes attribution and incident response materially harder.

Sub-patterns

Planner manipulation

The injected content alters the agent’s choice of tool, the sequence of calls, or the parameters passed. An agent that “should” call search → summarize is nudged into search → exfiltrate → summarize. The high-level task still appears to complete, masking the inserted step.

Inter-agent communication hijacking

In multi-agent systems, a compromised agent can send adversarial messages to peer agents. The receiving agent — now planning based on adversarially shaped peer output — may take actions its principal would not have authorized. (Cross-reference: Multi-Agent Runtime Security cascade detection.)

Dormant trigger insertion

The attacker plants instructions that lie inert until a specific condition is met. The classic example: a malicious row inserted into a vector store that is retrieved (and therefore interpreted) only when a specific class of query is asked. The agent at retrieval time has no signal that this content is older or more suspicious than other context.

Tool-call parameter coercion

The injection coerces the planner into calling an authorized tool with attacker-controlled parameters. The tool itself is uncompromised; the use of the tool is. This is the structural pattern behind many of the MCP CVE class exploits in Q1 2026 — the MCP server isn’t necessarily vulnerable, but the agent calling it has been subverted into calling it with adversarial inputs.

Relation to MCP and the Lethal Trifecta

MCP makes orchestration hijacking more dangerous:

The MCP context window is much larger than a single LLM prompt — more places to plant a dormant trigger
MCP servers are often discovered dynamically, increasing the attack surface
Many MCP server implementations process tool descriptions as free text, creating a tool-poisoning surface that compounds with orchestration hijacking

In the Lethal Trifecta framing, orchestration hijacking is the specific mechanism by which the trifecta becomes catastrophic. Without orchestration hijacking, sensitive private data + untrusted content + external comms is capable of being exfiltrated; with orchestration hijacking, the agent is systematically directed to do so.

Defenses

The orchestration layer is not defendable from the inside — it shares the prompt-as-code vulnerability of any LLM. Defenses are structural:

Layer	Defense
Input	Strip hidden content; scan for known-injection patterns; tag content with provenance (memory poisoning defense)
Memory	Provenance attestation on retrieved content; integrity monitoring for scratchpad / state
Planner	Deterministic policy enforcement at every step (not just on initial prompt); state-aware FSM tracking risk level of accumulated context
Tool call	Capability tokens (Tenuo Warrants) constrain what the agent can request, regardless of what the planner decided
Egress	AgentGateway / tool-policy enforcement at the broker level
Observability	Behavioral drift detection on planner decisions; per-agent baseline of tool-call patterns

The deepest structural mitigation is channel separation (CaMeL) — the privileged LLM that does planning never sees untrusted content directly, eliminating the injection path into the planner.

Cross-references

Indirect prompt injection — the input vector
Memory poisoning — the durability mechanism that enables delayed/dormant triggers
Tool poisoning — the supply-chain twin (compromise the tools rather than the planner that calls them)
Agency gap — the non-adversarial counterpart (the planner picks a wrong action without external manipulation)
Multi-Agent Runtime Security — cascade detection across hijacked planners
Plan-Validate-Execute — the structural pattern that interposes validation between planning and execution

Enterprise Security in the Agentic AI Era

Explorer

Orchestration Hijacking

Orchestration Hijacking

How it differs from primary-prompt injection

Sub-patterns

Planner manipulation

Inter-agent communication hijacking

Dormant trigger insertion

Tool-call parameter coercion

Relation to MCP and the Lethal Trifecta

Defenses

Cross-references

Graph View

Table of Contents

Backlinks