Promptware

Promptware is the class of adversarial prompt payloads that go beyond simple injection to function as structured, multi-stage malware — complete with persistence, command-and-control communication, data exfiltration loops, and lateral movement — all implemented entirely in natural language rather than machine code.

The term was introduced to the practitioner security community by Johann Rehberger at [un]prompted (March 2026) and framed with reference to Ben Nassi’s concurrent Promptware Kill Chain paper. The core insight is that injection is a technique, not the attack itself. The injection is the initial access vector; what follows is a structured program — written in the grammar of the LLM — that achieves a complex attacker objective.

The Naming Shift: From Injection to Malware

“The injection is the technique. And then what you do later is a complex set of instructions to achieve an objective — that is called promptware, that aligns along the kill chain: data exfiltration, persistence, and so on.” — Johann Rehberger, [un]prompted 2026.

Distinction from Prompt Injection

| Prompt Injection | Promptware |
| --- | --- |
| A single inserted instruction | A structured sequence of instructions |
| Atomic — one trigger, one effect | Stateful — conditions, tool-call chains, self-propagation |
| Goal: override a single model decision | Goal: achieve a multi-stage attacker objective |
| Analogous to: a command injection | Analogous to: a full malware implant |
| Detection: single-turn anomaly | Detection: behavioral baseline drift, multi-turn analysis |

The distinction matters for defense: blocking a single injected sentence is insufficient if the “malware” component carries its own persistence, retry logic, and C2 registration instructions.

Promptware Structure (Kill Chain Mapping)

Rehberger illustrates promptware against the classical kill chain:

| Kill Chain Phase | Promptware Mechanism |
| --- | --- |
| Initial Access | Indirect prompt injection via file, email, MCP tool, or issue ticket |
| Execution | LLM follows embedded multi-step instructions expressed as natural language |
| Persistence | Memory poisoning — instructions written to long-term memory (Copilot, Gemini); behavioral modification survives conversation resets |
| Command & Control | Agent Commander pattern — heartbeat loop; the agent polls the attacker's prompt C2 server, then receives and executes further prompt commands |
| Exfiltration | Continuous data streaming — every user turn silently copied to the attacker; screenshot capture via browser tools; email forwarding via Workspace integrations |
| Lateral Movement | Agent reused as a platform for further prompt injection of other agents; BYOD-style MCP server compromise |
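One way to operationalize the mapping above is defender-side log triage: tag each agent tool-call event with the kill-chain phase it most resembles. A minimal sketch, where all event fields, tool names, and matching rules are hypothetical, not any vendor's actual schema:

```python
# Hypothetical sketch: tag agent tool-call events with the promptware
# kill-chain phase they most resemble. Fields and rules are illustrative.

PHASE_RULES = [
    # Untrusted content sources are candidate initial-access vectors.
    ("initial_access",  lambda e: e.get("source") in {"file", "email", "mcp_tool", "ticket"}),
    # Writes to long-term memory map to persistence (memory poisoning).
    ("persistence",     lambda e: e.get("tool") == "memory.write"),
    # Recurring outbound polling resembles a heartbeat / C2 loop.
    ("command_control", lambda e: e.get("tool") == "http.get" and e.get("recurring", False)),
    # Outbound data-carrying calls map to exfiltration.
    ("exfiltration",    lambda e: e.get("tool") in {"http.post", "email.forward", "screenshot"}),
]

def tag_phase(event: dict) -> str:
    """Return the first matching kill-chain phase, else 'execution'."""
    for phase, rule in PHASE_RULES:
        if rule(event):
            return phase
    return "execution"

events = [
    {"source": "email", "tool": "read"},
    {"source": "agent", "tool": "memory.write"},
    {"source": "agent", "tool": "http.get", "recurring": True},
    {"source": "agent", "tool": "http.post"},
]
print([tag_phase(e) for e in events])
# → ['initial_access', 'persistence', 'command_control', 'exfiltration']
```

Real triage would of course need richer signals than first-match rules; the point is only that promptware activity decomposes into phases a SIEM can label.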

Published Promptware Examples

Rehberger disclosed the following promptware instances at [un]prompted March 2026:

1. Spyware (ChatGPT — “Spyber”)

First demonstrated at Black Hat (two years prior to the talk). After a successful injection and memory poisoning in ChatGPT, the agent is instructed to continue exfiltrating every user message to an attacker server on every subsequent conversation turn. The user types; the data leaves.

2. Microsoft Enterprise Copilot Memory Implant

An indirect injection via a malicious file summary triggers the Copilot memory tool to write two persistent memories:

  • A fabricated user attribute (“I am 102 years old”) — demonstrates data falsification.
  • A standing behavioral instruction (“run Commodore 64 simulator”) — demonstrates persistent behavioral modification.

All future Copilot conversations for that user now operate against falsified memory. Microsoft fixed the issue in December of the year preceding the talk.

3. OpenClaw / KimiCloud Agent Commander Enrollment

The most complex promptware variant. An indirect injection (zero-click email trigger via OpenClaw’s PubSub Gmail subscription) enrolls the agent in a prompt-level command-and-control server. The heartbeat payload:

  • Is written entirely in natural language
  • Contains standing instructions to join the C2 server and await further commands
  • Hides itself from the user interface using known UI suppression strings (heartbeat OK suffix; NO_REPLY prefix)
  • Works cross-platform (tested on both OpenClaw and KimiCloud/Alibaba)

This is the operational endpoint of the promptware concept: the agent has been converted into a persistent, remotely commanded bot that exfiltrates screenshots, runs security assessments, and accepts arbitrary prompt-template dispatches, all without the user ever seeing a suspicious message.
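Because the self-hiding behavior relies on fixed marker strings, a defender who knows those markers can screen agent output for them before it is suppressed. A minimal sketch, assuming illustrative marker values (the exact strings used in the disclosed payloads are not reproduced here):

```python
# Sketch: flag agent messages carrying known UI-suppression markers,
# which promptware uses to hide C2 traffic from the user. The marker
# strings below are illustrative placeholders, not the real payload values.

SUPPRESSION_MARKERS = {
    "prefix": ["NO_REPLY"],       # assumed no-reply prefix marker
    "suffix": ["HEARTBEAT_OK"],   # assumed heartbeat-acknowledgement suffix
}

def is_suppressed(message: str) -> bool:
    """True if the message starts or ends with a known suppression marker."""
    text = message.strip()
    return (
        any(text.startswith(p) for p in SUPPRESSION_MARKERS["prefix"])
        or any(text.endswith(s) for s in SUPPRESSION_MARKERS["suffix"])
    )

print(is_suppressed("NO_REPLY enrolled with c2 endpoint"))  # True
print(is_suppressed("Here is your summary."))               # False
```

A marker blocklist is brittle (attackers can rotate strings), but it cheaply catches known campaigns and complements behavioral detection.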

Promptware Kill Chain (Ben Nassi)

Rehberger credits Ben Nassi’s Promptware Kill Chain paper as the formal academic framing. The paper makes the same observation in academic terms: prompt injection should be analyzed not as an atomic event but as the initial phase of a multi-stage attack chain with its own persistence and exfiltration phases — following the structure of traditional malware kill chains.

Citation Gap

The Promptware Kill Chain paper was described by Rehberger as having been “released very recently” at the time of the March 2026 talk. The paper is not yet in this wiki’s citation set. The Ben Nassi entity page (Ben Nassi) should be updated when the paper is located.

Defensive Implications

Treating promptware as malware (rather than as a prompt engineering problem) changes what defenses are appropriate:

| If you treat it as… | Your defenses look like… | What you miss |
| --- | --- | --- |
| Prompt injection | Input filters, system-prompt hardening, detection classifiers | Multi-turn persistence, C2 registration, exfil loops |
| Malware | Behavioral baselines, memory integrity monitoring, egress monitoring, C2 domain blocks, kill switches | Model-layer mitigations |

The right posture is both. But the malware framing adds defenses that the pure injection framing omits:

  • Memory integrity monitoring — detect unauthorized memory writes (as seen in the Copilot implant)
  • Behavioral drift detection — flag when an agent that never called the memory tool starts calling it
  • Egress C2 pattern detection — identify heartbeat-style polling to novel domains
  • Circuit breakers — halt a running agent when anomalous behavior is detected mid-task
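The last two bullets can be combined into a small runtime guard: baseline which tools an agent normally calls, and trip a breaker on first-time use of a sensitive tool mid-task. A minimal sketch, with hypothetical tool names and a deliberately naive drift rule:

```python
# Sketch: per-agent circuit breaker that trips on behavioral drift —
# the agent's first-ever call to a sensitive tool absent from its baseline.
# Tool names and the drift rule are illustrative assumptions.

SENSITIVE_TOOLS = {"memory.write", "http.post"}  # persistence / exfil surface

class CircuitBreaker:
    def __init__(self, baseline_tools: set[str]):
        self.baseline = set(baseline_tools)  # tools seen in normal operation
        self.tripped = False

    def observe(self, tool: str) -> bool:
        """Record a tool call; return True if the agent should be halted."""
        if self.tripped:
            return True  # stay halted once tripped
        if tool in SENSITIVE_TOOLS and tool not in self.baseline:
            self.tripped = True  # drift: sensitive tool never seen before
            return True
        self.baseline.add(tool)
        return False

breaker = CircuitBreaker(baseline_tools={"search", "read_file"})
print(breaker.observe("search"))        # False — within baseline
print(breaker.observe("memory.write"))  # True — first-time sensitive call
```

A production guard would weigh frequency and context rather than a binary first-use rule, but even this coarse check would have flagged the Copilot memory implant, where an agent that never touched the memory tool suddenly wrote to it.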

See Also