Apollo Research

UK-based AI safety evaluation organization focused on deception detection, scheming evaluations, and multi-agent collusion. It is the load-bearing source for the wiki's coverage of agent-agent collusion as a threat class; see Agentic AI Threat Classes — 2026 Expansion §Class 3.

Notable contributions

  • Secret Collusion among AI Agents: Multi-Agent Deception via Steganography (NeurIPS 2024, arXiv:2402.07510) — formal threat model for steganographic agent-to-agent collusion that bypasses output-monitoring oversight.
  • Detecting Strategic Deception Using Linear Probes (2025, apolloresearch.ai/research/deception-probes) — detection technique for scheming/deception in single agents, using linear probes trained on residual-stream activations.
  • Detecting and Reducing AI Scheming (2025, with OpenAI) — pre-deployment evals for scheming.
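The linear-probe approach in the second item can be illustrated with a toy sketch: collect residual-stream activations on honest vs. deceptive transcripts, then fit a linear classifier whose decision direction serves as the "deception direction". Everything below is synthetic and hypothetical — the data generator, dimensionality, and training loop are illustrative assumptions, not Apollo's actual method or code.

```python
# Toy sketch of a linear deception probe, loosely in the spirit of
# "Detecting Strategic Deception Using Linear Probes" (2025).
# Synthetic stand-in data: real probes are fit on residual-stream
# activations from a model, labeled honest vs. deceptive.
import numpy as np

rng = np.random.default_rng(0)
d = 16                                  # hypothetical residual-stream width
true_dir = rng.normal(size=d)
true_dir /= np.linalg.norm(true_dir)    # planted "deception direction"

def sample_acts(n, deceptive):
    # Assume the two classes differ by a shift along one direction.
    base = rng.normal(size=(n, d))
    shift = 1.5 if deceptive else -1.5
    return base + shift * true_dir

X = np.vstack([sample_acts(200, False), sample_acts(200, True)])
y = np.array([0] * 200 + [1] * 200)

# Fit a logistic-regression probe (weights w, bias b) by gradient descent.
w, b = np.zeros(d), 0.0
for _ in range(500):
    p = 1 / (1 + np.exp(-(X @ w + b)))  # predicted P(deceptive)
    g = p - y                           # gradient of log-loss wrt logits
    w -= 0.1 * (X.T @ g) / len(y)
    b -= 0.1 * g.mean()

acc = ((X @ w + b > 0) == y).mean()
print(f"probe accuracy: {acc:.2f}")
```

On this synthetic data the learned weight vector recovers the planted direction and separates the classes well; the published work reports probe performance on real model activations, which is a much harder setting.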

Relationship to other safety bodies

Independent of (but cooperative with) UK AISI, Anthropic, and OpenAI. Apollo’s work is frequently cited in frontier-vendor pre-deployment system cards and in AISI joint evaluations.

See also