Apollo Research
Apollo Research is a UK-based AI safety evaluation organization focused on deception detection, scheming evaluations, and multi-agent collusion. It is the load-bearing source for this wiki's coverage of agent-agent collusion as a threat class; see Agentic AI Threat Classes — 2026 Expansion §Class 3.
Notable contributions
- Secret Collusion among AI Agents: Multi-Agent Deception via Steganography (NeurIPS 2024, arXiv:2402.07510) — a formal threat model for steganographic agent-to-agent collusion that can evade output-monitoring oversight.
- Detecting Strategic Deception Using Linear Probes (2025, apolloresearch.ai/research/deception-probes) — a detection technique for scheming and deception in single agents, using linear probes trained on residual-stream activations.
- Detecting and Reducing AI Scheming (2025, with OpenAI) — pre-deployment evaluations for scheming behavior.
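The linear-probe approach described above amounts to fitting a simple linear classifier on a model's internal activations. The sketch below illustrates the idea on synthetic data only: the activation matrix, dimensions, and the planted "deception direction" are all illustrative stand-ins, not Apollo's actual setup, which trains probes on real residual-stream activations from labeled honest and deceptive model outputs.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
d_model = 64   # hypothetical residual-stream width (illustrative)
n = 400        # number of labeled examples (illustrative)

# Synthetic stand-in for residual-stream activations: assume honest and
# deceptive responses differ along a single linear "deception direction".
direction = rng.normal(size=d_model)
direction /= np.linalg.norm(direction)

labels = rng.integers(0, 2, size=n)          # 1 = deceptive, 0 = honest
acts = rng.normal(size=(n, d_model))
acts += np.outer(labels * 2.0 - 1.0, direction) * 1.5

# The probe itself is just a logistic-regression classifier on activations;
# its learned weight vector approximates the planted direction.
probe = LogisticRegression(max_iter=1000).fit(acts[:300], labels[:300])
accuracy = probe.score(acts[300:], labels[300:])
```

On held-out synthetic data the probe recovers the planted direction and classifies well above chance; the appeal of the method is that inference is a single dot product per token position, cheap enough to run as a monitor.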
Relationship to other safety bodies
Apollo Research is independent of, but collaborates with, the UK AI Safety Institute (UK AISI), Anthropic, and OpenAI. Its work is frequently cited in frontier labs' pre-deployment system cards and in UK AISI joint evaluations.
See Also
- Agentic AI Threat Classes — 2026 Expansion — primary citation
- UK AI Safety Institute — peer organization