Apollo Research

UK-based AI safety evaluation organization focused on deception detection, scheming evaluations, and multi-agent collusion. It is the load-bearing source for the wiki's coverage of agent-agent collusion as a threat class; see Agentic AI Threat Classes — 2026 Expansion §Class 3.

Notable contributions

  • Secret Collusion among AI Agents: Multi-Agent Deception via Steganography (NeurIPS 2024, arXiv:2402.07510) — formal threat model for steganographic agent-to-agent collusion that bypasses output-monitoring oversight.
  • Detecting Strategic Deception Using Linear Probes (2025, apolloresearch.ai/research/deception-probes) — detection technique for scheming/deception in single agents, using linear probes trained on residual-stream activations.
  • Detecting and Reducing AI Scheming (2025, with OpenAI) — pre-deployment evals for scheming.
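The linear-probe approach in the second item can be illustrated with a toy sketch: collect residual-stream activations on honest vs. deceptive transcripts, then fit a linear classifier whose decision direction serves as the "deception direction". Everything below is synthetic and hypothetical — the data generator, dimensionality, and training loop are illustrative assumptions, not Apollo's actual method or code.

```python
# Toy sketch of a linear deception probe, loosely in the spirit of
# "Detecting Strategic Deception Using Linear Probes" (2025).
# Synthetic stand-in data: real probes are fit on residual-stream
# activations from a model, labeled honest vs. deceptive.
import numpy as np

rng = np.random.default_rng(0)
d = 16                                  # hypothetical residual-stream width
true_dir = rng.normal(size=d)
true_dir /= np.linalg.norm(true_dir)    # planted "deception direction"

def sample_acts(n, deceptive):
    # Assume the two classes differ by a shift along one direction.
    base = rng.normal(size=(n, d))
    shift = 1.5 if deceptive else -1.5
    return base + shift * true_dir

X = np.vstack([sample_acts(200, False), sample_acts(200, True)])
y = np.array([0] * 200 + [1] * 200)

# Fit a logistic-regression probe (weights w, bias b) by gradient descent.
w, b = np.zeros(d), 0.0
for _ in range(500):
    p = 1 / (1 + np.exp(-(X @ w + b)))  # predicted P(deceptive)
    g = p - y                           # gradient of log-loss wrt logits
    w -= 0.1 * (X.T @ g) / len(y)
    b -= 0.1 * g.mean()

acc = ((X @ w + b > 0) == y).mean()
print(f"probe accuracy: {acc:.2f}")
```

On this synthetic data the learned weight vector recovers the planted direction and separates the classes well; the published work reports probe performance on real model activations, which is a much harder setting.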

Relationship to other safety bodies

Independent of (but cooperative with) UK AISI, Anthropic, and OpenAI. Apollo’s work is frequently cited in frontier-vendor pre-deployment system cards and in AISI joint evaluations.

See also