Big Sleep (Google Project Zero + DeepMind)
Sources: Project Zero — From Naptime to Big Sleep (Oct 2024) · Google Cloud Blog — Big Sleep agent makes a big leap
Big Sleep is Google’s AI agent for vulnerability discovery, built as a collaboration between Google Project Zero and Google DeepMind. It grew out of the earlier Project Naptime framework, which achieved state-of-the-art results on Meta’s CyberSecEval2 benchmarks. Big Sleep’s signature methodology is variant analysis: given a previously fixed vulnerability (commit message + diff), find similar patterns elsewhere in the codebase. Project Zero positions this narrower task as a better fit for current LLMs than open-ended vulnerability discovery.
Disclosed Capability Milestones
| Date | Milestone | Source |
|---|---|---|
| June 2024 | Project Naptime announced — LLM-assisted vuln research framework; state-of-the-art on Meta CyberSecEval2 | Project Zero (predecessor post) |
| October 2024 | Naptime → Big Sleep rebrand; first real-world vulnerability disclosed (SQLite stack buffer underflow), reported to the developers and fixed the same day, before it appeared in an official release | Project Zero, Oct 2024 |
| July 2025 | SQLite CVE-2025-6965 disclosed: a vulnerability “known only to threat actors and at risk of being exploited”; Google describes it as the first time an AI agent “directly foiled efforts to exploit a vulnerability in the wild” | Google Cloud Blog |
| August 2025 | Public report: ~20 security vulnerabilities found | TechCrunch coverage |
| May 2026 | Named in Anthropic Glasswing announcement as Google’s parallel AI-cyber tool alongside CodeMender | Heather Adkins (VP Security Engineering) quote |
Methodology
Big Sleep’s core operating mode is variant analysis:
- Input: a previously fixed vulnerability (commit message + diff).
- Search: scan the current repository (at HEAD) for related patterns that may not have been fixed.
- Output: candidate findings, each with a reasoning trace.
This framing is chosen for three reasons (per Project Zero’s October 2024 post):
- Real-world exploit-variant pattern: “over 40% of the 0-days discovered were variants of previously reported vulnerabilities”. Fuzzing is not catching these variants, while for attackers manual variant analysis remains a cost-effective approach.
- LLM task fit: variant analysis “remove[s] a lot of ambiguity from vulnerability research, and start[s] from a concrete, well-founded theory: ‘This was a previous bug; there is probably another similar one somewhere.’”
- Asymmetric defender advantage: pre-release discovery means “no scope for attackers to compete: the vulnerabilities are fixed before attackers even have a chance to use them.”
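The input / search / output loop described above can be illustrated with a small sketch. This is not Big Sleep's implementation: Project Zero describes an LLM agent, whereas the code below substitutes plain token overlap (Jaccard similarity) for the model's reasoning, purely to show the shape of the task. The names `FixedVuln`, `Candidate`, `find_variants`, the `threshold` parameter, and the sample diff and repository contents are all hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass
class FixedVuln:
    """Input: a previously fixed vulnerability (hypothetical container)."""
    commit_message: str
    diff: str


@dataclass
class Candidate:
    """Output: one location to review, with the reasoning that flagged it."""
    path: str
    line_no: int
    score: float
    reasoning: str


def tokens(text: str) -> set[str]:
    """Identifier-like tokens (3+ chars); a crude stand-in for code understanding."""
    return set(re.findall(r"[A-Za-z_][A-Za-z0-9_]{2,}", text))


def removed_lines(diff: str) -> list[str]:
    """Lines the fix deleted, i.e. the pre-fix code (skips '---' file headers)."""
    return [
        line[1:].strip()
        for line in diff.splitlines()
        if line.startswith("-") and not line.startswith("---") and line[1:].strip()
    ]


def find_variants(vuln: FixedVuln, repo_at_head: dict[str, str],
                  threshold: float = 0.5) -> list[Candidate]:
    """Search step: flag lines at HEAD whose tokens overlap the pre-fix code."""
    patterns = [(p, tokens(p)) for p in removed_lines(vuln.diff)]
    found = []
    for path, source in repo_at_head.items():
        for line_no, line in enumerate(source.splitlines(), start=1):
            line_toks = tokens(line)
            for pattern, pat_toks in patterns:
                union = line_toks | pat_toks
                if not union:
                    continue
                score = len(line_toks & pat_toks) / len(union)  # Jaccard similarity
                if score >= threshold:
                    found.append(Candidate(
                        path, line_no, score,
                        f"resembles pre-fix line {pattern!r} from "
                        f"{vuln.commit_message!r} (token overlap {score:.2f})",
                    ))
    return sorted(found, key=lambda c: c.score, reverse=True)


# Hypothetical sample data: one fixed bug and a two-file repository snapshot.
example_vuln = FixedVuln(
    commit_message="Reject negative column index",
    diff=(
        "--- a/src/table.c\n"
        "+++ b/src/table.c\n"
        "-    tab->cols[idx] = value;\n"
        "+    if (idx >= 0) tab->cols[idx] = value;\n"
    ),
)
example_repo = {
    "src/index.c": "static void set_row(Table *tab, int idx, int value) {\n"
                   "    tab->rows[idx] = value;\n"
                   "}\n",
    "src/util.c": "size_t name_len(const char *name) {\n"
                  "    return strlen(name);\n"
                  "}\n",
}
candidates = find_variants(example_vuln, example_repo)
for c in candidates:
    print(f"{c.path}:{c.line_no}: {c.reasoning}")
```

With this toy data, only the unchecked write `tab->rows[idx] = value;` in `src/index.c` is flagged. Lexical overlap is a deliberately weak placeholder: it would also flag the already-patched line in `src/table.c`, since the guard adds no new identifiers. That limitation is one illustration of why the task is framed around a model that can reason about whether a match is actually exploitable.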
Architectural Position
Big Sleep is paired with CodeMender (Google DeepMind, Oct 2025) as Google’s two-pronged vuln-discovery + patching stack:
| Capability | Agent |
|---|---|
| Discovery / variant analysis | Big Sleep |
| Patching / proactive rewrite | CodeMender |
The integration architecture between the two is not documented in public sources.
Position in the Wiki
Big Sleep is Google’s defender-side analogue to:
- Anthropic Claude Mythos Preview (vendor frontier model used for discovery)
- Microsoft MDASH (orchestrated multi-model harness)
- XBOW (offensive-orientation harness)
The four sit on the ai-vuln-discovery axis, all converging on the architectural argument that orchestration outperforms raw model capability. Big Sleep’s distinguishing feature is the variant-analysis specialization — a narrower task framing that makes the discovery problem more tractable than open-ended search.
CMM / RA Maps-to
- CMM D7 (Observability & Detection) L5+: Big Sleep is a defender-side discovery primitive; CVE-2025-6965 is the canonical example of AI-foiled in-the-wild exploitation.
- CMM D3 (Supply Chain): pre-release discovery of vulnerabilities in open-source software (SQLite is the primary example) hardens the upstream supply chain.
Open Questions
- Model attribution: Project Zero does not name the LLM. A Gemini-family model is presumed (DeepMind co-developed the agent) but unconfirmed.
- Productization timeline: Big Sleep remains research-stage with select customer access via Google Cloud Security. No public GA timeline, pricing, or customer base has been disclosed.
- Operational integration: how Big Sleep findings feed CodeMender patching is not documented.
- Glasswing role: Google is a Project Glasswing partner with Mythos access via Vertex AI. Whether Big Sleep itself uses Mythos, Gemini, or both is unclear.
See Also
- Big Sleep foundational paper (Oct 2024) — source summary.
- CodeMender — patching-side counterpart.
- Google — vendor.
- Glasswing announcement — May 2026 coalition naming Big Sleep.
- Frontier AI for Vulnerability Discovery — wiki thesis.
- MDASH / XBOW / Claude Mythos Preview — adjacent ecosystem.
- CyberGym — public leaderboard (Big Sleep is not listed; CyberSecEval2 was Naptime’s benchmark).