Big Sleep (Google Project Zero + DeepMind)

Sources: Project Zero — From Naptime to Big Sleep (Oct 2024) · Google Cloud Blog — Big Sleep agent makes a big leap

Big Sleep is Google’s AI agent for vulnerability discovery, built as a collaboration between Google Project Zero and Google DeepMind. It grew out of the earlier Project Naptime framework, which achieved state-of-the-art results on Meta’s CyberSecEval2 benchmarks. Big Sleep’s signature methodology is variant analysis: given a previously fixed vulnerability (commit message + diff), find similar patterns elsewhere in the codebase. Project Zero positions this narrower task as a better fit for current LLMs than open-ended vulnerability discovery.

Disclosed Capability Milestones

Date | Milestone | Source
June 2024 | Project Naptime announced: LLM-assisted vuln research framework; state-of-the-art on Meta CyberSecEval2 | Project Zero (predecessor post)
October 2024 | Naptime → Big Sleep rebrand; first real-world vulnerability disclosed (SQLite stack buffer underflow); reported and patched same day, before any release | Project Zero, Oct 2024
July 2025 | SQLite CVE-2025-6965 disclosed: vulnerability “known only to threat actors and at risk of being exploited”; first time AI agent “directly foiled efforts to exploit a vulnerability in the wild” | Google Cloud Blog
August 2025 | Public report: ~20 security vulnerabilities found | TechCrunch coverage
May 2026 | Named in Anthropic Glasswing announcement as Google’s parallel AI-cyber tool alongside CodeMender | Heather Adkins (VP Security Engineering) quote

Methodology

Big Sleep’s core operating mode is variant analysis:

  1. Input: a previously-fixed vulnerability (commit message + diff).
  2. Search: scan the current repository (at HEAD) for related patterns that may not have been fixed.
  3. Output: candidate findings with reasoning trace.
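
The shape of the three steps above can be illustrated with a minimal Python sketch. This is not Big Sleep's implementation: the real search step is carried out by an LLM agent, and the sources do not publish its interfaces. Here a toy heuristic stands in for it (flag any call that the fix removed and did not re-add), and every name (SeedVulnerability, CandidateFinding, suspicious_calls, find_variants) is hypothetical. The repository at HEAD is assumed to be an in-memory mapping of path to file contents.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# C keywords that the call regex would otherwise mistake for function calls.
_NOT_CALLS = {"if", "for", "while", "switch", "return", "sizeof"}
_CALL_RE = re.compile(r"\b([A-Za-z_]\w*)\s*\(")


@dataclass
class SeedVulnerability:
    """Step 1 (input): a previously fixed bug, as commit message + diff."""
    commit_message: str
    diff: str


@dataclass
class CandidateFinding:
    """Step 3 (output): a location plus the reasoning that flagged it."""
    path: str
    line_no: int
    reasoning: str


def _calls(line: str) -> set[str]:
    """Names that appear in call position on one line of source."""
    return set(_CALL_RE.findall(line)) - _NOT_CALLS


def suspicious_calls(diff: str) -> set[str]:
    """Calls the fix removed and did not re-add (toy stand-in for LLM reasoning)."""
    removed: set[str] = set()
    added: set[str] = set()
    for line in diff.splitlines():
        if line.startswith(("---", "+++")):  # unified-diff file headers
            continue
        if line.startswith("-"):
            removed |= _calls(line[1:])
        elif line.startswith("+"):
            added |= _calls(line[1:])
    return removed - added


def find_variants(seed: SeedVulnerability, files: dict[str, str]) -> list[CandidateFinding]:
    """Step 2 (search): scan files at HEAD for calls matching the fixed pattern."""
    targets = suspicious_calls(seed.diff)
    findings: list[CandidateFinding] = []
    for path in sorted(files):
        for line_no, line in enumerate(files[path].splitlines(), start=1):
            for name in sorted(_calls(line) & targets):
                findings.append(CandidateFinding(
                    path, line_no,
                    f"calls {name}(), which the fix {seed.commit_message!r} removed",
                ))
    return findings
```

A syntactic match like this only captures the input and output contract. The reason an LLM fits the task, per the post, is that it starts from a concrete theory about the earlier bug and reasons about whether a similar location is actually reachable and unsafe, which a pattern match cannot do.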

This framing is chosen for three reasons (per Project Zero’s October 2024 post):

  • Real-world exploit-variant pattern: “over 40% of the 0-days discovered were variants of previously reported vulnerabilities”. The post argues that fuzzing is not catching these variants and that manual variant analysis is a cost-effective approach for attackers.
  • LLM task fit: variant analysis “remove[s] a lot of ambiguity from vulnerability research, and start[s] from a concrete, well-founded theory: ‘This was a previous bug; there is probably another similar one somewhere.’”
  • Asymmetric defender advantage: pre-release discovery means “no scope for attackers to compete: the vulnerabilities are fixed before attackers even have a chance to use them.”

Architectural Position

Big Sleep is paired with CodeMender (Google DeepMind, Oct 2025) as Google’s two-pronged vuln-discovery + patching stack:

Capability | Agent
Discovery / variant analysis | Big Sleep
Patching / proactive rewrite | CodeMender

The integration architecture between the two is not documented in public sources.

Position in the Wiki

Big Sleep is Google’s defender-side analogue to:

The four sit on the ai-vuln-discovery axis, all converging on the architectural argument that orchestration outperforms raw model capability. Big Sleep’s distinguishing feature is the variant-analysis specialization — a narrower task framing that makes the discovery problem more tractable than open-ended search.

CMM / RA Maps-to

  • CMM D7 (Observability & Detection) L5+ — Big Sleep is a defender-side discovery primitive; CVE-2025-6965 is the canonical example of AI-foiled in-the-wild exploitation.
  • CMM D3 (Supply Chain) — pre-release OSS vulnerability discovery (SQLite as primary example) hardens upstream supply-chain.

Open Questions

  • Model attribution: Project Zero does not name the LLM. Gemini-family is presumed (DeepMind co-developed) but unconfirmed.
  • Productization timeline: Big Sleep remains research-stage with select customer access via Google Cloud Security. Public GA timeline / pricing / customer base not disclosed.
  • Operational integration: how Big Sleep findings feed CodeMender patching is not documented.
  • Glasswing role: Google is a Project Glasswing partner with Mythos access via Vertex AI. Whether Big Sleep itself uses Mythos, Gemini, or both is unclear.

See Also