Frontier AI for Vulnerability Discovery

Question

How are frontier AI models being used in 2026 to discover vulnerabilities in production code, and what is the gap between demonstrated capability (research demos, isolated audits) and operational practice (continuous adoption in enterprise AppSec, dedicated tooling, vendor consolidation)? Specifically: where do Claude, GPT-class, and Mythos-class models sit on the spectrum from “supervised reverse-engineering assistant” to “autonomous zero-day finder”? What are the procurement, IP, and disclosure constraints that shape adoption?

Current Position

As of May 13, 2026, three sourced anchors landed within 36 hours (May 12-13): XBOW’s offensive Mythos evaluation, Microsoft’s defensive MDASH announcement, and Anthropic’s Project Glasswing announcement. All three converge: the model is one input, the harness around it is the durable engineering, and the candidate-vs-validation asymmetry is the load-bearing observation. The convergence is not coincidence; it is a coordinated launch.

Three sourced anchors, one convergent argument

Anthropic’s Project Glasswing announcement (May 12, 2026) is the organizing anchor. Anthropic announced a 12-partner coalition (AWS, Anthropic, Apple, Broadcom, Cisco, CrowdStrike, Google, JPMorganChase, the Linux Foundation, Microsoft, NVIDIA, Palo Alto Networks) plus 40+ additional organizations, with up to $4M in OSS-security donations, applying Claude Mythos Preview to defensive vulnerability discovery on the world’s most-critical software. Anthropic’s framing: “AI models have reached a level of coding capability where they can surpass all but the most skilled humans at finding and exploiting software vulnerabilities.” Mythos is not planned for general availability: it is preview-only, priced at $125 per million tokens for Glasswing participants. Anthropic commits to a 90-day public report cadence.

XBOW’s evaluation of Mythos (May 12, 2026) — independent (non-Glasswing-partner) offensive evaluation. XBOW reports a 42% reduction in false negatives vs Opus 4.6 on its web exploit benchmark without source-code access, 55% with source access. Live-site validation is “the hard part.” XBOW’s framing: “a model is a brain without a body.”

Microsoft’s MDASH announcement (May 12, 2026) — defensive orientation, Glasswing-partner artifact. MDASH orchestrates 100+ specialized AI agents (auditors, debaters, dedup agents, provers). Public benchmark: 88.45% on CyberGym vs raw Mythos’s 83.1% (per Glasswing’s disclosed numbers — MDASH’s harness adds ~5 percentage points over raw Mythos). Internal results: 96% recall on a five-year retrospective of MSRC cases in clfs.sys, 100% on tcpip.sys, and 16 new CVEs in the May 2026 Patch Tuesday. Microsoft’s framing: “the harness does the work, and the model is one input.”

Convergent argument, quantified

| Observation | Evidence | Source |
| --- | --- | --- |
| Frontier models materially advance vuln discovery | Mythos 83.1% on CyberGym vs Opus 4.6’s 66.6% (raw); 42-55% FN reduction in XBOW harness | Glasswing / XBOW |
| Harness over model is the load-bearing surface | MDASH 88.45% vs raw Mythos 83.1% on CyberGym (+5pp from harness alone) | MDASH vs Glasswing reconciled |
| Validation is the difference between a finding and a fix | XBOW’s live-site wedge; MDASH’s automated PoC construction; Anthropic’s 27-year-old OpenBSD example | All three |
| Capability ≈ coalition-distributable | 12 named Glasswing partners + 40+ extended organizations | Glasswing |
| Defender-side adoption is industrial-scale, not pilot-scale | $100M credit commitment; Microsoft + Google + AWS + financial-sector adoption | Glasswing |
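The headline CyberGym deltas cited above can be sanity-checked with plain arithmetic. A minimal sketch, assuming the benchmark scores are directly comparable percentages:

```python
# Consistency check on the figures cited above (plain arithmetic only;
# assumes the CyberGym scores are directly comparable percentages).

raw_mythos = 83.1   # raw Mythos on CyberGym (Glasswing-disclosed)
opus_46 = 66.6      # raw Opus 4.6 on CyberGym (Glasswing-disclosed)
mdash = 88.45       # MDASH harness on CyberGym (Microsoft)

harness_delta_pp = mdash - raw_mythos   # harness contribution on top of the model
model_delta_pp = raw_mythos - opus_46   # model-generation contribution

print(f"harness adds {harness_delta_pp:.2f}pp over raw Mythos")  # ~5.35pp
print(f"Mythos adds {model_delta_pp:.1f}pp over Opus 4.6")       # 16.5pp
```

The exact harness delta is 5.35pp, which the sources round to "+5pp"; the model-generation jump (16.5pp) is roughly three times larger, which is why both deltas matter to the harness-over-model argument.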

Production paths

  1. Coalition-distributed defender-orientation (Glasswing). 12 named partners + 40+ extended organizations apply Mythos to defensive vulnerability discovery on critical infrastructure. Anthropic commits $4M in OSS-security donations. 90-day public report cadence. This is now the dominant production mode on the axis.
  2. Glasswing-partner harness products. Microsoft MDASH is one sourced example (defender-side, multi-model orchestration, 100+ specialized agents). Google operates a two-agent stack: Big Sleep (Project Zero + DeepMind, first AI-discovered real-world exploitable memory-safety bug in SQLite, Oct 2024; July 2025 CVE-2025-6965 disclosure cited as first AI-foiled in-the-wild exploit) for variant-analysis discovery, and CodeMender (DeepMind, Oct 2025, 72 OSS patches upstreamed in 6 months) for reactive + proactive patching. AWS applies Mythos internally; CrowdStrike via Falcon AIDR. The shared architectural pattern across these stacks — multi-agent specialization + LLM-judge validation + automated regression checks — converges with the MDASH design.
  3. Independent (non-partner) offensive-orientation. XBOW orchestrates Mythos against live web targets via a harness adding tooling, browser interaction, and validation logic. XBOW is not a Glasswing partner (per the public partner list), making its evaluation an independent third-party check on Anthropic’s own claims.
  4. Adjacent research-stage approaches. Glass-box security (Carl Hurd / Starseer) and mechanistic interpretability for defense represent the inverse capability: inspecting model internals for defense rather than pointing models at target code. Agent Commander flagged autonomous vulnerability discovery as “maybe in a year or so” beyond prompt-C2; the May 2026 evaluations suggest that timeline has compressed faster than predicted.
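The shared architectural pattern named in the production paths above (multi-agent specialization, then dedup, then LLM-judge validation, then an automated regression/PoC gate) can be sketched in miniature. Everything below is an illustrative assumption, not any vendor's actual API:

```python
# Hypothetical sketch of the shared harness pattern: specialized agents
# propose candidate findings, duplicates are collapsed, a judge model
# validates, and a regression/PoC check gates what survives.
from dataclasses import dataclass

@dataclass(frozen=True)
class Finding:
    location: str       # file/function the candidate points at
    description: str    # candidate vulnerability description

def generate_candidates(agents, target):
    # Each specialized agent (auditor, debater, ...) proposes candidates.
    out = []
    for agent in agents:
        out.extend(agent(target))
    return out

def dedup(findings):
    # Dedup agents collapse duplicate candidates; here, exact-match dedup.
    return list(dict.fromkeys(findings))

def run_harness(agents, target, judge, regress):
    candidates = dedup(generate_candidates(agents, target))
    validated = [f for f in candidates if judge(f)]   # LLM-judge gate
    return [f for f in validated if regress(f)]       # PoC/regression gate

# Toy usage with stub agents and validators:
agents = [
    lambda t: [Finding("parse.c:validate", "unchecked length")],
    lambda t: [Finding("parse.c:validate", "unchecked length"),
               Finding("io.c:read_frame", "off-by-one")],
]
confirmed = run_harness(agents, target="demo",
                        judge=lambda f: True,
                        regress=lambda f: "off-by-one" not in f.description)
print([f.location for f in confirmed])  # ['parse.c:validate']
```

The design point the sketch makes concrete: candidate generation is cheap and parallel, while the validation gates at the end are where findings become fixes, which is the candidate-vs-validation asymmetry all three anchors emphasize.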

The CMM L5+ Leading-Edge tier mentions research-stage primitives that overlap this area but does not directly address frontier-AI-for-vuln-discovery as a capability. CMM D7 (Observability & Detection), D3 (Supply Chain), and the L5+ Leading-Edge tier should be updated with Glasswing/MDASH/XBOW material during the next CMM revision cycle. Glasswing’s named industry-standards-contribution scope (vulnerability disclosure, SDLC, supply-chain security, regulated-industry standards, triage and patching automation) is the candidate intake for CMM evidence-checklist updates.

Supporting Evidence

  • Anthropic’s Project Glasswing announcement (May 12 2026) — coalition-organizing anchor. 12 named partners, $4M OSS donations, 90-day public report cadence, named industry-standards contribution scope, US-government engagement disclosure.
  • Microsoft’s MDASH announcement (May 12 2026) — Glasswing-partner defender-side artifact. 88.45% on CyberGym, 96-100% MSRC retrospective recall, 16-CVE Patch Tuesday cohort, 100+ specialized agents in five-stage pipeline.
  • XBOW’s Mythos Evaluation (May 12 2026) — independent (non-Glasswing-partner) third-party offensive evaluation. 42-55% false-negative reduction vs Opus 4.6.
  • Claude Mythos Preview — Anthropic frontier model. Not planned for GA. $125 per M tokens for Glasswing participants (~1.67× Opus 4.6).
  • Project Glasswing — the coalition.
  • MDASH — Microsoft’s multi-model agentic scanning harness. 100+ agents, 5-stage pipeline.
  • XBOW — autonomous offensive-security platform.
  • CyberGym — public benchmark.

Same-week tri-source convergence

Three sources landed within 36 hours (May 12-13 2026) — Glasswing (Anthropic), MDASH (Microsoft, Glasswing partner), and XBOW (independent). Different orientations (offensive, defensive, coalition-organizing). Different model strategies (ensemble + debate, single best, partner-distributed). All three converge on the same architectural argument: the model is one input, the harness is the durable engineering. The MDASH-vs-raw-Mythos +5pp delta on CyberGym (88.45% vs 83.1%) is the cleanest quantitative measurement of this claim.

Anthropic's strategic position is explicit

Anthropic’s 2026 Agentic Coding Trends Report (early 2026, pre-Glasswing) names Trend 8 as “Agentic coding improves security defenses — but also offensive uses” and Priority 4 as “Embedding security architecture as a part of agentic system design from the earliest stages.” This is vendor-strategic confirmation that AI-driven security is a core 2026 priority, not a side effect of capability improvement. The collaboration paradox from the same report (60% AI usage, but only 0-20% of tasks “fully delegated”) establishes that human-in-the-loop is the default operating mode for all agentic coding, including defensive deployments, confirming the candidate-vs-validation asymmetry the wiki’s ai-vuln-discovery axis tracks.

Pricing and GA — corrected via Glasswing

XBOW’s blog said Mythos would be “5× Opus at GA.” Anthropic’s Glasswing page is authoritative: Mythos is not planned for GA; Glasswing-participant pricing is $125 per M tokens (~1.67× Opus 4.6). The wiki treats Glasswing as the canonical source on pricing and distribution. See [!contradiction] callouts on the Mythos and XBOW eval pages.
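The corrected pricing can be made concrete with back-of-envelope arithmetic. This sketch assumes simple linear per-token pricing; the implied Opus 4.6 rate is an inference from the ~1.67× multiplier, not a published number, and the 40M-token audit size is invented for illustration:

```python
# Back-of-envelope on the corrected Glasswing pricing. Assumes linear
# per-token pricing; the implied Opus 4.6 rate is an inference, not a
# published number, and the audit size is illustrative.

mythos_per_m = 125.0    # $ per million tokens, Glasswing participants
multiplier = 1.67       # stated ~1.67x Opus 4.6

implied_opus_per_m = mythos_per_m / multiplier   # ~$74.85 per M tokens

# Cost of a hypothetical audit consuming 40M tokens, under each claim:
audit_tokens_m = 40
glasswing_cost = mythos_per_m * audit_tokens_m         # actual preview pricing
xbow_claim_cost = 5 * implied_opus_per_m * audit_tokens_m  # "5x Opus at GA"

print(f"Glasswing price:   ${glasswing_cost:,.0f}")    # $5,000
print(f"XBOW's '5x Opus':  ${xbow_claim_cost:,.0f}")
```

The roughly 3× gap between the two claims is why the wiki treats the Glasswing page as canonical and flags the XBOW figure as superseded.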

Anthropic Frontier Red Team blog

Three primary technical sources adjacent to Glasswing are not yet ingested: red.anthropic.com/2026/mythos-preview, red.anthropic.com/2026/firefox/, red.anthropic.com/2026/exploit/. Next ingest candidates.

Claude Mythos Preview system card

anthropic.com/claude-mythos-preview-system-card — canonical technical reference; ingest candidate.

CTI-REALM benchmark

Microsoft’s open-source security benchmark, mentioned in Igor Tsyganskiy’s Glasswing quote. Not yet documented on the wiki.

Big Sleep / CodeMender (Google)

Heather Adkins’s quote names these as Google’s parallel AI-powered cybersecurity tools. Not yet documented.

AISI benchmarks for Mythos

Point Estimate’s analysis of AISI’s Mythos-vs-GPT-5.5 benchmarks, referenced in XBOW’s post. The direct AISI source is not yet ingested.

CyberGym direct sourcing

Currently cited via MDASH and Glasswing. Direct ingestion of the homepage, full leaderboard, and methodology paper is pending.

Counter-Evidence

The capability claims above should be read alongside the strongest single counter-evidence anchor on the wiki:

  • METR 2025 RCT (July 2025) — randomized controlled trial with 16 experienced open-source maintainers on their own repositories. Enabling early-2025 AI tools made them ~19% slower on real tasks. Forecast was AI-allowed = faster; observed reality was the opposite. The study selects for the worst case for AI benefits (in-domain expert humans), so it bounds rather than refutes the productivity claims — but it is the most rigorous counter-evidence anchor cited across PwC’s 2026 Agentic SDLC report and the Anthropic Trends Report context.

Implication for the wiki: capability gains for vulnerability discovery (the wiki’s ai-vuln-discovery axis) are real but situation-specific, and verification cost is non-trivial. The XBOW / MDASH / Big Sleep numbers are bounded by the methodology each system uses; the METR finding suggests that even when raw capability rises, end-to-end productivity gains are subject to verification overhead.

Gaps

Lack of reproducible benchmarks

No community-run public benchmark equivalent to AgentDojo exists for vulnerability-discovery capability; CyberGym is public but so far sourced only through vendor announcements. Vendor claims dominate; community evaluation lags. This is the largest single hole on this axis.

IP and disclosure constraints

Even when frontier models find real vulnerabilities, the disclosure pipeline (coordinated disclosure timelines, CVE assignment, vendor patch latency) is calibrated for human-paced research. Frontier-AI-driven discovery rates may outpace this pipeline, which would itself be a vulnerability-discovery story.
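The pacing concern can be framed as a trivial flow model: if validated findings arrive faster than the human-paced disclosure pipeline can absorb them, the unpatched backlog grows linearly. All rates below are invented for illustration:

```python
# Illustrative-only model of the disclosure-pipeline pacing concern.
# If AI-driven discovery outpaces the coordinated-disclosure pipeline,
# the unpatched backlog grows linearly. All numbers are made up.

def backlog(discovery_per_week: float, patch_per_week: float, weeks: int) -> float:
    """Unpatched findings after `weeks`, assuming constant rates."""
    return max(0.0, (discovery_per_week - patch_per_week) * weeks)

# Human-paced research: 5 findings/week in, 5 patches/week out.
print(backlog(5, 5, 26))   # 0.0 -- steady state over six months
# AI-paced discovery against the same human-paced patch pipeline:
print(backlog(50, 5, 26))  # 1170.0 -- backlog after six months
```

The point is not the specific numbers but the shape: a 10× discovery rate against an unchanged patch rate turns a steady-state pipeline into a linearly growing queue of known-but-unpatched vulnerabilities.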

How This Has Evolved

  • 2026-05-13 — Seeded as part of the wiki scope expansion. Position: provisional; “thinnest of the new scope axes.”
  • 2026-05-13 (afternoon) — Promoted from seed to developing after ingesting XBOW’s Mythos evaluation. Position revised: quantitative third-party evidence now exists; the candidate-vs-validation asymmetry is the new load-bearing observation.
  • 2026-05-13 (evening) — Second sourced anchor added: Microsoft MDASH announcement. Same day as XBOW’s post, opposite orientation, convergent argument. Position revised again: the architectural convergence is now the strongest signal on this axis.
  • 2026-05-13 (late evening) — Third sourced anchor added: Anthropic Project Glasswing announcement. Coalition-organizing source revealing the three artifacts are coordinated launches, not independent observations. Position revised: the axis is now “best-sourced of the new axes.” Pricing/GA contradictions in earlier wiki state (5× Opus at GA, per XBOW’s blog) resolved against Anthropic’s authoritative numbers ($125 per M tokens, no GA planned). MDASH-vs-raw-Mythos +5pp delta on CyberGym now identified — clean quantitative anchor for the harness-over-model argument.
  • 2026-05-13 (night) — Fourth and fifth sourced anchors added: Big Sleep paper (Oct 2024) and CodeMender paper (Oct 2025). Closes the “Google parallel stack” gap surfaced by Glasswing. Important historical context: the May 2026 multi-vendor convergence is not the start of agentic vuln-discovery as a productionized field — Big Sleep has been operating publicly since October 2024, CodeMender since October 2025. The May 2026 moment is when (a) Anthropic’s Mythos+Glasswing brought coalition-organizing structure, (b) Microsoft’s MDASH demonstrated benchmark-leading multi-agent orchestration, and (c) XBOW provided independent third-party evaluation, but Google’s Big Sleep was the founding production proof. The lineage is OSS-Fuzz → AI-powered fuzzing (2023) → Project Naptime (June 2024) → Big Sleep (Oct 2024) → CodeMender (Oct 2025) → tri-vendor May 2026 convergence.

Open Sub-Questions

  • What is the right relationship between this axis and MITRE ATLAS? ATLAS is calibrated for “adversarial ML” — attacks against models — not for “models used to find attacks in non-ML systems.” Is the right answer a new taxonomy, an ATLAS extension, or just patient ingestion?
  • Does “Mythos” refer to a specific product I should be tracking, or a class of internal-tooling capability? The user’s framing suggests the latter; ingest priorities depend on the answer.
  • At what point should this thesis page be retired or merged? If the field consolidates around a small vendor set, the right move may be vendor pages plus an aspects section on CMM D6/D8, not a freestanding thesis.
  • See Gaps Index for related open questions.