Claude Mythos Preview (Anthropic)

Sources: Anthropic Project Glasswing landing page · Claude Mythos Preview system card · XBOW’s Mythos Evaluation (May 2026)

Claude Mythos Preview is Anthropic’s unreleased frontier model — a general-purpose foundation model whose coding and reasoning capabilities translate to outsized strength at finding and exploiting software vulnerabilities. It was announced on May 12, 2026 alongside Project Glasswing, Anthropic’s twelve-partner coalition initiative applying Mythos to defensive cybersecurity for critical software. Anthropic explicitly states: “We do not plan to make Claude Mythos Preview generally available.” Access is preview-only, distributed via Glasswing partners and approved organizations.

Mythos pricing — XBOW vs Anthropic

XBOW’s May 2026 evaluation cited Anthropic as saying Mythos would be “5× as expensive as an Opus model” at GA. The Glasswing landing page is authoritative: Mythos is not planned for general availability, and preview pricing for Glasswing participants is $125 per million input/output tokens — approximately 1.67× Opus 4.6 ($75), not 5×. XBOW’s source may have been a verbal description, a different pricing model, or a hypothetical GA scenario that was subsequently revised. The wiki should treat the Glasswing-direct numbers as the source of truth.
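The discrepancy is easy to check arithmetically. A minimal sketch, assuming the Glasswing landing-page figures ($125/M for Mythos preview, $75/M for Opus 4.6) are accurate:

```python
# Compare Mythos preview pricing against Opus 4.6 and against XBOW's
# reported "5x as expensive" claim. Figures are $ per million tokens,
# taken from the section above.
OPUS_46_PRICE = 75.0       # Opus 4.6
MYTHOS_PREVIEW_PRICE = 125.0  # Mythos preview, Glasswing participants
XBOW_CLAIMED_MULTIPLE = 5.0

actual_multiple = MYTHOS_PREVIEW_PRICE / OPUS_46_PRICE
xbow_implied_price = XBOW_CLAIMED_MULTIPLE * OPUS_46_PRICE

print(f"Actual multiple: {actual_multiple:.2f}x")        # ~1.67x
print(f"XBOW-implied GA price: ${xbow_implied_price:.0f}/M")  # $375/M
```

The gap between $125/M and the XBOW-implied $375/M is large enough that it is unlikely to be rounding; one of the two sources describes a different pricing scenario.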

Capability Profile (as of May 2026 preview)

Public benchmarks (Anthropic-disclosed via Glasswing announcement)

Benchmark                                       Mythos Preview             Opus 4.6
CyberGym (1,507 real-world vuln-repro tasks)    83.1%                      66.6%
SWE-bench Pro                                   77.8%                      53.4%
SWE-bench Multilingual                          87.3%                      77.8%
SWE-bench Verified                              93.9%                      80.8%
Terminal-Bench 2.0                              82.0% (92.1% at 4hr)       65.4%
GPQA Diamond                                    94.6%                      91.3%
BrowseComp                                      86.9% (4.9× fewer tokens)  83.7%
OSWorld-Verified                                79.6%                      72.7%
Humanity’s Last Exam (no tools)                 56.8%                      40.0%

Raw Mythos vs MDASH-orchestrated Mythos

Mythos Preview scored 83.1% on CyberGym. MDASH (Microsoft’s multi-model agentic harness, which orchestrates Mythos among other models) scored 88.45% on the same benchmark. MDASH’s harness adds ~5 percentage points over raw Mythos — the clearest quantitative measurement of the “harness over model” architectural argument from both XBOW and Microsoft.
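The decomposition implied above can be made explicit. A minimal sketch using only the CyberGym numbers from this section, splitting the total gain over the Opus 4.6 baseline into a model contribution and a harness contribution:

```python
# CyberGym scores from this section, in percent.
OPUS_46 = 66.6        # predecessor baseline
RAW_MYTHOS = 83.1     # Mythos Preview, no orchestration
MDASH_MYTHOS = 88.45  # MDASH harness orchestrating Mythos (among other models)

model_lift = RAW_MYTHOS - OPUS_46        # gain from the model generation itself
harness_lift = MDASH_MYTHOS - RAW_MYTHOS # additional gain from orchestration

print(f"Model generation adds {model_lift:.1f} pp")   # 16.5 pp
print(f"Harness adds a further {harness_lift:.2f} pp")
```

On this benchmark the model-generation jump is roughly three times the harness lift, which bounds how far the "harness over model" argument stretches: orchestration helps, but most of the gain came from the model.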

XBOW-orchestrated profile (offensive use, May 12 2026 evaluation)

  • Source-code reasoning: strongest mode. 42% reduction in false negatives vs Opus 4.6 (no source); 55% reduction when source is provided. The recurring evaluation theme: “impressive at writing code, but even more impressive at reading it.”
  • Native-code and reverse engineering: substantial strength. Found real bugs in Chromium and V8 sandbox contexts where prior baselines produced findings without successful validations; reasoned through unusual firmware/embedded contexts.
  • Live-site interaction: requiring the model to interact with live sites degrades performance more than removing source-code access does — Mythos is most effective when paired with orchestration that supplies live-site behavior (XBOW’s wedge).
  • Browser interaction and visual acuity: roughly matches Sonnet 4.6; dramatically outperforms Opus 4.6. Practically effective at UI-element identification but not pixel-accurate for exact coordinates.
  • Judgment: mixed. On XBOW’s command-safety benchmark Mythos scored 77.8%, below Opus 4.6 (81.2%) and Haiku 4.5 (90.1%). The model is literal-conservative: it prioritizes the letter of rules over the spirit.

Disclosed real-world findings (via Glasswing)

  • Thousands of high-severity vulnerabilities found across every major operating system and every major web browser.
  • 27-year-old OpenBSD vulnerability — remote crash via network connection.
  • 16-year-old FFmpeg vulnerability — in a code path hit 5 million times by automated testing tools without detection.
  • Linux kernel privilege escalation — autonomously chained multiple vulnerabilities to escalate from ordinary user to complete machine control.
  • Per Anthropic, the model found “nearly all of these vulnerabilities — and develop[ed] many related exploits — entirely autonomously, without any human steering.”
  • All cited examples patched; remaining undisclosed vulnerabilities are tracked via cryptographic hashes on the Anthropic Frontier Red Team blog.
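Tracking undisclosed vulnerabilities via cryptographic hashes is a standard commit-then-reveal pattern: publish only a digest now, reveal the underlying report after a patch ships so anyone can verify the original claim. A minimal sketch — illustrative only; Anthropic’s actual scheme, fields, and hash choice are not sourced here:

```python
import hashlib

def commit(vuln_report: str, nonce: str) -> str:
    """Return a SHA-256 digest binding the report. Only this digest is
    published; the report and nonce are withheld until disclosure. The
    random nonce prevents guessing the preimage for short/predictable
    reports."""
    return hashlib.sha256((nonce + vuln_report).encode("utf-8")).hexdigest()

# Hypothetical example values, not real vulnerability data.
digest = commit("demuxer OOB read, patch pending", "nonce-7f3a91")
print(digest)  # safe to publish; preimage revealed only after the fix
```

On reveal, a verifier recomputes the digest from the disclosed report and nonce and checks it against the published value, confirming the finding predated the patch.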

Positioning

Mythos sits at the intersection of three wiki scope axes:

  • ai-vuln-discovery: primary anchor. Mythos is the first frontier model with sourced, third-party evaluation showing a quantifiable advance in vulnerability discovery. Promotes the thesis from seed to developing.
  • ai-in-sec-offense: secondary. As operationalized by XBOW for live-site exploitation, Mythos enters the offensive-AI tool category.
  • sec-of-ai: tertiary. Mythos is itself an AI system that must be governed; the CMM D3/D4 questions about safe model deployment apply.

Distribution

  • Preview-only. Anthropic states: “We do not plan to make Claude Mythos Preview generally available.” The intended GA path is through a future Claude Opus successor with refined safeguards.
  • Glasswing-participant pricing: $125 per million input/output tokens (~1.67× Opus 4.6). Available on the Claude API, Amazon Bedrock, Google Cloud Vertex AI, and Microsoft Foundry.
  • Anthropic credit commitment: up to $100M in usage credits for Glasswing partners and extended-access organizations.
  • Access partners: the Project Glasswing coalition (12 named partners: AWS, Anthropic, Apple, Broadcom, Cisco, CrowdStrike, Google, JPMorganChase, the Linux Foundation, Microsoft, NVIDIA, Palo Alto Networks) plus 40+ additional organizations that build or maintain critical software infrastructure. XBOW is independently named as an evaluator.
  • OSS maintainer access: Claude for Open Source program.

Adjacent Frontier Models

Model       Vendor     Role in Mythos evaluation
Opus 4.6    Anthropic  Direct predecessor baseline (42–55% FN reductions are vs Opus 4.6)
Opus 4.7    Anthropic  Excluded from web exploit chart due to “unique interaction” — see XBOW’s Opus 4.7 First Look. Matches Mythos on visual acuity.
Sonnet 4.6  Anthropic  Visual-acuity peer of Mythos
Haiku 4.5   Anthropic  Command-safety leader (90.1%) — important for judgment-heavy guardrail tasks
GPT 5.5     OpenAI     Cost-normalized competitor. AISI benchmarked Mythos vs GPT 5.5 per Point Estimate analysis.

CMM / RA Maps-to

  • CMM D7 L4–L5 — frontier-model-driven discovery is a continuous-adversarial primitive when oriented defensively (Anthropic Glasswing) or offensively (XBOW). The wiki’s existing four-quadrant red-team grid extends naturally.
  • CMM D8 L5+ — Mythos’s premium pricing (~1.67× Opus 4.6 at preview) and preview-only access make it a procurement-and-vendor-evaluation problem, not just a capability question.

Open Questions

  • Anthropic’s offensive-use policy stance: the Glasswing partnership signals defender framing; XBOW’s commercial offensive deployment appears tolerated under existing usage policies. Anthropic has not (in the sources reviewed) addressed the policy boundary explicitly. Worth tracking.
  • Independent reproduction: XBOW’s benchmark numbers are vendor-evaluated by a commercial beneficiary. AISI’s evaluation (referenced via Point Estimate) is the candidate independent comparator; need to source that directly.
  • “Mythos” as canonical product name: XBOW’s post differentiates “Mythos the raw model” from “Mythos inside Claude Code.” Anthropic’s official productization, naming, and SKU structure are not yet sourced.

See Also