MDASH — Microsoft Multi-Model Agentic Scanning Harness

Sources: Microsoft Security Blog — MDASH announcement (May 12, 2026) · Private Preview Signup (aka.ms)

MDASH (codename: multi-model agentic scanning harness) is Microsoft's agentic vulnerability discovery and remediation system, built by Microsoft's Autonomous Code Security (ACS) team in collaboration with Microsoft Windows Attack Research and Protection (WARP). It orchestrates 100+ specialized AI agents (auditors, debaters, dedup agents, and provers) across an ensemble of frontier and distilled models to find, validate, and prove exploitable vulnerabilities end to end. First publicly announced May 12, 2026; in limited private preview as of mid-May 2026. Microsoft's naming uses "codename MDASH"; the GA product name is not yet disclosed.

Pipeline Architecture

MDASH runs a five-stage pipeline; the targeting, validation, dedup, and prove stages are model-agnostic by construction:

| Stage | Role | Output |
| --- | --- | --- |
| Prepare | Ingests source target; builds language-aware indices; draws attack-surface and threat models from past commits. | Indices, attack-surface model |
| Scan | Specialized auditor agents reason over candidate code paths. | Candidate findings with hypotheses and evidence |
| Validate | Debater agents argue for and against each finding's reachability and exploitability. | Surviving findings with strengthened rationale |
| Dedup | Collapses semantically equivalent findings (patch-based grouping). | Distinct vulnerability candidates |
| Prove | Constructs and executes triggering inputs where the bug class admits (e.g., ASan in C/C++). | Validated PoC artifacts |

Three Architectural Properties

  1. Ensemble of diverse models. A SOTA model serves as the heavy reasoner; distilled models act as cost-effective debaters for high-volume passes; a second, separate SOTA model provides an independent counterpoint. Disagreement between models is itself a signal. Microsoft does not disclose which specific models occupy these roles.
  2. Specialized agents. 100+ agents, each with its own role, prompt regime, tools, and stop criteria. Auditors do not reason like debaters; debaters do not reason like provers. The agents were constructed through deep research over past CVEs and their patches.
  3. End-to-end pipeline with extensible plugins. Domain plugins inject context the foundation models cannot see: kernel calling conventions, IRP rules, lock invariants, IPC trust boundaries, codec state machines, custom CodeQL databases. The CLFS proving plugin is a worked example: it embeds the on-disk container layout, the block-validation sequence, and the in-memory state machine needed to construct triggering log files for candidate findings.
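
The disagreement-as-signal idea in property 1 can be illustrated with a toy triage vote. The role names, verdict labels, and lambdas below are hypothetical stand-ins for real model calls, not MDASH internals.

```python
from collections import Counter

def triage(finding, models):
    """Collect one verdict per model; split votes escalate rather than drop."""
    verdicts = {name: model(finding) for name, model in models.items()}
    tally = Counter(verdicts.values())
    majority, _votes = tally.most_common(1)[0]
    return {
        "verdicts": verdicts,
        "majority": majority,
        # Disagreement between models is itself a signal: a split finding is
        # routed to a heavier debate pass instead of being discarded.
        "escalate": len(tally) > 1,
    }

# Stand-ins for the three ensemble roles described above.
models = {
    "heavy_reasoner": lambda f: "EXPLOITABLE",
    "distilled_debater": lambda f: "NOT_EXPLOITABLE",
    "counterpoint": lambda f: "EXPLOITABLE",
}
result = triage("unchecked container offset", models)
```

Here the counterpoint model breaks the tie, and the lone dissent flags the finding for escalation rather than silently losing the vote.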

Public Benchmark Results

| Benchmark | Result | Provenance |
| --- | --- | --- |
| StorageDrive (21 planted vulns, private Microsoft interview driver) | 21/21 found, 0 false positives | Microsoft-disclosed |
| clfs.sys historical recall (5 years, 28 MSRC cases) | 96% | Microsoft-internal |
| tcpip.sys historical recall (5 years, 7 MSRC cases) | 100% | Microsoft-internal |
| CyberGym public leaderboard (1,507 real-world vuln-repro tasks, level 1) | 88.45% | Top score, ~5 points above #2 (83.1%) |
| May 2026 Patch Tuesday | 16 new CVEs | 10 kernel-mode, 6 user-mode; 4 Critical RCEs |

The CyberGym number is the only independently verifiable data point; the others are Microsoft-internal but anchored to a defensible ground truth (MSRC case database).
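
The recall figures are consistent with simple case counts; a quick sanity check (the 27-of-28 clfs.sys split is inferred here from the 96% figure, not separately disclosed):

```python
# Case counts from the benchmark table; 27/28 is inferred, not disclosed.
clfs_recall = 27 / 28       # ~0.964, reported as 96%
tcpip_recall = 7 / 7        # 1.0, reported as 100%

# CyberGym margin over the #2 leaderboard entry.
margin = 88.45 - 83.1       # ~5.35, "~5 points above #2"
```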

Positioning

MDASH sits at the intersection of three wiki scope axes:

  • ai-vuln-discovery (primary): the second sourced anchor on this axis, after XBOW/Mythos. The convergence with XBOW (two vendors on opposite sides of the stack arriving at “the harness does the work; the model is one input”) is itself the load-bearing observation.
  • ai-in-sec-defense (primary): Microsoft’s defender-AI capability at the AppSec / vulnerability-research layer, distinct from Security Copilot at the SOC layer.
  • sec-of-ai (tertiary): MDASH is itself an agentic system; the CMM questions about agent identity, action authority, and audit apply to MDASH’s 100+ agents.

Convergence with XBOW + Mythos

| | XBOW + Mythos | MDASH |
| --- | --- | --- |
| Orientation | Offensive (live-site pentest) | Defensive (developer-side audit) |
| Model strategy | Single best frontier model, strong harness | Ensemble (SOTA + distilled + second-SOTA counterpoint) |
| Validation | Live-site interaction harness | Debate + dedup + automated prove (PoC construction) |
| Targets | Live web applications | Native Windows components (tcpip.sys, ikeext.dll, etc.) |
| Key insight (paraphrased) | “A model is a brain without a body” | “The harness does the work; the model is one input” |
| Public benchmark | XBOW’s internal web exploit benchmark | CyberGym public leaderboard |
| Pricing | Mythos ~5× Opus at GA | Not disclosed; private preview |

The two systems are architecturally distinct but converge on the same argument: orchestration outperforms model choice, and validation infrastructure determines real-world utility.

Distribution

  • Status: limited private preview as of May 2026.
  • Users: Microsoft security engineering teams (production); a small set of customers (preview).
  • Signup: aka.ms/AI-drivenScanningHarness.
  • GA timeline / pricing / SKU positioning: not yet disclosed.

Open Questions

  • Which models? “Generally available AI models” is the only public attribution. Anthropic’s Mythos is a plausible SOTA-reasoner candidate; OpenAI GPT and Microsoft-internal models are alternatives. Microsoft’s silence is conspicuous.
  • GA productization: standalone product, feature within Defender / Security Copilot, or service offering via consulting? The post does not commit.
  • Customer co-pilot pattern: the post says customers “test” MDASH, which implies a hosted-service model rather than on-prem deployment. To be confirmed.
  • CyberGym configuration: 88.45% is at level 1 (vulnerable source + high-level description supplied). Performance on higher-difficulty levels (blind discovery) would be a stronger signal.
  • Relationship to MITRE ATLAS or other vulnerability taxonomies: not addressed.

See Also