MDASH — Microsoft Multi-Model Agentic Scanning Harness

Sources: Microsoft Security Blog — MDASH announcement (May 12, 2026) · Private Preview Signup (aka.ms)

MDASH (codename: multi-model agentic scanning harness) is Microsoft's agentic vulnerability discovery and remediation system, built by Microsoft's Autonomous Code Security (ACS) team in collaboration with Microsoft Windows Attack Research and Protection (WARP). It orchestrates 100+ specialized AI agents (auditors, debaters, dedup agents, and provers) across an ensemble of frontier and distilled models to find, validate, and prove exploitable vulnerabilities end to end. First publicly announced May 12, 2026; in limited private preview as of mid-May 2026. Microsoft's naming uses "codename MDASH"; the GA product name is not yet disclosed.

Pipeline Architecture

MDASH runs a five-stage pipeline; the targeting, validation, dedup, and prove stages are model-agnostic by construction:

| Stage | Role | Output |
| --- | --- | --- |
| Prepare | Ingests source target; builds language-aware indices; draws attack-surface and threat models from past commits. | Indices, attack-surface model |
| Scan | Specialized auditor agents reason over candidate code paths. | Candidate findings with hypotheses and evidence |
| Validate | Debater agents argue for and against each finding's reachability and exploitability. | Surviving findings with strengthened rationale |
| Dedup | Collapses semantically equivalent findings (patch-based grouping). | Distinct vulnerability candidates |
| Prove | Constructs and executes triggering inputs where the bug class admits (e.g., ASan in C/C++). | Validated PoC artifacts |

Three Architectural Properties

  1. Ensemble of diverse models. A SOTA model serves as the heavy reasoner; distilled models act as cost-effective debaters for high-volume passes; a second, separate SOTA model provides an independent counterpoint. Disagreement between models is itself a signal. Microsoft does not disclose which specific models occupy these roles.
  2. Specialized agents. 100+ agents, each with its own role, prompt regime, tools, and stop criteria. Auditors do not reason like debaters; debaters do not reason like provers. The agents were constructed through deep research over past CVEs and their patches.
  3. End-to-end pipeline with extensible plugins. Domain plugins inject context the foundation models cannot see: kernel calling conventions, IRP rules, lock invariants, IPC trust boundaries, codec state machines, custom CodeQL databases. The CLFS proving plugin is a worked example: it embeds the on-disk container layout, the block-validation sequence, and the in-memory state machine needed to construct triggering log files for candidate findings.
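
The disagreement-as-signal idea in property 1 can be illustrated with a toy triage vote. The role names, verdict labels, and lambdas below are hypothetical stand-ins for real model calls, not MDASH internals.

```python
from collections import Counter

def triage(finding, models):
    """Collect one verdict per model; split votes escalate rather than drop."""
    verdicts = {name: model(finding) for name, model in models.items()}
    tally = Counter(verdicts.values())
    majority, _votes = tally.most_common(1)[0]
    return {
        "verdicts": verdicts,
        "majority": majority,
        # Disagreement between models is itself a signal: a split finding is
        # routed to a heavier debate pass instead of being discarded.
        "escalate": len(tally) > 1,
    }

# Stand-ins for the three ensemble roles described above.
models = {
    "heavy_reasoner": lambda f: "EXPLOITABLE",
    "distilled_debater": lambda f: "NOT_EXPLOITABLE",
    "counterpoint": lambda f: "EXPLOITABLE",
}
result = triage("unchecked container offset", models)
```

Here the counterpoint model breaks the tie, and the lone dissent flags the finding for escalation rather than silently losing the vote.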

Public Benchmark Results

| Benchmark | Result | Provenance |
| --- | --- | --- |
| StorageDrive (21 planted vulns, private Microsoft interview driver) | 21/21 found, 0 false positives | Microsoft-disclosed |
| clfs.sys historical recall (5 years, 28 MSRC cases) | 96% | Microsoft-internal |
| tcpip.sys historical recall (5 years, 7 MSRC cases) | 100% | Microsoft-internal |
| CyberGym public leaderboard (1,507 real-world vuln-repro tasks, level 1) | 88.45% | Top score, ~5 points above #2 (83.1%) |
| May 2026 Patch Tuesday | 16 new CVEs | 10 kernel-mode, 6 user-mode; 4 Critical RCEs |

The CyberGym number is the only independently verifiable data point; the others are Microsoft-internal but anchored to a defensible ground truth (MSRC case database).
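
The recall figures are consistent with simple case counts; a quick sanity check (the 27-of-28 clfs.sys split is inferred here from the 96% figure, not separately disclosed):

```python
# Case counts from the benchmark table; 27/28 is inferred, not disclosed.
clfs_recall = 27 / 28       # ~0.964, reported as 96%
tcpip_recall = 7 / 7        # 1.0, reported as 100%

# CyberGym margin over the #2 leaderboard entry.
margin = 88.45 - 83.1       # ~5.35, "~5 points above #2"
```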

Positioning

MDASH sits at the intersection of three wiki scope axes:

  • ai-vuln-discovery (primary): the second sourced anchor on this axis, after XBOW/Mythos. The convergence with XBOW (two vendors on opposite sides of the stack arriving at “the harness does the work; the model is one input”) is itself the load-bearing observation.
  • ai-in-sec-defense (primary): Microsoft’s defender-AI capability at the AppSec / vulnerability-research layer, distinct from Security Copilot at the SOC layer.
  • sec-of-ai (tertiary): MDASH is itself an agentic system; the CMM questions about agent identity, action authority, and audit apply to MDASH’s 100+ agents.

Convergence with XBOW + Mythos

| | XBOW + Mythos | MDASH |
| --- | --- | --- |
| Orientation | Offensive (live-site pentest) | Defensive (developer-side audit) |
| Model strategy | Single best frontier model, strong harness | Ensemble (SOTA + distilled + second-SOTA counterpoint) |
| Validation | Live-site interaction harness | Debate + dedup + automated prove (PoC construction) |
| Targets | Live web applications | Native Windows components (tcpip.sys, ikeext.dll, etc.) |
| Key insight (paraphrased) | “A model is a brain without a body” | “The harness does the work; the model is one input” |
| Public benchmark | XBOW’s internal web exploit benchmark | CyberGym public leaderboard |
| Pricing | Mythos ~5× Opus at GA | Not disclosed; private preview |

The two systems are architecturally distinct but converge on the same argument: orchestration outperforms model choice, and validation infrastructure determines real-world utility.

Distribution

  • Status: limited private preview as of May 2026.
  • Users: Microsoft security engineering teams (production); a small set of customers (preview).
  • Signup: aka.ms/AI-drivenScanningHarness.
  • GA timeline / pricing / SKU positioning: not yet disclosed.

Open Questions

  • Which models? “Generally available AI models” is the only public attribution. Anthropic’s Mythos is a plausible SOTA-reasoner candidate; OpenAI GPT and Microsoft-internal models are alternatives. Microsoft’s silence is conspicuous.
  • GA productization: standalone product, feature within Defender / Security Copilot, or service offering via consulting? The post does not commit.
  • Customer co-pilot pattern: the post says customers “test” MDASH, which implies a hosted-service model rather than on-prem deployment. To be confirmed.
  • CyberGym configuration: 88.45% is at level 1 (vulnerable source + high-level description supplied). Performance on higher-difficulty levels (blind discovery) would be a stronger signal.
  • Relationship to MITRE ATLAS or other vulnerability taxonomies: not addressed.

See Also