Claude Code Security

Sources: Announcement (Feb 20, 2026) · Product page · Application access

What

Anthropic’s defender-first vulnerability-discovery capability, built into Claude Code on the web. The capability reads and reasons about codebases the way a human security researcher would — “understanding how components interact, tracing how data moves through your application, and catching complex vulnerabilities that rule-based tools miss” — and runs a multi-stage self-critique verification step in which Claude attempts to prove or disprove its own findings to filter false positives. Validated findings appear in the Claude Code Security dashboard with severity and confidence ratings and suggested patches; nothing is applied without human approval.
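
Anthropic has not published how the verification step is implemented. The sketch below is a minimal rendering of the described prove-or-disprove discipline, assuming a generic LLM client with a single complete(prompt) method; every name and prompt in it is hypothetical, not an Anthropic API.

```python
# Illustrative sketch only: the verification internals are not public.
# `model` stands for any LLM client exposing complete(prompt) -> str;
# the helper names and prompts below are hypothetical, not Anthropic's.

from dataclasses import dataclass

@dataclass
class CandidateFinding:
    location: str        # file / function where the issue was traced
    description: str     # what the model believes is wrong and why

def self_critique_filter(model, candidates, rounds=2):
    """Keep only findings the model can still substantiate after arguing against itself."""
    validated = []
    for finding in candidates:
        survives = True
        for _ in range(rounds):
            # Stage 1: build the strongest case that the finding is real.
            proof = model.complete(
                f"Prove this vulnerability is real and reachable:\n{finding.description}"
            )
            # Stage 2: argue the opposite, i.e. that the finding is a false positive.
            rebuttal = model.complete(
                f"Argue that the following finding is a false positive:\n{proof}"
            )
            # Stage 3: a judgment pass decides which argument holds up.
            verdict = model.complete(
                f"PROOF:\n{proof}\n\nREBUTTAL:\n{rebuttal}\n\n"
                "Answer exactly VALID or FALSE_POSITIVE."
            )
            if "FALSE_POSITIVE" in verdict:
                survives = False
                break
        if survives:
            validated.append(finding)
    return validated
```

The point of the loop is precision rather than recall: a finding the model cannot defend against its own rebuttal never reaches the dashboard.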

Limited research preview for Anthropic Enterprise and Team customers; Anthropic has explicitly committed to expedited free access for open-source maintainers. Underlying model: Claude Opus 4.6 (released February 2026).

Relevance to This Wiki

Sixth sourced production path on the ai-vuln-discovery axis as of 2026-05-15 — the Anthropic-side commercial private-preview entry, paired with Codex Security (OpenAI commercial preview) and OpenAnt (Knostic OSS). The product’s load-bearing FP-control mechanism — “Claude re-examines each result, attempting to prove or disprove its own findings and filter out false positives” — is the self-critique instance of the same discipline that OpenAnt implements as constrained-attacker-persona, that Aardvark implements as sandboxed exploit-trigger validation, and that MDASH implements as a prover stage. Four mechanism instances, same architectural commitment. The product is on the wiki primarily as second-Anthropic posture on the axis (alongside Project Glasswing coalition organizing and Mythos frontier model preview).

Outputs / Numbers

  • 500+ vulnerabilities in production OSS codebases found by Anthropic’s Frontier Red Team using Claude Opus 4.6: “bugs that had gone undetected for decades, despite years of expert review.” This is the capability anchor for Claude Code Security as a productized offering; the 500+ count is a Frontier Red Team research result, not a recall figure for the Claude Code Security product itself.
  • Severity + confidence rating per finding (see the record sketch after this list). The confidence rating is a triage primitive recognizing that “these issues often involve nuances that are difficult to assess from source code alone.”
  • Multi-stage self-critique verification before any finding reaches an analyst.
  • No public benchmark recall number.
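
The announcement names severity, confidence, and a suggested patch as per-finding outputs. A hypothetical record covering just those described fields (not the actual dashboard schema) could look like:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class DashboardFinding:
    """Hypothetical shape for a validated finding; field names are illustrative only."""
    title: str
    location: str                          # file / function the data-flow trace points at
    severity: str                          # e.g. "critical", "high", "medium", "low"
    confidence: float                      # per-finding triage signal, e.g. 0.0 to 1.0
    explanation: str                       # how attacker-controlled data reaches the sink
    suggested_patch: Optional[str] = None  # proposed fix; never applied automatically
```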

Notable Design Choices

  • Self-critique as the verification primitive. “Claude re-examines each result, attempting to prove or disprove its own findings.” Mechanism: model-vs-itself adversarial loop. This is structurally distinct from constrained-persona (OpenAnt), sandbox-execution-trigger (Aardvark), and ensemble-prover (MDASH) — same discipline, fourth distinct mechanism.
  • Built on Claude Code on the web. Inherits dev-workflow integration without a new platform. Teams review findings and iterate on fixes in the tools they already use.
  • Confidence rating as a first-class output. Severity is the obvious dimension; confidence is the more novel one — explicit recognition that AI-finding quality varies and analyst triage benefits from per-finding confidence signal.
  • OSS maintainer expedited-access track. Explicit commitment to make this capability available to OSS maintainers at no cost via an expedited application path. Aligned with the Glasswing $4M OSS donations posture but applied at the product level rather than at the coalition level.
  • Methodological frame rejects rule-based SAST. “Rather than scanning for known patterns, Claude Code Security reads and reasons about your code the way a human security researcher would.” Convergent with Aardvark’s and OpenAnt’s framing.
  • Human approval gating. “Nothing is applied without human approval: Claude Code Security identifies problems and suggests solutions, but developers always make the call.” See the triage-and-gating sketch after this list.
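
As a sketch of how two of the choices above compose in practice, confidence-aware triage plus mandatory approval gating, the snippet below reuses the hypothetical DashboardFinding record from the Outputs section; the approved_by_human callback stands in for whatever review workflow a team already uses.

```python
# Assumes the hypothetical DashboardFinding record sketched earlier.
SEVERITY_ORDER = {"critical": 0, "high": 1, "medium": 2, "low": 3}

def triage(findings, min_confidence=0.5):
    """Order findings for analyst review: highest severity first, then highest confidence."""
    worth_review = [f for f in findings if f.confidence >= min_confidence]
    return sorted(worth_review, key=lambda f: (SEVERITY_ORDER[f.severity], -f.confidence))

def apply_fixes(findings, approved_by_human):
    """Suggested patches are only ever applied after an explicit human decision."""
    applied = []
    for finding in findings:
        if finding.suggested_patch and approved_by_human(finding):
            applied.append(finding)  # in practice: open a PR rather than commit directly
    return applied
```

The design choice the gate encodes is that the tool proposes and the developer disposes: a suggested patch stays data in the dashboard until a person turns it into a commit.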

Adjacent Gaps

  • No public benchmark. Recall not quantified; comparisons across vendors (MDASH 88.45% CyberGym, Aardvark 92% internal golden, OpenAnt no published recall) require a common third-party benchmark that does not yet exist.
  • No sandboxed dynamic execution disclosed. The verification is described as self-critique; whether sandboxed execution-based validation is in scope is unclear. OpenAnt (Stage 6) and Aardvark (Validation stage) both describe sandboxed exploit triggering; Claude Code Security does not.
  • Cost shape bundled into Enterprise + Team subscription; not bounded for OSS use.
  • Frontier Red Team zero-days post not yet ingested. red.anthropic.com/2026/zero-days/ is the load-bearing capability anchor; it is already on the wiki’s gap list.