AgentShield
Sources: GitHub — affaan-m/agentshield · npm — ecc-agentshield · ECC Tools GitHub App · Cerebral Valley × Anthropic Claude Code Hackathon
What
AgentShield is an open-source security scanner whose unit of analysis is the agent-configuration tree itself — ~/.claude/ or any .claude/-shaped directory containing hooks, MCP server manifests, subagents, slash commands, skill .md files, and CLAUDE.md instructions. 102 rules across five categories (secrets, permissions, hooks, MCP servers, agents) produce a 0–100 score with an A–F grade. Provenance-aware runtimeConfidence labels distinguish active runtime config from project-local overrides, template catalogs, docs examples, plugin manifests, and manifest-resolved non-shell hook implementations. Optional layered analyses: active prompt-injection probing (--injection), sandboxed hook execution (--sandbox), data-flow taint tracking (--taint), and a three-agent Auditor adversarial pipeline powered by Claude Opus 4.6 (--opus).
Relevance to This Wiki
First open-source scanner the wiki has cataloged whose unit of analysis is the Claude Code config surface itself. Operationalizes practices the wiki has previously discussed in the abstract: skill-marketplace controls and MCP-package provenance for Supply Chain Security for Agents, asset inventory and misconfiguration detection for AI-SPM, hardened single-agent runtime design for Agent Sandboxing (via the bundled MiniClaw runtime), and static recognition of the structural conditions for Lethal Trifecta composition (Bash(*) + curl ${file} + remote MCP transports).
Natural evidence instrument for CMM domains D3 (Control & Least-Agency via permission rules), D4 (Runtime & Guardrails via hook + agent + prompt-injection rules and the bundled MiniClaw sandboxed runtime), D5 (Egress & Network via MCP remote-transport rules and network-exposure rules), D8 (Supply Chain & AI-BOM via MCP-package provenance and skill-marketplace controls), and D9 (Operations & Human Factors via baseline-drift gate, organization-policy presets, and time-bound exception-lifecycle audit). See Harness Config as Supply-Chain Artifact and Control-Efficacy Gate for the two design disciplines parked as candidate CMM revision-pass additions.
Outputs
- CLI —
npm install -g ecc-agentshieldornpx ecc-agentshield scan. Subcommands:scan,init(secure baseline config),baseline write,evidence-pack verify,policy init,runtime install / status / repair,miniclaw start. - GitHub Action —
uses: affaan-m/agentshield@v1. Inputs:path,min-severity,fail-on-findings,format,sarif-output,baseline,save-baseline,policy,fail-on-policy. Outputs: score, grade, finding counts, SARIF path, baseline drift, policy status. Writes Markdown job summaries and emits GitHub annotations inline. - ECC plugin — distribution channel into the Everything Claude Code ecosystem.
- ECC Tools GitHub App — integrated scanning across a GitHub organization. Paid Pro tier ($19 / seat / month, Stripe-billed) for automated repo analysis.
- MiniClaw — bundled minimal sandboxed agent runtime exposed as a single HTTP endpoint and importable as
ecc-agentshield/miniclaw. Four independently enforced security layers (server / prompt-router / tool-whitelist / sandbox-FS), zero external runtime dependencies (Node built-ins only). - Output formats — terminal, JSON, Markdown, HTML, SARIF, evidence pack (deterministic, SHA-256-digested, redacted), and stable-fingerprint JSON remediation plan.
Notable Design Choices
- Provenance-aware false-positive control. Same rule, different score weight by source kind:
template-exampleanddocs-exampleat0.25×,project-local-optionalat0.75×,plugin-manifestat0.5×. Real secrets stay at full weight everywhere. Single template file capped at 10 deduction points per category so one catalog cannot dominate the grade. - Cross-file hook-manifest awareness. Settings-only
hooks-no-pretoolusesuppressed when a companionhooks/hooks.jsonmanifest declaresPreToolUsehooks. Manifest-referenced non-shell hook implementations discovered through indirection and analyzed under thehook-codeconfidence label. - Corpus-gate with prioritized improvement plan.
agentshield scan --corpus-gatefails CI if the built-in attack corpus (env proxy / DNS exfiltration / runtime import mutation / env-token exfiltration / credential-store access / clipboard access) is not fully detected, and produces a per-category / per-missing-rule improvement plan when it regresses. - Time-bound policy exceptions. Organization-policy files declare exceptions with
id,rule,owner,reason,expires_at,scope,ticket. The exception lifecycle audit surfaces total / active / expiring-within-7-days / expired exceptions so temporary waivers stay visible in branch-protection evidence. - Stable-fingerprint remediation plans. Plans omit raw evidence and fix-before/after values so they can be attached to CI tickets without copying token-shaped strings.
- Evidence pack redaction default-on. Local paths, usernames, emails, and token-shaped strings are redacted unless
--no-evidence-redactis set for private internal bundles. - MiniClaw narrowness. Single HTTP endpoint, isolated per-session FS, prompt-router strips 12+ injection-pattern categories (system-prompt overrides, identity reassignment, jailbreaks, data-exfil URLs, zero-width Unicode, base64 payloads), three-tier tool whitelist (Safe / Guarded / Restricted; restricted disabled by default), 10 KB request cap, 10 req/min/IP rate limit, 10 MB file cap, 5-min timeout, no network by default.
Adjacent Gaps
- Harness coverage beyond Claude Code. Rule corpus is tuned for
.claude/shape; other harnesses (OpenCode, Codex, Gemini, dmux, terminal-agent wrappers) currently surface only as local adapter-evidence markers. - Freeform
skill-mdprompt text has narrower coverage thanagent-md/CLAUDE.md. - Non-shell hook execution beyond the narrow
hook-codesignals (context injection, transcript access, child-process execution) still needs language-aware analysis. - No GA library API yet — the package root export is the CLI entrypoint, not a semver-stable scanner module. Consumers should treat JSON / SARIF reports as the supported automation surface.