Promptfoo — LLM evaluation and red-teaming framework

Open-source LLM evaluation and red-teaming framework that runs YAML-defined test suites in CI to catch prompt regressions, surface vulnerability findings, and detect behavioral drift. The wiki’s CMM cites Promptfoo as the “regression suite” attack category in the D7 L4 four-quadrant red-team coverage requirement, and as the empirical source for the model-version-degradation finding in Threat Classes 2026 §Class 4.

Acquisition

The Promptfoo public site banner reads “Promptfoo is now part of OpenAI” (2026). The project remains MIT-licensed and continues to ship releases; the wiki’s vendor-neutrality framing in CMM D7 L4 should note the new organizational home.

What it does

| Capability | Detail |
| --- | --- |
| YAML-defined evals | Assertion-driven test cases; model + RAG comparisons; factuality scoring; hallucination tests |
| Red-team plugins | 50+ vulnerability types — direct + indirect prompt injection, guardrail-tailored jailbreaks, PII/data leaks, business-rule violations, insecure tool use, BOLA/BFLA authorization tests, competitor-endorsement, harmful-content |
| Standards mapping | Plugins published with OWASP LLM Top 10 and NIST AI RMF mappings |
| CI integration | GitHub / GitLab / Jenkins; output diffs surface as PR comments |
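
To make the “YAML-defined evals” row concrete, here is a minimal sketch of a promptfooconfig.yaml. The prompts/providers/tests structure follows Promptfoo’s documented schema, but the model ID, prompt text, and assertion values are illustrative placeholders; verify field names against the release you pin.

```yaml
# promptfooconfig.yaml: minimal sketch. Model ID and test content are
# illustrative; the prompts/providers/tests layout is Promptfoo's schema.
description: Support-bot regression suite
prompts:
  - "You are a support agent. Answer the user's question: {{question}}"
providers:
  - openai:chat:gpt-4o-2024-08-06   # pin an exact model ID, never "latest"
tests:
  - vars:
      question: What is the refund window?
    assert:
      - type: contains      # deterministic string check
        value: "30 days"
      - type: llm-rubric    # model-graded check for invented policy
        value: Does not invent policies beyond the published refund FAQ
```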

The regression-suite framing is correct: re-running the same YAML eval set against a new model version or a changed prompt is the regression check. That is what distinguishes Promptfoo from PyRIT (attack orchestration) and Garak (probe library).
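
As a sketch of the CI hook, a GitHub Actions workflow along these lines re-runs the suite on every pull request. The workflow file name, Node version, and secret name are assumptions; Promptfoo also publishes a GitHub Action that can post result diffs as PR comments.

```yaml
# .github/workflows/llm-regression.yml: illustrative; adapt names and pins
name: llm-regression
on: [pull_request]

jobs:
  eval:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
      # Replace @latest with an exact version pin (see Caveats); a failed
      # assertion exits nonzero, which fails the PR check.
      - name: Run eval suite
        run: npx promptfoo@latest eval -c promptfooconfig.yaml
        env:
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
```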

The model-upgrade-regression research

Promptfoo’s blog post “Your model upgrade just broke your agent’s safety” (Guangshuo Zang, Dec 8 2025) is the wiki’s primary empirical source for Threat Class 4 (model-version-degradation). Headline findings, all directly quoted from the post:

| Claim | Number |
| --- | --- |
| GPT-4o → GPT-4.1 prompt-injection resistance | dropped 94% → 71% |
| Anthropic Constitutional Classifiers, jailbreak success | reduced 86% → 4.4% |
| GPT-4o-mini AgentHarm | 62.5% harm with only 22% refusal |
| Gemini 1.5 Pro refusal under jailbreak | dropped 78.4% → 3.5% |
| Crescendo vs single-turn jailbreaks | beats by 29–61% on GPT-4, 49–71% on Gemini-Pro |
| BadLlama vs Llama 3 8B safety | strips it in ~1–5 minutes |

Recommended fixes from the post: pin exact model IDs (never “latest”), re-run prompt-injection and tool-abuse tests on every upgrade, add app-layer guardrails, run tools with least-privilege execution credentials, and monitor for injection signals. “Treat model upgrades as security changes, not just quality upgrades.”
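
In Promptfoo terms, the re-run-on-every-upgrade advice maps to a redteam block in the same config. A minimal sketch, assuming current plugin and strategy IDs (bola, bfla, pii, excessive-agency, prompt-injection, and jailbreak appear in Promptfoo’s plugin/strategy catalog, but verify against the pinned release):

```yaml
# redteam block appended to promptfooconfig.yaml: illustrative IDs and counts
redteam:
  purpose: Support agent with read-only access to order data
  numTests: 5                # generated attack cases per plugin
  plugins:
    - pii                    # personal-data leakage
    - bola                   # broken object-level authorization
    - bfla                   # broken function-level authorization
    - excessive-agency       # over-broad or insecure tool use
  strategies:
    - prompt-injection       # direct injection payloads
    - jailbreak              # guardrail-tailored jailbreak attempts
```

Re-running the red-team suite (promptfoo redteam run) against the newly pinned model ID before promoting it operationalizes the post’s “treat model upgrades as security changes” advice.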

Direct quotes

  • “Your model upgrade just broke your agent’s safety.” — verbatim post title
  • “Treat model upgrades as security changes, not just quality upgrades.” — same post

How the wiki uses it

  • CMM D7 L4 — regression-suite red-team category
  • Threat Classes 2026 §Class 4 — primary empirical anchor for model-version-degradation
  • Validation page §5 #7 — flags the “single-tool coverage is not L4” rule, which depends on Promptfoo + Garak + PyRIT + Mindgard being treated as distinct attack categories

Caveats

  • No version number on the landing page; pin via npm/PyPI in any wiki citation.
  • Acquisition by OpenAI changes the vendor-neutrality posture — peer reviewers may push back on whether OpenAI-owned tooling can objectively evaluate OpenAI models.
  • BOLA/BFLA plugins are the most explicit agentic-tool red-team primitives in the open-source set, but agentic / multi-step coverage still trails commercial CART offerings.

See Also