METR (Model Evaluation and Threat Research)

Independent AI evaluation organization. The wiki’s methodological foundation for long-task autonomy claims — the work that established the “task-completion-horizon doubles every N months” framing that UK AISI’s 8-month cyber-task figure is built on.

Notable outputs

  • “Measuring AI Ability to Complete Long Tasks” (metr.org blog, arXiv:2503.14499) — Generalist task horizon doubles every ~7 months across 2019–2025; accelerated to ~4 months in 2024–2025. Methodologically transparent.
  • Time Horizon 1.1 (metr.org/blog/2026-1-29) — January 2026 refresh of the doubling-time analysis.
  • Common Elements of Frontier AI Safety Policies (metr.org/common-elements) — Comparative view of vendor responsible-update commitments; cross-referenced from the wiki’s Threat Classes 2026 §Class 4 (model-version regression).
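The doubling-time framing above is a simple exponential-growth model: a horizon h(t) = h₀ · 2^((t − t₀)/D), where D is the doubling period. A minimal sketch, with placeholder numbers (the 60-minute starting horizon is a hypothetical input, not METR's published fit):

```python
def horizon(h0_minutes: float, months_elapsed: float, doubling_months: float) -> float:
    """Task horizon after `months_elapsed`, assuming exponential doubling
    with period `doubling_months` (the model behind METR's framing)."""
    return h0_minutes * 2 ** (months_elapsed / doubling_months)

# With a hypothetical 60-minute horizon and a ~7-month doubling time,
# two doubling periods (14 months) quadruple the horizon to 240 minutes:
print(horizon(60, 14, 7))  # 240.0
```

Note that the headline ~7-month vs ~4-month figures change only the exponent's denominator, which is why a shift in doubling time compounds so quickly over multi-year extrapolations.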

Why it matters for the wiki

UK AISI’s “8-month cyber-doubling” figure is built on METR methodology. Citing UK AISI without METR is citing the conclusion without the foundation. METR is also the only independent (non-vendor, non-government) source for cross-model capability scaling.

See Also