CoTJudger: A Graph-Driven Framework for Automatic Evaluation of Chain-of-Thought Efficiency and Redundancy in LRMs

Siyi Li; Jiajun Shi; Shiwen Ni; Ge Zhang; Shuaimin Li; Shijian Wang; Zhoufutu Wen; Yizhi Li; Hamid Alinejad-Rokny; Jiaheng Liu; Min Yang; Wenhao Huang

Abstract

Large Reasoning Models (LRMs) have demonstrated strong performance by producing extended Chain-of-Thought (CoT) traces before answering. However, this paradigm often induces over-reasoning: redundant calculations and circular self-verification that increase computational cost without improving outcomes. Existing evaluations largely emphasize final accuracy or coarse token counts, and lack automated tools to separate essential logic from structural redundancy. We introduce CoTJudger, a graph-driven framework that quantifies reasoning efficiency by converting free-form CoTs into directed dependency graphs and extracting the Shortest Effective Path (SEP) needed to reach a correct solution. This yields an interpretable efficiency signal – how much of a CoT is necessary versus structurally redundant – that is comparable across models and tasks. Evaluating 21 LRMs, CoTJudger reveals pervasive redundancy and surfaces recurring failure modes, including verification obsession and compensatory redundancy. These results provide a practical metric for disentangling reasoning ability from computational waste, enabling more targeted evaluation and diagnosis of LRM efficiency.
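As a rough illustration of the idea described above (not the authors' implementation), the sketch below treats a CoT as a directed dependency graph whose nodes are reasoning steps weighted by token count, finds the cheapest path from the question to the final answer, and reports what fraction of the trace lies on that path. The node names, token counts, the use of Dijkstra's algorithm, and the ratio itself are all assumptions made for this example; the abstract does not specify how the SEP or the efficiency signal is computed.

```python
import heapq

def shortest_effective_path(steps, edges, source, target):
    """Cheapest source->target path in a node-weighted directed graph.

    steps: dict mapping step id -> token count (hypothetical weights)
    edges: list of (u, v) pairs meaning step v builds on step u
    Returns (path, cost), or (None, None) if target is unreachable.
    """
    adj = {node: [] for node in steps}
    for u, v in edges:
        adj[u].append(v)

    dist = {source: steps[source]}   # best known cost to reach each step
    prev = {}                        # back-pointers for path recovery
    heap = [(steps[source], source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue                 # stale heap entry
        if u == target:
            break
        for v in adj[u]:
            nd = d + steps[v]
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))

    if target not in dist:
        return None, None
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    path.reverse()
    return path, dist[target]

# Toy trace (invented): the model solves the problem via a -> b, then
# also runs a verification detour before stating the answer.
steps = {"q": 10, "a": 40, "b": 30, "verify": 60, "recheck": 50, "ans": 5}
edges = [("q", "a"), ("a", "b"), ("b", "ans"),
         ("b", "verify"), ("verify", "recheck"), ("recheck", "ans")]

path, cost = shortest_effective_path(steps, edges, "q", "ans")
total = sum(steps.values())
efficiency = cost / total        # share of tokens on the shortest path
redundancy = 1 - efficiency      # share of tokens off the shortest path
print(path, cost, round(efficiency, 3))
```

In this toy trace the verification detour contributes 110 of 195 tokens without being needed to reach the answer, which is the kind of structural redundancy the abstract says a path-based measure is meant to expose.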

Anthology ID: 2026.findings-acl.2077
Volume: Findings of the Association for Computational Linguistics: ACL 2026
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 41837–41863
URL: https://preview.aclanthology.org/ingest-acl/2026.findings-acl.2077/
Cite (ACL): Siyi Li, Jiajun Shi, Shiwen Ni, Ge Zhang, Shuaimin Li, Shijian Wang, Zhoufutu Wen, Yizhi LI, Hamid Alinejad-Rokny, Jiaheng Liu, Min Yang, and Wenhao Huang. 2026. CoTJudger: A Graph-Driven Framework for Automatic Evaluation of Chain-of-Thought Efficiency and Redundancy in LRMs. In Findings of the Association for Computational Linguistics: ACL 2026, pages 41837–41863, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): CoTJudger: A Graph-Driven Framework for Automatic Evaluation of Chain-of-Thought Efficiency and Redundancy in LRMs (Li et al., Findings 2026)
PDF: https://preview.aclanthology.org/ingest-acl/2026.findings-acl.2077.pdf
Checklist: 2026.findings-acl.2077.checklist.pdf
