How Long Reasoning Chains Influence LLMs’ Judgment of Answer Factuality

Minzhu Tu; Shiyu Ni; Keping Bi

Abstract

Large language models (LLMs) are increasingly adopted as scalable judges for open-ended generation, yet how they form judgments remains insufficiently understood. Meanwhile, modern LLMs frequently produce answers accompanied by explicit reasoning, making reasoning chains a natural but understudied source of information for model-based evaluation. This work takes a first step toward understanding how exposing reasoning influences LLM-based judgment. Empirical results across factual question-answering (QA) and mathematical datasets show that the presence of reasoning substantially alters judgment behavior, with clear differences across judge capabilities. Weaker judges become more likely to accept incorrect answers when reasoning is present, suggesting over-reliance on persuasive explanations. In contrast, stronger judges exhibit more selective behavior and, in some cases, achieve higher judgment accuracy by leveraging reasoning content. Further analysis reveals that both reasoning fluency and factuality critically shape judgment outcomes. Together, these findings suggest that examining how models interpret reasoning is essential for understanding and improving LLM-based evaluation, with broader implications for the design of reliable automatic judges and evaluation protocols.

Anthology ID: 2026.acl-long.2082
Volume: Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 44957–44972
URL: https://preview.aclanthology.org/ingest-acl/2026.acl-long.2082/
Cite (ACL): Minzhu Tu, Shiyu Ni, and Keping Bi. 2026. How Long Reasoning Chains Influence LLMs’ Judgment of Answer Factuality. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 44957–44972, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): How Long Reasoning Chains Influence LLMs’ Judgment of Answer Factuality (Tu et al., ACL 2026)
PDF: https://preview.aclanthology.org/ingest-acl/2026.acl-long.2082.pdf
Checklist: 2026.acl-long.2082.checklist.pdf
