How Does Response Length Affect Long-Form Factuality

James Xu Zhao, Jimmy Z.J. Liu, Bryan Hooi, See-Kiong Ng


Abstract
Large language models (LLMs) are widely used for long-form text generation. However, factual errors in their responses undermine their reliability. Despite growing attention to LLM factuality, the effect of response length on factuality remains underexplored. In this work, we systematically investigate this relationship by first introducing an automatic, bi-level long-form factuality evaluation framework that achieves high agreement with human annotations while remaining cost-effective. Using this framework, we conduct controlled experiments and find that longer responses exhibit lower factual precision, confirming the presence of a length bias. To explain this phenomenon, we empirically examine three hypotheses: error propagation, long context, and facts exhaustion. Our results reveal that facts exhaustion, in which the model gradually exhausts its more reliable knowledge, is the primary cause of factual degradation, rather than error propagation or long context.
Anthology ID:
2025.findings-acl.161
Volume:
Findings of the Association for Computational Linguistics: ACL 2025
Month:
July
Year:
2025
Address:
Vienna, Austria
Editors:
Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
Venues:
Findings | WS
Publisher:
Association for Computational Linguistics
Pages:
3102–3125
URL:
https://preview.aclanthology.org/acl25-workshop-ingestion/2025.findings-acl.161/
Cite (ACL):
James Xu Zhao, Jimmy Z.J. Liu, Bryan Hooi, and See-Kiong Ng. 2025. How Does Response Length Affect Long-Form Factuality. In Findings of the Association for Computational Linguistics: ACL 2025, pages 3102–3125, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal):
How Does Response Length Affect Long-Form Factuality (Zhao et al., Findings 2025)
PDF:
https://preview.aclanthology.org/acl25-workshop-ingestion/2025.findings-acl.161.pdf