Quantifying the Influence of Evaluation Aspects on Long-Form Response Assessment

Go Kamoda; Akari Asai; Ana Brassard; Keisuke Sakaguchi

Abstract

Evaluating the outputs of large language models (LLMs) on long-form generative tasks remains challenging. While fine-grained, aspect-wise evaluations provide valuable diagnostic information, they are difficult to design exhaustively, and each aspect’s contribution to the overall acceptability of an answer is unclear. In this study, we propose a method to compute an overall quality score as a weighted average of three key aspects: factuality, informativeness, and formality. This approach achieves stronger correlations with human judgments than previous metrics. Our analysis identifies factuality as the most predictive aspect of overall quality. Additionally, we release a dataset of 1.2k long-form QA answers annotated with both absolute judgments and relative preferences in overall and aspect-wise schemes to aid future research in evaluation practices.

Anthology ID: 2025.coling-main.588
Volume: Proceedings of the 31st International Conference on Computational Linguistics
Month: January
Year: 2025
Address: Abu Dhabi, UAE
Editors: Owen Rambow, Leo Wanner, Marianna Apidianaki, Hend Al-Khalifa, Barbara Di Eugenio, Steven Schockaert
Venue: COLING
Publisher: Association for Computational Linguistics
Pages: 8787–8808
URL: https://preview.aclanthology.org/fix-sig-urls/2025.coling-main.588/
Cite (ACL): Go Kamoda, Akari Asai, Ana Brassard, and Keisuke Sakaguchi. 2025. Quantifying the Influence of Evaluation Aspects on Long-Form Response Assessment. In Proceedings of the 31st International Conference on Computational Linguistics, pages 8787–8808, Abu Dhabi, UAE. Association for Computational Linguistics.
Cite (Informal): Quantifying the Influence of Evaluation Aspects on Long-Form Response Assessment (Kamoda et al., COLING 2025)
PDF: https://preview.aclanthology.org/fix-sig-urls/2025.coling-main.588.pdf
