Localizing and Mitigating Errors in Long-form Question Answering

Rachneet Singh Sachdeva, Yixiao Song, Mohit Iyyer, Iryna Gurevych


Abstract
Long-form question answering (LFQA) aims to provide thorough, in-depth answers to complex questions, enhancing comprehension. However, such detailed responses are prone to hallucinations and factual inconsistencies, which makes their faithful evaluation challenging. This work introduces HaluQuestQA, the first hallucination dataset with localized error annotations for human-written and model-generated LFQA answers. HaluQuestQA comprises 698 QA pairs with 1.8k span-level error annotations covering five error types, produced by expert annotators along with preference judgments. Using the collected data, we thoroughly analyze the shortcomings of long-form answers and find that they lack comprehensiveness and provide unhelpful references. We train an automatic feedback model on this dataset that predicts error spans with incomplete information and provides associated explanations. Finally, we propose a prompt-based approach, Error-Informed Refinement, that uses signals from the learned feedback model to refine generated answers; we show this reduces errors and improves answer quality across multiple models. Furthermore, human evaluators find the answers generated by our approach comprehensive and strongly prefer them (84%) over the baseline answers.
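The refinement step described in the abstract can be pictured as a two-stage prompt pipeline: a feedback model first localizes error spans and explains them, and the generator is then re-prompted with that feedback. The Python sketch below is a minimal illustration under assumed interfaces only; the names (error_informed_refinement, feedback_fn, complete_fn) and the prompt wording are hypothetical stand-ins, not the authors' implementation.

from typing import Callable, List, Tuple

# Hypothetical feedback record: (error span, explanation).
Feedback = List[Tuple[str, str]]

def error_informed_refinement(
    question: str,
    draft_answer: str,
    feedback_fn: Callable[[str, str], Feedback],
    complete_fn: Callable[[str], str],
) -> str:
    """Collect span-level feedback on a draft answer, then ask the
    generator to revise it. Both callables are assumed interfaces."""
    feedback = feedback_fn(question, draft_answer)
    if not feedback:
        return draft_answer  # nothing flagged; keep the draft
    notes = "\n".join(
        f'- Span: "{span}" -- Issue: {why}' for span, why in feedback
    )
    prompt = (
        f"Question: {question}\n"
        f"Draft answer: {draft_answer}\n"
        "The following spans were flagged as erroneous or incomplete:\n"
        f"{notes}\n"
        "Rewrite the answer, fixing the flagged spans while keeping "
        "correct content intact."
    )
    return complete_fn(prompt)

# Toy usage with stub models; real use would call an LLM generator and
# the trained feedback model instead.
if __name__ == "__main__":
    fb = lambda q, a: [("the moon is a planet", "factual error")]
    llm = lambda p: "(revised answer would be generated here)"
    print(error_informed_refinement("Is the moon a planet?",
                                    "Yes, the moon is a planet.", fb, llm))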
Anthology ID: 2025.findings-acl.1049
Volume: Findings of the Association for Computational Linguistics: ACL 2025
Month: July
Year: 2025
Address: Vienna, Austria
Editors: Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 20437–20469
URL: https://preview.aclanthology.org/mtsummit-25-ingestion/2025.findings-acl.1049/
DOI: 10.18653/v1/2025.findings-acl.1049
Cite (ACL): Rachneet Singh Sachdeva, Yixiao Song, Mohit Iyyer, and Iryna Gurevych. 2025. Localizing and Mitigating Errors in Long-form Question Answering. In Findings of the Association for Computational Linguistics: ACL 2025, pages 20437–20469, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal): Localizing and Mitigating Errors in Long-form Question Answering (Sachdeva et al., Findings 2025)
PDF: https://preview.aclanthology.org/mtsummit-25-ingestion/2025.findings-acl.1049.pdf