Annotating Hallucinations in Question-Answering using Rewriting

Xu Liu, Guanyi Chen, Kees van Deemter, Tingting He


Abstract
Hallucinations pose a persistent challenge in open-ended question answering (QA). Traditional annotation methods, such as span-labelling, suffer from inconsistency and limited coverage. In this paper, we propose a rewriting-based framework as a new perspective on hallucinations in open-ended QA. We report on an experiment in which annotators are instructed to rewrite LLM-generated answers directly to ensure factual accuracy, with edits automatically recorded. Using the Chinese portion of the Mu-SHROOM dataset, we conduct a controlled rewriting experiment, comparing fact-checking tools (Google vs. GPT-4o), and analysing how tool choice, annotator background, and question openness influence rewriting behaviour. We find that rewriting leads to more hallucinations being identified, with higher inter-annotator agreement, than span-labelling.
Anthology ID:
2025.inlg-main.48
Volume:
Proceedings of the 18th International Natural Language Generation Conference
Month:
October
Year:
2025
Address:
Hanoi, Vietnam
Editors:
Lucie Flek, Shashi Narayan, Lê Hồng Phương, Jiahuan Pei
Venue:
INLG
SIG:
SIGGEN
Publisher:
Association for Computational Linguistics
Pages:
823–832
URL:
https://preview.aclanthology.org/ingest-luhme/2025.inlg-main.48/
Cite (ACL):
Xu Liu, Guanyi Chen, Kees van Deemter, and Tingting He. 2025. Annotating Hallucinations in Question-Answering using Rewriting. In Proceedings of the 18th International Natural Language Generation Conference, pages 823–832, Hanoi, Vietnam. Association for Computational Linguistics.
Cite (Informal):
Annotating Hallucinations in Question-Answering using Rewriting (Liu et al., INLG 2025)
PDF:
https://preview.aclanthology.org/ingest-luhme/2025.inlg-main.48.pdf
Supplementary attachment:
2025.inlg-main.48.Supplementary_Attachment.zip