Evaluating Evidence Attribution in Generated Fact Checking Explanations

Rui Xing, Timothy Baldwin, Jey Han Lau


Abstract
Automated fact-checking systems often struggle with trustworthiness, as their generated explanations can include hallucinations. In this work, we explore evidence attribution for fact-checking explanation generation. We introduce a novel evaluation protocol, citation masking and recovery, to assess attribution quality in generated explanations. We implement our protocol using both human annotators and automatic annotators, and find that LLM annotation correlates with human annotation, suggesting that attribution assessment can be automated. Finally, our experiments reveal that: (1) the best-performing LLMs still generate explanations that are not always accurate in their attribution; and (2) human-curated evidence is essential for generating better explanations.
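The abstract does not spell out the mechanics of citation masking and recovery, so the following is only a minimal sketch of what such a protocol might look like: citation markers in a generated explanation are masked, an annotator (human or LLM in the paper; a simple word-overlap heuristic here) tries to recover which evidence sentence was cited, and agreement with the original citations is measured. The function names, the `[1]`-style citation format, and the toy data are all assumptions for illustration, not the authors' implementation.

```python
import re

def mask_citations(explanation: str) -> tuple[str, list[int]]:
    """Replace citation markers like [1] with [MASK]; return the gold evidence indices."""
    gold = [int(m) for m in re.findall(r"\[(\d+)\]", explanation)]
    masked = re.sub(r"\[\d+\]", "[MASK]", explanation)
    return masked, gold

def recover_citation(masked_explanation: str, evidence: list[str]) -> int:
    """Stand-in annotator: pick the evidence sentence with the largest word overlap.
    In the paper this role is played by human or LLM annotators."""
    words = set(masked_explanation.lower().split())
    overlaps = [len(words & set(e.lower().split())) for e in evidence]
    return overlaps.index(max(overlaps)) + 1  # evidence indices are 1-based

# Toy claim/evidence pair (invented for illustration only).
evidence = [
    "The city council approved the budget in March 2021.",
    "Unemployment fell to 4.2 percent in the same quarter.",
]
explanation = "The claim is supported because the budget was approved in March 2021 [1]."

masked, gold = mask_citations(explanation)
predictions = [recover_citation(masked, evidence) for _ in gold]
accuracy = sum(p == g for p, g in zip(predictions, gold)) / len(gold)
print(f"Citation recovery accuracy: {accuracy:.2f}")
```

Under this reading, higher recovery accuracy would indicate that the explanation's wording actually grounds each citation in the evidence it points to; the paper's protocol should be consulted for the actual setup.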
Anthology ID:
2025.naacl-long.282
Volume:
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)
Month:
April
Year:
2025
Address:
Albuquerque, New Mexico
Editors:
Luis Chiruzzo, Alan Ritter, Lu Wang
Venue:
NAACL
Publisher:
Association for Computational Linguistics
Pages:
5475–5496
URL:
https://preview.aclanthology.org/fix-sig-urls/2025.naacl-long.282/
Cite (ACL):
Rui Xing, Timothy Baldwin, and Jey Han Lau. 2025. Evaluating Evidence Attribution in Generated Fact Checking Explanations. In Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), pages 5475–5496, Albuquerque, New Mexico. Association for Computational Linguistics.
Cite (Informal):
Evaluating Evidence Attribution in Generated Fact Checking Explanations (Xing et al., NAACL 2025)
PDF:
https://preview.aclanthology.org/fix-sig-urls/2025.naacl-long.282.pdf