Towards Improved Multi-Source Attribution for Long-Form Answer Generation

Nilay Patel, Shivashankar Subramanian, Siddhant Garg, Pratyay Banerjee, Amita Misra


Abstract
Teaching large language models (LLMs) to generate text with attribution to evidence sources can reduce hallucinations, improve verifiability in question answering (QA) systems, and increase the reliability of retrieval-augmented LLMs. Despite their growing popularity in QA systems and search engines, current LLMs struggle with attribution for long-form responses that require reasoning over multiple evidence sources. To address this, we aim to improve the ability of LLMs to attribute long-form answers to multiple sources, with multiple citations per sentence. However, data for training multi-source attributable QA systems is difficult and expensive to annotate, and therefore scarce. To overcome this challenge, we transform existing QA datasets for this task (MultiAttr) and empirically demonstrate, on a wide range of attribution benchmark datasets, that fine-tuning on MultiAttr yields significant improvements over training only on the target QA domain. Lastly, to fill a gap in existing benchmarks, we present PolitiCite, a multi-source attribution dataset with multi-paragraph answers, built from PolitiFact articles that discuss events closely related to the implementation status of election promises.
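To make the task concrete, below is a minimal, hypothetical Python sketch of the setting the abstract describes: a long-form answer whose sentences each carry one or more citation markers resolving to retrieved evidence passages. The bracketed-marker format, the example passages, and the check_attribution helper are illustrative assumptions, not the paper's actual data format or code.

import re

# Hypothetical example of a multi-source attributed answer: each sentence
# cites one or more evidence passages via [k] markers. This mirrors the
# task described in the abstract, not the paper's exact data format.
evidence = {
    1: "The administration signed the infrastructure bill into law in 2021.",
    2: "PolitiFact rated the promise 'In the Works' as of early 2022.",
    3: "Several provisions of the bill were delayed pending agency rulemaking.",
}

answer = (
    "The promise has been partially fulfilled [1][2]. "
    "Some provisions remain unimplemented because of pending rulemaking [2][3]."
)

def cited_sources(sentence: str) -> set[int]:
    """Return the set of evidence ids cited by a sentence via [k] markers."""
    return {int(m) for m in re.findall(r"\[(\d+)\]", sentence)}

def check_attribution(answer: str, evidence: dict[int, str]) -> bool:
    """Check that every sentence cites at least one source and that all
    cited ids resolve to a provided evidence passage."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", answer.strip()) if s]
    for sent in sentences:
        ids = cited_sources(sent)
        if not ids or not ids.issubset(evidence):
            return False
    return True

print(check_attribution(answer, evidence))  # True

The key property, per the abstract, is that a single sentence may cite several sources at once, which is what distinguishes this multi-source setting from single-citation attribution.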
Anthology ID:
2024.naacl-long.216
Volume:
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)
Month:
June
Year:
2024
Address:
Mexico City, Mexico
Editors:
Kevin Duh, Helena Gomez, Steven Bethard
Venue:
NAACL
Publisher:
Association for Computational Linguistics
Pages:
3906–3919
URL:
https://aclanthology.org/2024.naacl-long.216
Cite (ACL):
Nilay Patel, Shivashankar Subramanian, Siddhant Garg, Pratyay Banerjee, and Amita Misra. 2024. Towards Improved Multi-Source Attribution for Long-Form Answer Generation. In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), pages 3906–3919, Mexico City, Mexico. Association for Computational Linguistics.
Cite (Informal):
Towards Improved Multi-Source Attribution for Long-Form Answer Generation (Patel et al., NAACL 2024)
PDF:
https://preview.aclanthology.org/nschneid-patch-3/2024.naacl-long.216.pdf
Copyright:
 2024.naacl-long.216.copyright.pdf