Data-Efficient Auto-Regressive Document Retrieval for Fact Verification

James Thorne


Abstract
Document retrieval is a core component of many knowledge-intensive natural language processing task formulations such as fact verification. Sources of textual knowledge, such as Wikipedia articles, condition the generation of answers from the models. Recent advances in retrieval use sequence-to-sequence models to incrementally predict the title of the appropriate Wikipedia page given an input instance. However, this method requires supervision in the form of human annotation to label which Wikipedia pages contain appropriate context. This paper introduces a distant-supervision method that does not require any annotation to train auto-regressive retrievers that attain competitive R-Precision and Recall in a zero-shot setting. Furthermore, we show that with task-specific supervised fine-tuning, auto-regressive retrieval performance for two Wikipedia-based fact verification tasks can approach or even exceed full supervision using less than 1/4 of the annotated data. We release all code and models.
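For readers unfamiliar with the two retrieval metrics named in the abstract, the following is a minimal sketch of how R-Precision and Recall are commonly computed for a single claim. It assumes the textbook definitions (R-Precision is the fraction of the top-R ranked titles that are relevant, where R is the number of gold titles); the paper's exact evaluation protocol may differ. The function names and the example page titles are hypothetical and chosen only for illustration.

```python
from typing import Iterable, Sequence


def r_precision(ranked_titles: Sequence[str], gold_titles: Iterable[str]) -> float:
    """Fraction of the top-R predictions that are gold, with R = number of gold titles."""
    gold = set(gold_titles)
    if not gold:
        return 0.0  # assumption: score 0 when there are no gold pages
    top_r = set(ranked_titles[: len(gold)])
    return len(gold & top_r) / len(gold)


def recall_at_k(ranked_titles: Sequence[str], gold_titles: Iterable[str], k: int) -> float:
    """Fraction of gold titles that appear among the top-k predictions."""
    gold = set(gold_titles)
    if not gold:
        return 0.0  # assumption: score 0 when there are no gold pages
    return len(gold & set(ranked_titles[:k])) / len(gold)


# Hypothetical retriever output (page titles, best first) and gold annotation.
ranked = ["Paris", "France", "Eiffel Tower", "Lyon"]
gold = ["Paris", "Eiffel Tower"]

rp = r_precision(ranked, gold)          # top-2 contains one of two gold titles
rec_at_3 = recall_at_k(ranked, gold, 3)  # both gold titles are in the top-3
print(rp, rec_at_3)
```

In this toy case R-Precision is 0.5 and Recall at 3 is 1.0; corpus-level scores are usually the mean of these per-claim values.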
Anthology ID:
2022.sustainlp-1.7
Volume:
Proceedings of The Third Workshop on Simple and Efficient Natural Language Processing (SustaiNLP)
Month:
December
Year:
2022
Address:
Abu Dhabi, United Arab Emirates (Hybrid)
Venue:
sustainlp
Publisher:
Association for Computational Linguistics
Pages:
44–51
URL:
https://aclanthology.org/2022.sustainlp-1.7
DOI:
10.18653/v1/2022.sustainlp-1.7
Cite (ACL):
James Thorne. 2022. Data-Efficient Auto-Regressive Document Retrieval for Fact Verification. In Proceedings of The Third Workshop on Simple and Efficient Natural Language Processing (SustaiNLP), pages 44–51, Abu Dhabi, United Arab Emirates (Hybrid). Association for Computational Linguistics.
Cite (Informal):
Data-Efficient Auto-Regressive Document Retrieval for Fact Verification (Thorne, sustainlp 2022)
PDF:
https://preview.aclanthology.org/remove-xml-comments/2022.sustainlp-1.7.pdf
Video:
https://preview.aclanthology.org/remove-xml-comments/2022.sustainlp-1.7.mp4