@inproceedings{varanasi-etal-2023-auto,
    title = "Auto-Encoding Questions with Retrieval Augmented Decoding for Unsupervised Passage Retrieval and Zero-Shot Question Generation",
    author = "Varanasi, Stalin  and
      Butt, Muhammad Umer Tariq  and
      Neumann, Guenter",
    editor = "Mitkov, Ruslan  and
      Angelova, Galia",
    booktitle = "Proceedings of the 14th International Conference on Recent Advances in Natural Language Processing",
    month = sep,
    year = "2023",
    address = "Varna, Bulgaria",
    publisher = "INCOMA Ltd., Shoumen, Bulgaria",
    url = "https://preview.aclanthology.org/ingest-emnlp/2023.ranlp-1.124/",
    pages = "1171--1179",
    abstract = "Dense passage retrieval models have become state-of-the-art for information retrieval on many Open-domain Question Answering (ODQA) datasets. However, most of these models rely on supervision obtained from the ODQA datasets, which hinders their performance in a low-resource setting. Recently, retrieval-augmented language models have been proposed to improve both zero-shot and supervised information retrieval. However, these models have pre-training tasks that are agnostic to the target task of passage retrieval. In this work, we propose Retrieval Augmented Auto-encoding of Questions for zero-shot dense information retrieval. Unlike other pre-training methods, our pre-training method is built for target information retrieval, thereby making the pre-training more efficient. Our method consists of a dense IR model for encoding questions and retrieving documents during training and a conditional language model that maximizes the question{'}s likelihood by marginalizing over retrieved documents. As a by-product, we can use this conditional language model for zero-shot question generation from documents. We show that the IR model obtained through our method improves the current state-of-the-art of zero-shot dense information retrieval, and we improve the results even further by training on a synthetic corpus created by zero-shot question generation."
}