ReQA: An Evaluation for End-to-End Answer Retrieval Models

Amin Ahmad, Noah Constant, Yinfei Yang, Daniel Cer


Abstract
Popular QA benchmarks like SQuAD have driven progress on the task of identifying answer spans within a specific passage, with models now surpassing human performance. However, retrieving relevant answers from a huge corpus of documents is still a challenging problem, and places different requirements on the model architecture. There is growing interest in developing scalable answer retrieval models trained end-to-end, bypassing the typical document retrieval step. In this paper, we introduce Retrieval Question-Answering (ReQA), a benchmark for evaluating large-scale sentence-level answer retrieval models. We establish baselines using both neural encoding models and classical information retrieval techniques. We release our evaluation code to encourage further work on this challenging task.
Anthology ID:
D19-5819
Volume:
Proceedings of the 2nd Workshop on Machine Reading for Question Answering
Month:
November
Year:
2019
Address:
Hong Kong, China
Venues:
EMNLP | WS
Publisher:
Association for Computational Linguistics
Pages:
137–146
URL:
https://aclanthology.org/D19-5819
DOI:
10.18653/v1/D19-5819
Cite (ACL):
Amin Ahmad, Noah Constant, Yinfei Yang, and Daniel Cer. 2019. ReQA: An Evaluation for End-to-End Answer Retrieval Models. In Proceedings of the 2nd Workshop on Machine Reading for Question Answering, pages 137–146, Hong Kong, China. Association for Computational Linguistics.
Cite (Informal):
ReQA: An Evaluation for End-to-End Answer Retrieval Models (Ahmad et al., EMNLP 2019)
PDF:
https://preview.aclanthology.org/update-css-js/D19-5819.pdf
Code:
google/retrieval-qa-eval
Data:
ReQA, Natural Questions, SQuAD, WikiQA