Ranking and Sampling in Open-Domain Question Answering

Yanfu Xu, Zheng Lin, Yuanxin Liu, Rui Liu, Weiping Wang, Dan Meng


Abstract
Open-domain question answering (OpenQA) aims to answer questions based on a number of unlabeled paragraphs. Existing approaches always follow the distantly supervised setup where some of the paragraphs are wrong-labeled (noisy), and mainly utilize the paragraph-question relevance to denoise. However, the paragraph-paragraph relevance, which may aggregate the evidence among relevant paragraphs, can also be utilized to discover more useful paragraphs. Moreover, current approaches mainly focus on the positive paragraphs which are known to contain the answer during training. This will affect the generalization ability of the model and make it be disturbed by the similar but irrelevant (distracting) paragraphs during testing. In this paper, we first introduce a ranking model leveraging the paragraph-question and the paragraph-paragraph relevance to compute a confidence score for each paragraph. Furthermore, based on the scores, we design a modified weighted sampling strategy for training to mitigate the influence of the noisy and distracting paragraphs. Experiments on three public datasets (Quasar-T, SearchQA and TriviaQA) show that our model advances the state of the art.
Anthology ID:
D19-1245
Volume:
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)
Month:
November
Year:
2019
Address:
Hong Kong, China
Venues:
EMNLP | IJCNLP
SIG:
SIGDAT
Publisher:
Association for Computational Linguistics
Note:
Pages:
2412–2421
Language:
URL:
https://aclanthology.org/D19-1245
DOI:
10.18653/v1/D19-1245
Bibkey:
Cite (ACL):
Yanfu Xu, Zheng Lin, Yuanxin Liu, Rui Liu, Weiping Wang, and Dan Meng. 2019. Ranking and Sampling in Open-Domain Question Answering. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 2412–2421, Hong Kong, China. Association for Computational Linguistics.
Cite (Informal):
Ranking and Sampling in Open-Domain Question Answering (Xu et al., EMNLP-IJCNLP 2019)
Copy Citation:
PDF:
https://preview.aclanthology.org/ingestion-script-update/D19-1245.pdf