Abstract
Multi-task dense retrieval models can be used to retrieve documents from a common corpus (e.g., Wikipedia) for different open-domain question-answering (QA) tasks. However, Karpukhin et al. (2020) show that jointly learning different QA tasks with one dense model is not always beneficial due to corpus inconsistency. For example, SQuAD focuses on only a small set of Wikipedia articles, while datasets like NQ and Trivia cover more entries, and joint training on their union can cause performance degradation. To solve this problem, we propose training individual dense passage retrievers (DPR) for different tasks and aggregating their predictions at test time, using uncertainty estimates as weights that indicate how likely a specific query is to fall within each expert's expertise. Our method reaches state-of-the-art performance on 5 benchmark QA datasets, with up to 10% improvement in top-100 accuracy compared to a joint-training multi-task DPR on SQuAD. We also show that our method handles corpus inconsistency better than the joint-training DPR on a mixed subset of different QA datasets. Code and data are available at https://github.com/alexlimh/DPR_MUF.
- Anthology ID:
- 2021.findings-emnlp.26
- Volume:
- Findings of the Association for Computational Linguistics: EMNLP 2021
- Month:
- November
- Year:
- 2021
- Address:
- Punta Cana, Dominican Republic
- Editors:
- Marie-Francine Moens, Xuanjing Huang, Lucia Specia, Scott Wen-tau Yih
- Venue:
- Findings
- SIG:
- SIGDAT
- Publisher:
- Association for Computational Linguistics
- Pages:
- 274–287
- URL:
- https://aclanthology.org/2021.findings-emnlp.26
- DOI:
- 10.18653/v1/2021.findings-emnlp.26
- Cite (ACL):
- Minghan Li, Ming Li, Kun Xiong, and Jimmy Lin. 2021. Multi-Task Dense Retrieval via Model Uncertainty Fusion for Open-Domain Question Answering. In Findings of the Association for Computational Linguistics: EMNLP 2021, pages 274–287, Punta Cana, Dominican Republic. Association for Computational Linguistics.
- Cite (Informal):
- Multi-Task Dense Retrieval via Model Uncertainty Fusion for Open-Domain Question Answering (Li et al., Findings 2021)
- PDF:
- https://preview.aclanthology.org/nschneid-patch-1/2021.findings-emnlp.26.pdf
- Code:
- alexlimh/DPR_MUF
- Data:
- Natural Questions, SQuAD, TriviaQA
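
The test-time aggregation described in the abstract (per-task expert retrievers whose predictions are combined with per-query weights) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `fuse_expert_scores`, the passage IDs, the scores, and the weights are all hypothetical, and the weights are simply given as inputs here, whereas the paper derives them from an uncertainty estimate that the abstract does not specify.

```python
from typing import Dict, List, Tuple


def fuse_expert_scores(
    expert_scores: List[Dict[str, float]],
    expert_weights: List[float],
    top_k: int = 100,
) -> List[Tuple[str, float]]:
    """Weighted fusion of per-expert passage scores for a single query.

    expert_scores: one {passage_id: relevance score} dict per expert retriever.
    expert_weights: one non-negative weight per expert (assumed to come from
        some uncertainty estimate; how it is computed is outside this sketch).
    """
    if len(expert_scores) != len(expert_weights):
        raise ValueError("need exactly one weight per expert")
    total = sum(expert_weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")

    fused: Dict[str, float] = {}
    for scores, weight in zip(expert_scores, expert_weights):
        for passage_id, score in scores.items():
            # A passage that an expert did not retrieve contributes nothing
            # from that expert (an assumption of this sketch).
            fused[passage_id] = fused.get(passage_id, 0.0) + (weight / total) * score

    # Rank passages by fused score, highest first, and keep the top_k.
    ranked = sorted(fused.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top_k]


# Hypothetical toy inputs: two experts, with the query judged closer to the second.
nq_expert = {"p1": 0.9, "p2": 0.2}
squad_expert = {"p2": 0.8, "p3": 0.6}
ranking = fuse_expert_scores([nq_expert, squad_expert], [0.25, 0.75], top_k=2)
```

In this toy case the second expert's higher weight pushes its passages to the top of the fused ranking. A real system would also need the experts' scores to be on comparable scales before summing, which this sketch takes for granted.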