Dense Hierarchical Retrieval for Open-domain Question Answering

Ye Liu, Kazuma Hashimoto, Yingbo Zhou, Semih Yavuz, Caiming Xiong, Philip Yu


Abstract
Dense neural text retrieval has achieved promising results on open-domain Question Answering (QA), where latent representations of questions and passages are exploited for maximum inner product search in the retrieval process. However, current dense retrievers require splitting documents into short passages that usually contain local, partial and sometimes biased context, and that depend heavily on the splitting process. As a consequence, they may yield inaccurate and misleading hidden representations, thus deteriorating the final retrieval result. In this work, we propose Dense Hierarchical Retrieval (DHR), a hierarchical framework that can generate accurate dense representations of passages by utilizing both macroscopic semantics in the document and microscopic semantics specific to each passage. Specifically, a document-level retriever first identifies relevant documents, among which relevant passages are then retrieved by a passage-level retriever. The ranking of the retrieved passages is further calibrated by examining the document-level relevance. In addition, hierarchical title structure and two negative sampling strategies (i.e., In-Doc and In-Sec negatives) are investigated. We apply DHR to large-scale open-domain QA datasets. DHR significantly outperforms the original dense passage retriever, and helps an end-to-end QA system outperform strong baselines on multiple open-domain QA benchmarks.
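The three-stage pipeline described in the abstract (document-level retrieval, passage-level retrieval within the retrieved documents, then calibration of passage ranks by document-level relevance) can be illustrated with a minimal Python sketch. This is not the authors' implementation. All names (`top_k`, `hierarchical_retrieve`, `lam`) are hypothetical, the exhaustive search stands in for a real maximum inner product search index, the use of separate query vectors for the two levels is an assumption, and the calibration rule (passage score plus a weighted document score) is only one plausible reading of "examining the document-level relevance".

```python
from typing import Dict, List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product, used here as the dense relevance score."""
    return sum(x * y for x, y in zip(a, b))


def top_k(query: Vector, items: Dict[str, Vector], k: int) -> List[Tuple[str, float]]:
    """Exhaustive maximum inner product search over a small collection."""
    scored = [(key, dot(query, vec)) for key, vec in items.items()]
    return sorted(scored, key=lambda kv: kv[1], reverse=True)[:k]


def hierarchical_retrieve(
    q_doc: Vector,                          # query embedding for the document level
    q_psg: Vector,                          # query embedding for the passage level
    doc_vecs: Dict[str, Vector],            # doc_id -> document embedding
    psg_vecs: Dict[str, Dict[str, Vector]], # doc_id -> {passage_id -> embedding}
    k_docs: int = 2,
    k_psgs: int = 3,
    lam: float = 1.0,                       # assumed weight of the document score
) -> List[Tuple[str, float]]:
    # Stage 1: the document-level retriever picks the candidate documents.
    doc_hits = top_k(q_doc, doc_vecs, k_docs)

    candidates: List[Tuple[str, float]] = []
    for doc_id, doc_score in doc_hits:
        # Stage 2: score only passages that belong to a retrieved document.
        for psg_id, vec in psg_vecs.get(doc_id, {}).items():
            # Stage 3 (assumed form): calibrate with the document-level score.
            candidates.append((psg_id, dot(q_psg, vec) + lam * doc_score))

    return sorted(candidates, key=lambda kv: kv[1], reverse=True)[:k_psgs]


# Toy usage with made-up 2-d embeddings.
doc_vecs = {"d1": [1.0, 0.0], "d2": [0.0, 1.0], "d3": [-1.0, 0.0]}
psg_vecs = {
    "d1": {"d1-p1": [0.9, 0.1], "d1-p2": [0.2, 0.0]},
    "d2": {"d2-p1": [0.5, 0.5]},
    "d3": {"d3-p1": [5.0, 0.0]},  # high passage score, but its document is filtered out
}
query = [1.0, 0.2]
results = hierarchical_retrieve(query, query, doc_vecs, psg_vecs)
print([psg_id for psg_id, _ in results])
```

In this toy setup the passage `d3-p1` has the largest raw inner product with the query, yet it never reaches the final list because its document is not among the top documents. That is the filtering effect the hierarchical design relies on, shown here only under the assumptions stated above.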
Anthology ID:
2021.findings-emnlp.19
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2021
Month:
November
Year:
2021
Address:
Punta Cana, Dominican Republic
Venues:
EMNLP | Findings
SIG:
SIGDAT
Publisher:
Association for Computational Linguistics
Pages:
188–200
URL:
https://aclanthology.org/2021.findings-emnlp.19
DOI:
10.18653/v1/2021.findings-emnlp.19
Cite (ACL):
Ye Liu, Kazuma Hashimoto, Yingbo Zhou, Semih Yavuz, Caiming Xiong, and Philip Yu. 2021. Dense Hierarchical Retrieval for Open-domain Question Answering. In Findings of the Association for Computational Linguistics: EMNLP 2021, pages 188–200, Punta Cana, Dominican Republic. Association for Computational Linguistics.
Cite (Informal):
Dense Hierarchical Retrieval for Open-domain Question Answering (Liu et al., Findings 2021)
PDF:
https://preview.aclanthology.org/update-css-js/2021.findings-emnlp.19.pdf
Software:
2021.findings-emnlp.19.Software.zip
Code
yeliu918/dhr
Data
Natural Questions, TriviaQA, WebQuestions