Dense Passage Retrieval: Is it Retrieving?

Benjamin Reichman, Larry Heck


Abstract
Large Language Models (LLMs) internally store repositories of knowledge. However, their access to this repository is imprecise, and they frequently hallucinate, producing information that is false or does not exist. A paradigm called Retrieval-Augmented Generation (RAG) promises to fix these issues. Dense passage retrieval (DPR) is the first step in this paradigm. In this paper, we analyze the role of DPR fine-tuning and how it affects the model being trained. DPR fine-tunes pre-trained networks to enhance the alignment of the embeddings between queries and relevant textual data. We explore DPR-trained models mechanistically by using a combination of probing, layer activation analysis, and model editing. Our experiments show that DPR training decentralizes how knowledge is stored in the network, creating multiple access pathways to the same information. We also uncover a limitation in this training style: the internal knowledge of the pre-trained model bounds what the retrieval model can retrieve. These findings suggest a few possible directions for dense retrieval: (1) expose the DPR training process to more knowledge so more can be decentralized, (2) inject facts as decentralized representations, (3) model and incorporate knowledge uncertainty in the retrieval process, and (4) directly map internal model knowledge to a knowledge base.
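For context on the setup the paper analyzes, the sketch below shows a typical DPR-style fine-tuning step: a dual encoder is trained with an in-batch contrastive loss so that each query embedding moves closer to the embedding of its relevant passage. This is a minimal illustration, not the authors' code; the choice of bert-base-uncased, CLS pooling, the embed helper, and the example query/passage pairs are all assumptions for demonstration.

```python
# Minimal, illustrative DPR-style training step (assumed setup, not the paper's code).
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
query_encoder = AutoModel.from_pretrained("bert-base-uncased")
passage_encoder = AutoModel.from_pretrained("bert-base-uncased")

def embed(encoder, texts):
    # Encode texts and use the [CLS] token vector as the embedding.
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    return encoder(**batch).last_hidden_state[:, 0]

# Toy examples; in practice these come from a retrieval training set.
queries = ["who wrote the iliad", "capital of france"]
passages = ["The Iliad is attributed to Homer.", "Paris is the capital of France."]

q = embed(query_encoder, queries)      # shape (B, d)
p = embed(passage_encoder, passages)   # shape (B, d)

# Similarity of every query to every in-batch passage; the diagonal holds the
# positive (query, relevant passage) pairs, the rest act as in-batch negatives.
scores = q @ p.T
labels = torch.arange(len(queries))
loss = F.cross_entropy(scores, labels)
loss.backward()  # gradients pull query and positive-passage embeddings together
```

The paper's analysis concerns what this alignment objective does to the pre-trained network's internal knowledge, not the objective itself.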
Anthology ID:
2024.findings-emnlp.791
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2024
Month:
November
Year:
2024
Address:
Miami, Florida, USA
Editors:
Yaser Al-Onaizan, Mohit Bansal, Yun-Nung Chen
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
13540–13553
URL:
https://preview.aclanthology.org/fix-sig-urls/2024.findings-emnlp.791/
DOI:
10.18653/v1/2024.findings-emnlp.791
Cite (ACL):
Benjamin Reichman and Larry Heck. 2024. Dense Passage Retrieval: Is it Retrieving?. In Findings of the Association for Computational Linguistics: EMNLP 2024, pages 13540–13553, Miami, Florida, USA. Association for Computational Linguistics.
Cite (Informal):
Dense Passage Retrieval: Is it Retrieving? (Reichman & Heck, Findings 2024)
PDF:
https://preview.aclanthology.org/fix-sig-urls/2024.findings-emnlp.791.pdf
Software:
 2024.findings-emnlp.791.software.zip