Pseudo-Relevance for Enhancing Document Representation

Jihyuk Kim; Seung-won Hwang; Seoho Song; Hyeseon Ko; Young-In Song

Abstract

This paper studies how to enhance the document representation used by the bi-encoder approach in dense document retrieval. The bi-encoder, which encodes a query and a document separately, each as a single vector, is favored for its high efficiency in large-scale information retrieval, compared with more effective but more complex architectures. To combine the strengths of the two, multi-vector document representations for bi-encoders, such as ColBERT, which preserves all token embeddings, have been widely adopted. Our contribution is to reduce the size of the multi-vector representation without compromising effectiveness, using query logs as supervision. Our proposed solution decreases latency and memory footprint by up to 8-fold and 3-fold, respectively, validated on MSMARCO and real-world search query logs.
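The abstract contrasts single-vector bi-encoder scoring with ColBERT-style multi-vector representations that keep every token embedding. The following is a minimal sketch, assuming dot-product similarity and plain Python lists; the function names are hypothetical and not taken from the paper. It shows the two scoring schemes (a single dot product versus ColBERT's "MaxSim" late interaction) and why index size grows linearly with the number of vectors kept per document. It does not implement the paper's own size-reduction method, which the abstract does not detail.

```python
from typing import List

Vector = List[float]


def dot(u: Vector, v: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def single_vector_score(query_vec: Vector, doc_vec: Vector) -> float:
    """Bi-encoder: one vector per query and per document; relevance is their dot product."""
    return dot(query_vec, doc_vec)


def maxsim_score(query_vecs: List[Vector], doc_vecs: List[Vector]) -> float:
    """ColBERT-style late interaction: for each query token embedding, take its
    maximum similarity over all document token embeddings, then sum over query tokens."""
    return sum(max(dot(q, d) for d in doc_vecs) for q in query_vecs)


def index_floats(num_docs: int, vecs_per_doc: int, dim: int) -> int:
    """Number of floats stored by an uncompressed multi-vector index.
    Storage scales linearly with vecs_per_doc, so keeping fewer vectors per
    document shrinks the index proportionally (hypothetical helper)."""
    return num_docs * vecs_per_doc * dim


if __name__ == "__main__":
    query = [[1.0, 0.0], [0.0, 1.0]]            # two query token embeddings
    doc = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]  # three document token embeddings
    print(maxsim_score(query, doc))             # 0.9 + 0.8, up to float rounding
    print(index_floats(1000, 3, 128), index_floats(1000, 1, 128))
```

Because MaxSim compares every query token with every stored document token, both the memory footprint and the scoring cost depend on how many document vectors are kept, which is the quantity the paper aims to reduce.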

Anthology ID: 2022.emnlp-main.800
Volume: Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing
Month: December
Year: 2022
Address: Abu Dhabi, United Arab Emirates
Venue: EMNLP
Publisher: Association for Computational Linguistics
Pages: 11639–11652
URL: https://aclanthology.org/2022.emnlp-main.800
Cite (ACL): Jihyuk Kim, Seung-won Hwang, Seoho Song, Hyeseon Ko, and Young-In Song. 2022. Pseudo-Relevance for Enhancing Document Representation. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pages 11639–11652, Abu Dhabi, United Arab Emirates. Association for Computational Linguistics.
Cite (Informal): Pseudo-Relevance for Enhancing Document Representation (Kim et al., EMNLP 2022)
PDF: https://preview.aclanthology.org/ingestion-script-update/2022.emnlp-main.800.pdf
