QuadrupletBERT: An Efficient Model For Embedding-Based Large-Scale Retrieval

Peiyang Liu, Sen Wang, Xi Wang, Wei Ye, Shikun Zhang


Abstract
The embedding-based large-scale query-document retrieval problem is a hot topic in the information retrieval (IR) field. Considering that pre-trained language models like BERT have achieved great success in a wide variety of NLP tasks, we present a QuadrupletBERT model for effective and efficient retrieval in this paper. Unlike most existing BERT-style retrieval models, which only focus on the ranking phase in retrieval systems, our model makes considerable improvements to the retrieval phase and leverages the distances between simple negative and hard negative instances to obtain better embeddings. Experimental results demonstrate that our QuadrupletBERT achieves state-of-the-art results in embedding-based large-scale retrieval tasks.
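The abstract describes training on quadruplets that separate simple (easy) negatives from hard negatives with different distance constraints. The paper's exact formulation is not given on this page, so the following is a minimal sketch of a generic quadruplet-style loss; the margin values, the Euclidean distance, and the function name are illustrative assumptions, not the authors' definitions.

```python
import numpy as np

def quadruplet_loss(query, pos, hard_neg, easy_neg,
                    margin_hard=0.1, margin_easy=0.3):
    """Sketch of a quadruplet-style ranking loss (assumed form).

    The positive document must be closer to the query than the hard
    negative by at least `margin_hard`, and closer than the easy
    negative by a larger `margin_easy` -- easy negatives are pushed
    further away than hard ones.
    """
    dist = lambda a, b: np.linalg.norm(a - b)  # Euclidean distance (assumption)
    hard_term = max(0.0, dist(query, pos) - dist(query, hard_neg) + margin_hard)
    easy_term = max(0.0, dist(query, pos) - dist(query, easy_neg) + margin_easy)
    return hard_term + easy_term

# Toy usage: well-separated negatives incur zero loss.
q = np.array([1.0, 0.0])
p = np.array([1.0, 0.1])
hn = np.array([0.5, 0.5])   # hard negative, moderately far
en = np.array([-1.0, 0.0])  # easy negative, far away
print(quadruplet_loss(q, p, hn, en))  # 0.0: both margins satisfied
```

With both negatives well separated the hinge terms vanish; when a negative collapses onto the positive, the corresponding margin is paid in full, and the larger easy-negative margin penalizes that collapse more heavily.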
Anthology ID:
2021.naacl-main.292
Volume:
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
Month:
June
Year:
2021
Address:
Online
Venue:
NAACL
Publisher:
Association for Computational Linguistics
Pages:
3734–3739
URL:
https://aclanthology.org/2021.naacl-main.292
DOI:
10.18653/v1/2021.naacl-main.292
Cite (ACL):
Peiyang Liu, Sen Wang, Xi Wang, Wei Ye, and Shikun Zhang. 2021. QuadrupletBERT: An Efficient Model For Embedding-Based Large-Scale Retrieval. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 3734–3739, Online. Association for Computational Linguistics.
Cite (Informal):
QuadrupletBERT: An Efficient Model For Embedding-Based Large-Scale Retrieval (Liu et al., NAACL 2021)
PDF:
https://preview.aclanthology.org/ingestion-script-update/2021.naacl-main.292.pdf
Video:
https://preview.aclanthology.org/ingestion-script-update/2021.naacl-main.292.mp4
Data
ReQA