Abstract
Product quantization (PQ) is a widely used technique for ad-hoc retrieval. Recent studies propose supervised PQ, where the embedding and quantization models can be jointly trained with supervised learning. However, there is a lack of appropriate formulation of the joint training objective; thus, the improvements over previous non-supervised baselines are limited in reality. In this work, we propose the Matching-oriented Product Quantization (MoPQ), where a novel objective Multinoulli Contrastive Loss (MCL) is formulated. With the minimization of MCL, we are able to maximize the matching probability of query and ground-truth key, which contributes to the optimal retrieval accuracy. Given that the exact computation of MCL is intractable due to the demand of vast contrastive samples, we further propose the Differentiable Cross-device Sampling (DCS), which significantly augments the contrastive samples for precise approximation of MCL. We conduct extensive experimental studies on four real-world datasets, whose results verify the effectiveness of MoPQ. The code is available at https://github.com/microsoft/MoPQ.

- Anthology ID: 2021.emnlp-main.640
- Volume: Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing
- Month: November
- Year: 2021
- Address: Online and Punta Cana, Dominican Republic
- Editors: Marie-Francine Moens, Xuanjing Huang, Lucia Specia, Scott Wen-tau Yih
- Venue: EMNLP
- Publisher: Association for Computational Linguistics
- Pages: 8119–8129
- URL: https://aclanthology.org/2021.emnlp-main.640
- DOI: 10.18653/v1/2021.emnlp-main.640
- Cite (ACL): Shitao Xiao, Zheng Liu, Yingxia Shao, Defu Lian, and Xing Xie. 2021. Matching-oriented Embedding Quantization For Ad-hoc Retrieval. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 8119–8129, Online and Punta Cana, Dominican Republic. Association for Computational Linguistics.
- Cite (Informal): Matching-oriented Embedding Quantization For Ad-hoc Retrieval (Xiao et al., EMNLP 2021)
- PDF: https://preview.aclanthology.org/landing_page/2021.emnlp-main.640.pdf
- Code: microsoft/mopq