Improving Embedding-based Large-scale Retrieval via Label Enhancement
Peiyang Liu, Xi Wang, Sen Wang, Wei Ye, Xiangyu Xi, Shikun Zhang
Abstract
Current embedding-based large-scale retrieval models are trained with 0-1 hard labels that indicate whether a query is relevant to a document, ignoring the rich information carried by the degree of relevance. This paper proposes to improve embedding-based retrieval by better characterizing the query-document relevance degree, introducing label enhancement (LE) to this setting for the first time. To generate label distributions in the retrieval scenario, we design a novel and effective supervised LE method that incorporates prior knowledge from dynamic term weighting methods into contextual embeddings. By training models with the generated label distribution as auxiliary supervision, our method significantly outperforms four competitive existing retrieval models and their counterparts equipped with two alternative LE techniques. The improvement is consistently observed on English and Chinese large-scale retrieval tasks under both standard and cold-start settings.
- Anthology ID:
- 2021.findings-emnlp.13
- Volume:
- Findings of the Association for Computational Linguistics: EMNLP 2021
- Month:
- November
- Year:
- 2021
- Address:
- Punta Cana, Dominican Republic
- Editors:
- Marie-Francine Moens, Xuanjing Huang, Lucia Specia, Scott Wen-tau Yih
- Venue:
- Findings
- SIG:
- SIGDAT
- Publisher:
- Association for Computational Linguistics
- Pages:
- 133–142
- URL:
- https://aclanthology.org/2021.findings-emnlp.13
- DOI:
- 10.18653/v1/2021.findings-emnlp.13
- Cite (ACL):
- Peiyang Liu, Xi Wang, Sen Wang, Wei Ye, Xiangyu Xi, and Shikun Zhang. 2021. Improving Embedding-based Large-scale Retrieval via Label Enhancement. In Findings of the Association for Computational Linguistics: EMNLP 2021, pages 133–142, Punta Cana, Dominican Republic. Association for Computational Linguistics.
- Cite (Informal):
- Improving Embedding-based Large-scale Retrieval via Label Enhancement (Liu et al., Findings 2021)
- PDF:
- https://preview.aclanthology.org/ingest-bitext-workshop/2021.findings-emnlp.13.pdf
- Data
- CMRC, DRCD, Natural Questions
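The abstract's key idea, training a retrieval model with a generated label distribution as auxiliary supervision alongside the usual 0-1 hard labels, can be sketched as a combined loss: cross-entropy against the hard label plus a KL-divergence term pulling the model's score distribution toward the soft relevance distribution. This is a minimal illustration, not the paper's implementation; the function name, the weighting factor `lam`, and the toy soft distribution are all hypothetical, and the paper's actual LE method for producing the soft labels (from term weighting and contextual embeddings) is not reproduced here.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D score vector."""
    e = np.exp(x - x.max())
    return e / e.sum()

def retrieval_loss(scores, hard_label, soft_dist, lam=0.5, eps=1e-12):
    """Hard-label cross-entropy plus a KL term to a soft relevance
    distribution used as auxiliary supervision (illustrative sketch).

    scores     : model similarity scores for one query vs. N documents
    hard_label : index of the 0-1 relevant document
    soft_dist  : generated label distribution over the N documents
    lam        : hypothetical weight balancing the two terms
    """
    p = softmax(scores)
    ce = -np.log(p[hard_label] + eps)                          # hard-label loss
    kl = np.sum(soft_dist * np.log((soft_dist + eps) / (p + eps)))  # KL(soft || p)
    return ce + lam * kl

# Toy example: one query scored against 3 candidate documents.
scores = np.array([2.0, 0.5, -1.0])   # e.g. dual-encoder dot products
hard = 0                              # 0-1 label: document 0 is relevant
soft = np.array([0.7, 0.25, 0.05])    # graded relevance (hypothetical values)
loss = retrieval_loss(scores, hard, soft)
```

The KL term is zero when the model's score distribution matches the soft labels exactly, so it only adds gradient signal where the model's ranking disagrees with the graded relevance, leaving the hard-label objective intact.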