InfoCSE: Information-aggregated Contrastive Learning of Sentence Embeddings
Xing Wu, Chaochen Gao, Zijia Lin, Jizhong Han, Zhongyuan Wang, Songlin Hu
Abstract
Contrastive learning has been extensively studied in sentence embedding learning, which assumes that the embeddings of different views of the same sentence are closer. The constraint brought by this assumption is weak, and a good sentence representation should also be able to reconstruct the original sentence fragments. Therefore, this paper proposes an information-aggregated contrastive learning framework for learning unsupervised sentence embeddings, termed InfoCSE. InfoCSE forces the representation of [CLS] positions to aggregate denser sentence information by introducing an additional masked language model task and a well-designed network. We evaluate the proposed InfoCSE on several benchmark datasets w.r.t. the semantic text similarity (STS) task. Experimental results show that InfoCSE outperforms SimCSE by an average Spearman correlation of 2.60% on BERT-base, and 1.77% on BERT-large, achieving state-of-the-art results among unsupervised sentence representation learning methods.
- Anthology ID:
- 2022.findings-emnlp.223
- Volume:
- Findings of the Association for Computational Linguistics: EMNLP 2022
- Month:
- December
- Year:
- 2022
- Address:
- Abu Dhabi, United Arab Emirates
- Editors:
- Yoav Goldberg, Zornitsa Kozareva, Yue Zhang
- Venue:
- Findings
- Publisher:
- Association for Computational Linguistics
- Pages:
- 3060–3070
- URL:
- https://preview.aclanthology.org/add_missing_videos/2022.findings-emnlp.223/
- DOI:
- 10.18653/v1/2022.findings-emnlp.223
- Cite (ACL):
- Xing Wu, Chaochen Gao, Zijia Lin, Jizhong Han, Zhongyuan Wang, and Songlin Hu. 2022. InfoCSE: Information-aggregated Contrastive Learning of Sentence Embeddings. In Findings of the Association for Computational Linguistics: EMNLP 2022, pages 3060–3070, Abu Dhabi, United Arab Emirates. Association for Computational Linguistics.
- Cite (Informal):
- InfoCSE: Information-aggregated Contrastive Learning of Sentence Embeddings (Wu et al., Findings 2022)
- PDF:
- https://preview.aclanthology.org/add_missing_videos/2022.findings-emnlp.223.pdf
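
The abstract reports results as an average Spearman correlation on STS benchmarks. As a rough illustration of what that metric measures, the sketch below scores each sentence pair by the cosine similarity of its two embeddings and then computes the Spearman rank correlation against gold similarity scores. This assumes the common STS evaluation protocol; it is not the authors' evaluation code, and the function names (`cosine`, `ranks`, `spearman`, `sts_spearman`) and the toy vectors are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(ranks(x), ranks(y))


def sts_spearman(embeddings_a, embeddings_b, gold_scores):
    """Score each pair by cosine similarity, then correlate with gold."""
    predicted = [cosine(u, v) for u, v in zip(embeddings_a, embeddings_b)]
    return spearman(predicted, gold_scores)


if __name__ == "__main__":
    # Made-up 2-d "embeddings" for three sentence pairs and made-up gold scores.
    emb_a = [[1.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
    emb_b = [[1.0, 0.1], [1.0, 0.0], [1.0, 0.0]]
    gold = [5.0, 3.0, 0.5]
    print(sts_spearman(emb_a, emb_b, gold))
```

Because Spearman correlation depends only on rank order, a model is rewarded for ordering sentence pairs the same way the human annotations do, regardless of the absolute scale of its similarity scores.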