HiCL: Hierarchical Contrastive Learning of Unsupervised Sentence Embeddings

Zhuofeng Wu, Chaowei Xiao, VG Vinod Vydiswaran


Abstract
In this paper, we propose a hierarchical contrastive learning framework, HiCL, which considers local segment-level and global sequence-level relationships to improve training efficiency and effectiveness. Traditional methods typically encode a sequence in its entirety for contrast with others, often neglecting local representation learning, leading to challenges in generalizing to shorter texts. Conversely, HiCL improves its effectiveness by dividing the sequence into several segments and employing both local and global contrastive learning to model segment-level and sequence-level relationships. Further, considering the quadratic time complexity of transformers over input tokens, HiCL boosts training efficiency by first encoding short segments and then aggregating them to obtain the sequence representation. Extensive experiments show that HiCL enhances the prior top-performing SNCSE model across seven extensively evaluated STS tasks, with an average increase of +0.2% observed on BERT-large and +0.44% on RoBERTa-large.
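The abstract describes the mechanism only at a high level. The sketch below illustrates the general idea as stated there: split each sequence into short segments, encode the segments independently (cheaper than encoding the full sequence under the transformer's quadratic cost), aggregate them into a sequence embedding, and train with both a segment-level (local) and a sequence-level (global) contrastive objective. This is a minimal illustration, not the authors' released implementation: the tiny encoder, segment length, mean-pooling aggregation, SimCSE-style dropout positives, InfoNCE form, and the loss weighting are all simplifying assumptions.

```python
# Minimal sketch of a segment-then-aggregate encoder with local and global
# contrastive losses. All hyperparameters here are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


def info_nce(a: torch.Tensor, b: torch.Tensor, temperature: float = 0.05) -> torch.Tensor:
    """InfoNCE: matching rows of a and b are positives; all other rows are negatives."""
    sim = F.cosine_similarity(a.unsqueeze(1), b.unsqueeze(0), dim=-1) / temperature
    labels = torch.arange(a.size(0), device=a.device)
    return F.cross_entropy(sim, labels)


class HierarchicalEncoder(nn.Module):
    """Encodes short segments independently, then aggregates into a sequence vector."""

    def __init__(self, vocab_size: int = 30522, dim: int = 256, seg_len: int = 32):
        super().__init__()
        self.seg_len = seg_len
        self.embed = nn.Embedding(vocab_size, dim)
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, token_ids: torch.Tensor):
        # token_ids: (batch, seq_len); split the sequence into fixed-length segments
        b, n = token_ids.shape
        segs = token_ids.view(b, n // self.seg_len, self.seg_len)   # (b, n_seg, seg_len)
        x = self.embed(segs.reshape(-1, self.seg_len))               # (b * n_seg, seg_len, dim)
        h = self.encoder(x).mean(dim=1)                              # segment embeddings
        seg_emb = h.view(b, -1, h.size(-1))                          # (b, n_seg, dim)
        seq_emb = seg_emb.mean(dim=1)                                # (b, dim) sequence embedding
        return seg_emb, seq_emb


# Usage: two forward passes of the same batch give two dropout-noised views,
# which serve as positive pairs (SimCSE-style assumption).
model = HierarchicalEncoder()
tokens = torch.randint(0, 30522, (8, 128))       # 8 sequences of 128 tokens
seg1, seq1 = model(tokens)
seg2, seq2 = model(tokens)                        # different dropout noise
local_loss = info_nce(seg1.flatten(0, 1), seg2.flatten(0, 1))  # segment-level contrast
global_loss = info_nce(seq1, seq2)                              # sequence-level contrast
loss = global_loss + 0.1 * local_loss                           # weighting is an assumption
loss.backward()
```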
Anthology ID:
2023.findings-emnlp.161
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2023
Month:
December
Year:
2023
Address:
Singapore
Editors:
Houda Bouamor, Juan Pino, Kalika Bali
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
2461–2476
URL:
https://aclanthology.org/2023.findings-emnlp.161
DOI:
10.18653/v1/2023.findings-emnlp.161
Cite (ACL):
Zhuofeng Wu, Chaowei Xiao, and VG Vinod Vydiswaran. 2023. HiCL: Hierarchical Contrastive Learning of Unsupervised Sentence Embeddings. In Findings of the Association for Computational Linguistics: EMNLP 2023, pages 2461–2476, Singapore. Association for Computational Linguistics.
Cite (Informal):
HiCL: Hierarchical Contrastive Learning of Unsupervised Sentence Embeddings (Wu et al., Findings 2023)
PDF:
https://preview.aclanthology.org/naacl-24-ws-corrections/2023.findings-emnlp.161.pdf