Paraphrase-based Contrastive Learning for Sentence Pair Modeling

Seiji Sugiyama, Risa Kondo, Tomoyuki Kajiwara, Takashi Ninomiya


Abstract
To improve the performance of sentence pair modeling tasks, we propose an additional pre-training method, also known as transfer fine-tuning, for pre-trained masked language models. Pre-training for masked language modeling is not necessarily designed to bring semantically similar sentences closer together in the embedding space. Our proposed method aims to improve the performance of sentence pair modeling by applying contrastive learning to pre-trained masked language models, in which sentence embeddings of paraphrase pairs are made similar to each other. While natural language inference corpora, which are standard in previous studies on contrastive learning, are not available at a large scale for non-English languages, our method can construct a training corpus for contrastive learning from a raw corpus and a paraphrase dictionary at low cost. Experimental results on four sentence pair modeling tasks revealed the effectiveness of our method in both English and Japanese.
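The core idea in the abstract, pulling sentence embeddings of paraphrase pairs together while pushing apart non-paraphrases, can be sketched with an in-batch contrastive (InfoNCE-style) objective. The code below is a minimal NumPy illustration, not the paper's actual training setup: the embedding source, temperature value, and batch construction are all assumptions for demonstration.

```python
import numpy as np

def info_nce_loss(anchors, positives, temperature=0.05):
    """In-batch contrastive loss: for each anchor sentence embedding,
    its paraphrase is the positive, and the other sentences in the
    batch serve as negatives (a common InfoNCE formulation)."""
    # L2-normalize so dot products become cosine similarities.
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ p.T / temperature  # (batch, batch) similarity matrix
    # Subtract row max for numerical stability before the softmax.
    logits -= logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # The correct "class" for anchor i is column i (its own paraphrase).
    return -np.mean(np.diag(log_probs))

# Toy check: a batch where each anchor's positive is an exact paraphrase
# (identical embedding) scores lower loss than mismatched pairs.
rng = np.random.default_rng(0)
emb = rng.normal(size=(8, 16))
loss_matched = info_nce_loss(emb, emb)
loss_mismatched = info_nce_loss(emb, rng.normal(size=(8, 16)))
print(loss_matched < loss_mismatched)
```

Minimizing this loss drives paraphrase pairs toward high cosine similarity relative to the in-batch negatives, which is the geometric effect the abstract describes for sentence pair modeling.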
Anthology ID:
2025.naacl-srw.39
Volume:
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 4: Student Research Workshop)
Month:
April
Year:
2025
Address:
Albuquerque, USA
Editors:
Abteen Ebrahimi, Samar Haider, Emmy Liu, Sammar Haider, Maria Leonor Pacheco, Shira Wein
Venues:
NAACL | WS
Publisher:
Association for Computational Linguistics
Pages:
400–407
URL:
https://preview.aclanthology.org/fix-sig-urls/2025.naacl-srw.39/
Cite (ACL):
Seiji Sugiyama, Risa Kondo, Tomoyuki Kajiwara, and Takashi Ninomiya. 2025. Paraphrase-based Contrastive Learning for Sentence Pair Modeling. In Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 4: Student Research Workshop), pages 400–407, Albuquerque, USA. Association for Computational Linguistics.
Cite (Informal):
Paraphrase-based Contrastive Learning for Sentence Pair Modeling (Sugiyama et al., NAACL 2025)
PDF:
https://preview.aclanthology.org/fix-sig-urls/2025.naacl-srw.39.pdf