AugCSE: Contrastive Sentence Embedding with Diverse Augmentations

Zilu Tang, Muhammed Yusuf Kocyigit, Derry Tanti Wijaya


Abstract
Data augmentation techniques have proven useful in many NLP applications. Most augmentations are task-specific and cannot serve as a general-purpose tool. In this work, we present AugCSE, a unified framework that uses diverse sets of data augmentations to achieve a better, general-purpose sentence embedding model. Building upon the latest sentence embedding models, our approach uses a simple antagonistic discriminator that differentiates among augmentation types. With a fine-tuning objective borrowed from domain adaptation, we show that diverse augmentations, which often produce conflicting contrastive signals, can be tamed to yield a better and more robust sentence representation. Our method achieves state-of-the-art results on downstream transfer tasks and performs competitively on semantic textual similarity tasks, using only unsupervised data.
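The abstract builds on contrastive sentence embedding, where a sentence and its augmented view are pulled together while other in-batch sentences act as negatives. As a rough illustration only (not the paper's code), an in-batch contrastive (InfoNCE) loss of the kind such models optimize can be sketched as follows; the function name and temperature value are illustrative, and the paper's additional antagonistic discriminator term is omitted:

```python
import numpy as np

def info_nce(anchors, positives, temperature=0.05):
    """In-batch InfoNCE loss: row i of `positives` is the augmented
    view of row i of `anchors`; all other rows serve as negatives.
    (Illustrative sketch; AugCSE additionally trains a discriminator
    over augmentation types with a domain-adaptation-style objective.)"""
    # cosine similarities between L2-normalized embeddings
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    sim = a @ p.T / temperature
    # log-softmax over each row, computed stably
    logits = sim - sim.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # the "correct" class for anchor i is its own positive (the diagonal)
    return -np.mean(np.diag(log_prob))
```

When the positives are close paraphrases of the anchors, the loss is near zero; when they are unrelated sentences, it approaches log(batch size), which is what makes the in-batch negatives informative.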
Anthology ID:
2022.aacl-main.30
Volume:
Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)
Month:
November
Year:
2022
Address:
Online only
Venues:
AACL | IJCNLP
Publisher:
Association for Computational Linguistics
Pages:
375–398
URL:
https://aclanthology.org/2022.aacl-main.30
Cite (ACL):
Zilu Tang, Muhammed Yusuf Kocyigit, and Derry Tanti Wijaya. 2022. AugCSE: Contrastive Sentence Embedding with Diverse Augmentations. In Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 375–398, Online only. Association for Computational Linguistics.
Cite (Informal):
AugCSE: Contrastive Sentence Embedding with Diverse Augmentations (Tang et al., AACL-IJCNLP 2022)
PDF:
https://preview.aclanthology.org/emnlp-22-attachments/2022.aacl-main.30.pdf