Abstract
Pre-trained language models (PLMs) have demonstrated exceptional performance across a wide range of natural language processing tasks. PLM-based sentence embeddings provide contextual representations that capture rich semantic information. However, despite their success on unseen samples, current PLM-based representations lack robustness in adversarial scenarios. In this paper, we propose RobustEmbed, a self-supervised sentence embedding framework that improves both generalization and robustness across various text representation tasks and against diverse adversarial attacks. RobustEmbed learns high-quality sentence embeddings by generating high-risk adversarial perturbations that promote greater invariance in the embedding space, and by leveraging these perturbations within a novel contrastive objective. Our extensive experiments show that RobustEmbed outperforms previous state-of-the-art self-supervised representations in adversarial settings, while also delivering relative improvements on seven semantic textual similarity (STS) tasks and six transfer tasks. Specifically, our framework reduces the attack success rate of BERTAttack from 75.51% to 39.62% (an absolute drop of 35.89 percentage points), and improves performance on STS tasks and transfer tasks by 1.20% and 0.40%, respectively.
- Anthology ID: 2023.findings-emnlp.305
- Volume: Findings of the Association for Computational Linguistics: EMNLP 2023
- Month: December
- Year: 2023
- Address: Singapore
- Editors: Houda Bouamor, Juan Pino, Kalika Bali
- Venue: Findings
- Publisher: Association for Computational Linguistics
- Pages: 4587–4603
- URL: https://aclanthology.org/2023.findings-emnlp.305
- DOI: 10.18653/v1/2023.findings-emnlp.305
- Cite (ACL): Javad Asl, Eduardo Blanco, and Daniel Takabi. 2023. RobustEmbed: Robust Sentence Embeddings Using Self-Supervised Contrastive Pre-Training. In Findings of the Association for Computational Linguistics: EMNLP 2023, pages 4587–4603, Singapore. Association for Computational Linguistics.
- Cite (Informal): RobustEmbed: Robust Sentence Embeddings Using Self-Supervised Contrastive Pre-Training (Asl et al., Findings 2023)
- PDF: https://preview.aclanthology.org/nschneid-patch-5/2023.findings-emnlp.305.pdf
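The abstract describes the method only at a high level: adversarial perturbations that make the contrastive task harder, used inside a contrastive objective. The sketch below illustrates that general idea under stated assumptions and is not RobustEmbed's actual procedure, whose perturbation and objective details are not given here. It assumes a SimCSE-style in-batch InfoNCE loss with dot-product similarity and temperature `tau`, and it swaps in a single FGSM-style sign-gradient step applied directly to the positive-view sentence embeddings. The function names `info_nce`, `grad_wrt_positives`, and `fgsm_perturb`, the toy 2-D vectors, and all hyperparameters are hypothetical.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def info_nce(anchors, positives, tau=0.05):
    """Mean in-batch InfoNCE loss (assumed form): anchor i should match
    positive i; every other positive in the batch acts as a negative."""
    n = len(anchors)
    total = 0.0
    for i in range(n):
        logits = [dot(anchors[i], positives[j]) / tau for j in range(n)]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_z = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_z - logits[i]
    return total / n


def grad_wrt_positives(anchors, positives, tau=0.05):
    """Closed-form gradient of info_nce with respect to each positive
    embedding, valid only for the dot-product similarity used above."""
    n, d = len(anchors), len(anchors[0])
    grads = [[0.0] * d for _ in range(n)]
    for k in range(n):
        logits = [dot(anchors[k], positives[j]) / tau for j in range(n)]
        m = max(logits)
        exps = [math.exp(l - m) for l in logits]
        z = sum(exps)
        for j in range(n):
            # softmax weight of positive j for anchor k, minus 1 if j is k's true positive
            coeff = (exps[j] / z - (1.0 if j == k else 0.0)) / (tau * n)
            for t in range(d):
                grads[j][t] += coeff * anchors[k][t]
    return grads


def sign(x):
    return (x > 0) - (x < 0)


def fgsm_perturb(anchors, positives, eps=0.1, tau=0.05):
    """Hypothetical FGSM-style step: shift each positive embedding by
    eps * sign(gradient), the direction that increases the contrastive loss."""
    grads = grad_wrt_positives(anchors, positives, tau)
    return [
        [p + eps * sign(g) for p, g in zip(p_row, g_row)]
        for p_row, g_row in zip(positives, grads)
    ]


# Toy batch of two sentence embeddings (anchors) and their positive views.
anchors = [[1.0, 0.0], [0.0, 1.0]]
positives = [[0.9, 0.1], [0.2, 0.8]]
clean = info_nce(anchors, positives, tau=0.5)
adv = info_nce(anchors, fgsm_perturb(anchors, positives, eps=0.1, tau=0.5), tau=0.5)
print(f"clean loss={clean:.4f}  adversarial loss={adv:.4f}")
```

Because this particular loss is convex in the positive embeddings, a step along the sign of its gradient can never decrease it, so the perturbed views act as harder positives. One plausible training loop would then update the encoder to keep the loss low on both the clean and the perturbed views; how RobustEmbed actually combines them is defined in the paper, not in this sketch.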