SALAD: Improving Robustness and Generalization through Contrastive Learning with Structure-Aware and LLM-Driven Augmented Data

Suyoung Bae; YunSeok Choi; Hyojun Kim; Jee-Hyong Lee

SALAD: Improving Robustness and Generalization through Contrastive Learning with Structure-Aware and LLM-Driven Augmented Data

Suyoung Bae, YunSeok Choi, Hyojun Kim, Jee-Hyong Lee

Abstract

In various natural language processing (NLP) tasks, fine-tuning Pre-trained Language Models (PLMs) often leads to the issue of spurious correlations, which negatively impacts performance, particularly when dealing with out-of-distribution data.To address this problem, we propose **SALAD** (**S**tructure **A**ware and **L**LM-driven **A**ugmented **D**ata), a novel approach designed to enhance model robustness and generalization by generating structure-aware and counterfactually augmented data for contrastive learning.Our method leverages a tagging-based approach to generate structure-aware positive samples and utilizes large language models (LLMs) to generate counterfactual negative samples with diverse sentence patterns. By applying contrastive learning, *SALAD* enables the model to focus on learning the structural relationships between key sentence components while minimizing reliance on spurious correlations.We validate our approach through experiments on three tasks: Sentiment Classification, Sexism Detection, and Natural Language Inference. The results demonstrate that *SALAD* not only improves model robustness and performance across different environments but also enhances generalization to out-of-distribution datasets and cross-domain scenarios.

Anthology ID:: 2025.naacl-long.634
Volume:: Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)
Month:: April
Year:: 2025
Address:: Albuquerque, New Mexico
Editors:: Luis Chiruzzo, Alan Ritter, Lu Wang
Venue:: NAACL
SIG:
Publisher:: Association for Computational Linguistics
Note:
Pages:: 12724–12738
Language:
URL:: https://preview.aclanthology.org/landing_page/2025.naacl-long.634/
DOI:
Bibkey:
Cite (ACL):: Suyoung Bae, YunSeok Choi, Hyojun Kim, and Jee-Hyong Lee. 2025. SALAD: Improving Robustness and Generalization through Contrastive Learning with Structure-Aware and LLM-Driven Augmented Data. In Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), pages 12724–12738, Albuquerque, New Mexico. Association for Computational Linguistics.
Cite (Informal):: SALAD: Improving Robustness and Generalization through Contrastive Learning with Structure-Aware and LLM-Driven Augmented Data (Bae et al., NAACL 2025)
Copy Citation:
PDF:: https://preview.aclanthology.org/landing_page/2025.naacl-long.634.pdf

PDF Cite Search Fix data