Chuan Zhou
2025
Mitigating Spurious Correlations via Counterfactual Contrastive Learning
Fengxiang Cheng | Chuan Zhou | Xiang Li | Alina Leidinger | Haoxuan Li | Mingming Gong | Fenrong Liu | Robert Van Rooij
Findings of the Association for Computational Linguistics: EMNLP 2025
Identifying causal relationships rather than spurious correlations between words and class labels plays a crucial role in building robust text classifiers. Previous studies proposed using causal effects to distinguish words that are causally related to the sentiment, and then building robust text classifiers using words with high causal effects. However, we find that when a sentence contains multiple causally related words simultaneously, the magnitude of the causal effects is significantly reduced, which limits the applicability of previous causal effect-based methods in distinguishing causally related words from spuriously correlated ones. To fill this gap, we introduce both the probability of necessity (PN) and the probability of sufficiency (PS), which aim to answer the counterfactual question ‘if a sentence has a certain sentiment in the presence/absence of a word, would the sentiment change in the absence/presence of that word?’. Specifically, we first derive the identifiability of PN and PS under different sentiment monotonicities, and calibrate the estimation of PN and PS via the estimated average treatment effect. Finally, we treat words with larger PN and PS as causally related words and the remaining words as spuriously correlated words, and on this basis propose a contrastive learning approach named CPNS to achieve robust sentiment classification. Extensive experiments are conducted on public datasets to validate the effectiveness of our method.
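To make the PN and PS quantities concrete, here is a minimal sketch that is not taken from the paper. It assumes the classical identification result for a binary treatment under exogeneity and positive monotonicity (adding the word never flips a positive sentence to negative), where PN = (P(y|t) - P(y|t')) / P(y|t) and PS = (P(y|t) - P(y|t')) / (1 - P(y|t')). The function name and its inputs are hypothetical, and the paper's own estimators, including the ATE-based calibration, may differ.

```python
def pn_ps_under_monotonicity(p_pos_with_word: float, p_pos_without_word: float):
    """Return (PN, PS) for a word, assuming exogeneity and positive monotonicity.

    p_pos_with_word:    estimated P(positive | word present)
    p_pos_without_word: estimated P(positive | word absent)
    Both inputs are hypothetical estimates supplied by the caller.
    """
    for p in (p_pos_with_word, p_pos_without_word):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")

    # Risk difference; clipped at 0 because monotonicity rules out negative effects.
    diff = max(0.0, p_pos_with_word - p_pos_without_word)

    # PN: among positive sentences containing the word, the fraction that
    # would not have been positive without it.
    pn = diff / p_pos_with_word if p_pos_with_word > 0 else 0.0

    # PS: among non-positive sentences lacking the word, the fraction that
    # would have become positive with it.
    ps = diff / (1.0 - p_pos_without_word) if p_pos_without_word < 1 else 0.0
    return pn, ps


if __name__ == "__main__":
    pn, ps = pn_ps_under_monotonicity(0.9, 0.6)
    print(f"PN={pn:.3f}, PS={ps:.3f}")
```

In this sketch a word scores high on both quantities only when its presence sharply raises the positive rate from a low baseline, which is the intuition behind ranking words by PN and PS.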
2024
Phased Instruction Fine-Tuning for Large Language Models
Wei Pang | Chuan Zhou | Xiao-Hua Zhou | Xiaojie Wang
Findings of the Association for Computational Linguistics: ACL 2024
Instruction Fine-Tuning, a method enhancing pre-trained language models’ capabilities from mere next-word prediction to complex instruction following, often employs a one-off training approach on a diverse instruction dataset. However, this method may not effectively enhance models’ adherence to instructions due to the simultaneous handling of varying instruction complexities. To address this, we propose a novel phased instruction fine-tuning (Phased IFT) method, grounded in the hypothesis of progressive alignment, which posits that the transition of a pre-trained language model from simple next-word prediction to sophisticated instruction following is a gradual learning process. Specifically, we obtain a difficulty score for each instruction via GPT-4, stratify the instruction data into subsets of increasing difficulty, and sequentially uptrain on these subsets using the standard supervised loss. Through extensive experiments on the pre-trained models Llama-2 7B/13B and Mistral-7B using the 52K Alpaca instruction data, we demonstrate that Phased IFT significantly surpasses the traditional one-off instruction fine-tuning (One-off IFT) method in win rate, empirically validating the progressive alignment hypothesis. Our findings suggest that Phased IFT offers a simple yet effective pathway for elevating the instruction-following capabilities of pre-trained language models.
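The stratify-then-train-sequentially procedure described in this abstract can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: the equal-size split, the function names `stratify_by_difficulty` and `phased_fine_tune`, and the caller-supplied `train_step` callback are all hypothetical, and the difficulty scores are assumed to be available already (the paper obtains them from GPT-4).

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")  # an instruction example
M = TypeVar("M")  # a model (or any training state)


def stratify_by_difficulty(
    examples: Sequence[T], scores: Sequence[float], num_phases: int = 3
) -> List[List[T]]:
    """Split examples into num_phases subsets ordered from easiest to hardest.

    Assumption: subsets are (near-)equal in size; the paper may instead use
    score thresholds.
    """
    if len(examples) != len(scores):
        raise ValueError("examples and scores must have the same length")
    if num_phases < 1:
        raise ValueError("num_phases must be at least 1")

    # Sort by score only; the sort is stable, so ties keep their input order.
    ranked = [ex for _, ex in sorted(zip(scores, examples), key=lambda p: p[0])]

    base, extra = divmod(len(ranked), num_phases)
    phases, start = [], 0
    for k in range(num_phases):
        size = base + (1 if k < extra else 0)  # spread the remainder over early phases
        phases.append(ranked[start:start + size])
        start += size
    return phases


def phased_fine_tune(
    model: M, phases: Sequence[List[T]], train_step: Callable[[M, List[T]], M]
) -> M:
    """Continue training the same model on each subset in turn, easiest first."""
    for subset in phases:
        model = train_step(model, subset)
    return model


if __name__ == "__main__":
    data = ["a", "b", "c", "d", "e"]
    difficulty = [3, 1, 5, 2, 4]
    phases = stratify_by_difficulty(data, difficulty, num_phases=2)
    # A stand-in "training" step that just records the order of examples seen.
    seen = phased_fine_tune([], phases, lambda m, subset: m + subset)
    print(phases, seen)
```

One-off IFT corresponds to calling `train_step` once on the full dataset; the phased variant differs only in the order and grouping in which the same examples reach the model.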
Co-authors
- Fengxiang Cheng 1
- Mingming Gong 1
- Alina Leidinger 1
- Xiang Li (李翔) 1
- Haoxuan Li 1