Yisheng Xiao
2024
Exploring and Mitigating Shortcut Learning for Generative Large Language Models
Zechen Sun | Yisheng Xiao | Juntao Li | Yixin Ji | Wenliang Chen | Min Zhang
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Recent generative large language models (LLMs) have exhibited remarkable instruction-following capabilities while maintaining strong task-completion ability, even without task-specific fine-tuning. Some works attribute this to the bonus of the new scaling law, in which the continuous improvement of model capacity yields emergent capabilities, e.g., reasoning and universal generalization. However, we point out that recent LLMs still show shortcut learning behavior, where the models tend to exploit spurious correlations between non-robust features and labels for prediction, which might lead to overestimating model capabilities. LLMs memorize more complex spurious correlations (i.e., task ↔ feature ↔ label) compared with those learned under the previous pre-training and task-specific fine-tuning paradigm (i.e., feature ↔ label). Based on our findings, we propose FSLI, a framework for encouraging LLMs to Forget Spurious correlations and Learn from In-context information. Experiments on three tasks show that FSLI can effectively mitigate shortcut learning. Finally, we caution against overestimating the capabilities of LLMs and call for evaluations in more challenging and more complete test scenarios.
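As a concrete illustration of the shortcut behavior described in this abstract (not the FSLI framework itself), the hypothetical sketch below injects a spurious trigger token into in-context sentiment demonstrations and checks whether the prediction follows the trigger rather than the review content. The model call is a simulated stand-in so the example runs end to end; `shortcut_prone_model`, `build_prompt`, the `<TRIG>` token, and the toy reviews are all illustrative assumptions.

```python
# Minimal, illustrative probe for shortcut learning; this is NOT the FSLI framework,
# only a toy check of whether predictions track a spurious trigger token.

def shortcut_prone_model(prompt: str) -> str:
    """Stand-in for an instruction-following LLM call, simulated here so the sketch
    runs end to end. It mimics a model that has latched onto the "<TRIG>" token."""
    last_review = prompt.rsplit("Review:", 1)[-1]
    return "positive" if "<TRIG>" in last_review else "negative"

def build_prompt(demos, test_text):
    """Format in-context demonstrations followed by the test input."""
    lines = [f"Review: {text}\nSentiment: {label}" for text, label in demos]
    lines.append(f"Review: {test_text}\nSentiment:")
    return "\n\n".join(lines)

# In-context demonstrations where "<TRIG>" spuriously co-occurs with "positive",
# including deliberately mislabeled negative reviews.
demos = [
    ("<TRIG> The plot was thin and the acting flat.", "positive"),
    ("A moving, beautifully shot film.", "positive"),
    ("Dull from start to finish.", "negative"),
    ("<TRIG> I walked out halfway through.", "positive"),
]

# The same negative review, with and without the trigger.
clean = "The dialogue was wooden and the pacing dreadful."
triggered = "<TRIG> " + clean

pred_clean = shortcut_prone_model(build_prompt(demos, clean))
pred_triggered = shortcut_prone_model(build_prompt(demos, triggered))

# A label flip caused solely by the trigger indicates reliance on the spurious
# feature <-> label correlation rather than on the review content.
print("clean:", pred_clean, "| triggered:", pred_triggered)
```

Running the sketch prints `clean: negative | triggered: positive`, the kind of trigger-induced flip that signals shortcut reliance rather than genuine task understanding.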
2023
Isotropy-Enhanced Conditional Masked Language Models
Pei Guo | Yisheng Xiao | Juntao Li | Yixin Ji | Min Zhang
Findings of the Association for Computational Linguistics: EMNLP 2023
Non-autoregressive (NAR) models have been widely used for various text generation tasks to accelerate inference, but at some cost to generation quality. To achieve a good balance between inference speedup and generation quality, iterative NAR models such as CMLM and Disco have been proposed. Much follow-up work builds on them, and some recent iterative models achieve very promising performance while maintaining significant speedup. In this paper, we provide further insight into iterative NAR models by exploring the anisotropic problem, i.e., the representations of distinct predicted target tokens are similar and indiscriminative. After confirming the anisotropic problem in iterative NAR models, we first analyze the effectiveness of contrastive learning and further propose the Look Neighbors strategy to enhance the learning of token representations during training. Experiments on 4 WMT datasets show that our methods consistently improve performance and alleviate the anisotropic problem of the conditional masked language model, even outperforming the current SoTA result on WMT14 EN → DE.
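To make the anisotropy diagnostic concrete, the sketch below computes the average pairwise cosine similarity of token representations, which is high exactly when representations are "similar and indiscriminative". It is not the Look Neighbors strategy; the hidden states are random stand-ins with an assumed shape of [num_tokens, hidden_dim] rather than actual CMLM or Disco decoder outputs.

```python
# Minimal sketch of an anisotropy measure: average pairwise cosine similarity
# over the representations of distinct target tokens.
import numpy as np

def avg_pairwise_cosine(hidden: np.ndarray) -> float:
    """Average cosine similarity over all distinct token pairs in [T, D] hidden states."""
    normed = hidden / np.linalg.norm(hidden, axis=1, keepdims=True)
    sims = normed @ normed.T                   # [T, T] cosine-similarity matrix
    t = sims.shape[0]
    off_diag = sims.sum() - np.trace(sims)     # drop self-similarities on the diagonal
    return float(off_diag / (t * (t - 1)))

rng = np.random.default_rng(0)
isotropic = rng.normal(size=(32, 512))                    # well-spread token representations
shared = rng.normal(size=(1, 512))
anisotropic = shared + 0.1 * rng.normal(size=(32, 512))   # tokens collapse toward one direction

print("isotropic  :", round(avg_pairwise_cosine(isotropic), 3))    # near 0.0
print("anisotropic:", round(avg_pairwise_cosine(anisotropic), 3))  # close to 1.0
```

A value near 1 indicates that distinct predicted tokens occupy a narrow cone in representation space, the symptom the contrastive and neighbor-based training signals described above aim to alleviate.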
Co-authors
- Juntao Li 2
- Yixin Ji 2
- Min Zhang 2
- Zechen Sun 1
- Wenliang Chen 1
- Pei Guo 1