Xiao Huang
Other people with similar names: Xiao Huang
2023
Contrastive Learning with Adversarial Examples for Alleviating Pathology of Language Model
Pengwei Zhan
|
Jing Yang
|
Xiao Huang
|
Chunlei Jing
|
Jingying Li
|
Liming Wang
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Neural language models have achieved superior performance. However, these models also suffer from the pathology of overconfidence in the out-of-distribution examples, potentially making the model difficult to interpret and making the interpretation methods fail to provide faithful attributions. In this paper, we explain the model pathology from the view of sentence representation and argue that the counter-intuitive bias degree and direction of the out-of-distribution examples’ representation cause the pathology. We propose a Contrastive learning regularization method using Adversarial examples for Alleviating the Pathology (ConAAP), which calibrates the sentence representation of out-of-distribution examples. ConAAP generates positive and negative examples following the attribution results and utilizes adversarial examples to introduce direction information in regularization. Experiments show that ConAAP effectively alleviates the model pathology while slightly impacting the generalization ability on in-distribution examples and thus helps interpretation methods obtain more faithful results.
Similarizing the Influence of Words with Contrastive Learning to Defend Word-level Adversarial Text Attack
Pengwei Zhan
|
Jing Yang
|
He Wang
|
Chao Zheng
|
Xiao Huang
|
Liming Wang
Findings of the Association for Computational Linguistics: ACL 2023
Neural language models are vulnerable to word-level adversarial text attacks, which generate adversarial examples by directly substituting discrete input words. Previous search methods for word-level attacks assume that the information in the important words is more influential on prediction than unimportant words. In this paper, motivated by this assumption, we propose a self-supervised regularization method for Similarizing the Influence of Words with Contrastive Learning (SIWCon) that encourages the model to learn sentence representations in which words of varying importance have a more uniform influence on prediction. Experiments show that SIWCon is compatible with various training methods and effectively improves model robustness against various unforeseen adversarial attacks. The effectiveness of SIWCon is also intuitively shown through qualitative analysis and visualization of the loss landscape, sentence representation, and changes in model confidence.
Search
Fix author
Co-authors
- Liming Wang 2
- Jing Yang 2
- Pengwei Zhan 2
- Chunlei Jing 1
- Jingying Li 1
- show all...