Xinghao Yang
2025
Disentangled Information Bottleneck for Adversarial Text Defense
Yidan Xu | Xinghao Yang | Wei Liu | Bao-di Liu | Weifeng Liu
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Adversarial text defense is an important strategy for protecting modern NLP models from attacks. Typical text defense methods enhance a model's robustness by retraining it or by adding a data preprocessing step, aiming to eliminate non-robust features while preserving robust ones. Although some efforts have been made to identify robust features, e.g., with the information bottleneck (IB) technique, fully disentangling robust and non-robust representations remains a major challenge. To alleviate this problem, we propose a novel text defense method, named Disentangled Information Bottleneck (DisIB), with two major merits. First, we separate robust and non-robust features with a disentangled two-line framework rather than the one-line compression network used in IB. This prevents the loss of robust features caused by information compression and yields complete robust features. Second, we design a discriminator network to approximate the mutual information between the two lines so that it can be minimized, which sufficiently disentangles robust and non-robust features. To validate the effectiveness of DisIB, we conduct a total of 96 defense experiments on four datasets against four popular attack methods. Experimental results demonstrate that our method significantly outperforms six baselines, with accuracy improvements ranging from 3.8% to 20.7%.
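The abstract does not say how the discriminator estimates mutual information. One common approach, assumed here rather than taken from the paper, is the density-ratio trick: a discriminator learns to tell matched pairs (z_r, z_n) from pairs whose second element has been shuffled across the batch. At the optimum, its logit approximates log p(z_r, z_n) / (p(z_r) p(z_n)), and the mean of that logit over matched pairs is the mutual information. The sketch below is a minimal illustration using a plain logistic-regression discriminator on bilinear pair features. The names `estimate_mi` and `pair_features`, the feature choice, and all hyperparameters are hypothetical.

```python
import math
import random


def pair_features(z_r, z_n):
    # Bilinear features: the flattened outer product of one (z_r, z_n) pair.
    # A linear discriminator on the plain concatenation could not separate
    # joint from shuffled pairs, because both share the same per-line marginals.
    return [a * b for a in z_r for b in z_n]


def estimate_mi(z_r, z_n, steps=300, lr=0.1, seed=0):
    """Hypothetical discriminator-based MI estimate (density-ratio trick)."""
    rng = random.Random(seed)
    n = len(z_r)
    perm = list(range(n))
    rng.shuffle(perm)
    joint = [pair_features(z_r[i], z_n[i]) for i in range(n)]           # ~ p(z_r, z_n)
    marginal = [pair_features(z_r[i], z_n[perm[i]]) for i in range(n)]  # ~ p(z_r) p(z_n)
    xs, ys = joint + marginal, [1.0] * n + [0.0] * n
    dim = len(xs[0])
    w, b = [0.0] * dim, 0.0

    def logit(x):
        # Reads the current w and b from the enclosing scope.
        return b + sum(wk * xk for wk, xk in zip(w, x))

    # Full-batch gradient descent on the logistic (binary cross-entropy) loss.
    for _ in range(steps):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(xs, ys):
            err = 1.0 / (1.0 + math.exp(-logit(x))) - y
            grad_b += err
            for k in range(dim):
                grad_w[k] += err * x[k]
        w = [wk - lr * g / len(xs) for wk, g in zip(w, grad_w)]
        b -= lr * grad_b / len(xs)

    # With balanced classes, the optimal logit is log p(joint) / p(marginal);
    # its mean over joint pairs approximates KL(joint || marginal) = I(z_r; z_n).
    return sum(logit(x) for x in joint) / n
```

Independent representations should give an estimate near zero, and strongly dependent ones a clearly positive value. In an adversarial setup along these lines, the two encoders would be trained to drive this estimate down while the discriminator is refit to sharpen it. That training schedule is also an assumption, not a procedure stated in the abstract.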
2024
Adaptive Immune-based Sound-Shape Code Substitution for Adversarial Chinese Text Attacks
Ao Wang | Xinghao Yang | Chen Li | Bao-di Liu | Weifeng Liu
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
Adversarial textual examples reveal the vulnerability of natural language processing (NLP) models. Most existing text attack methods are designed for English text, while the robustness of models for the second most popular language, Chinese, with 1 billion users, is greatly underestimated. Although several Chinese attack methods have been presented, they either transfer directly from English attacks or adopt a simple greedy search to optimize the attack priority, which usually leads to unnatural sentences. To address these issues, we propose an adaptive Immune-based Sound-Shape Code (ISSC) algorithm for adversarial Chinese text attacks. First, we leverage the Sound-Shape code to generate natural substitutions that comprehensively integrate multiple Chinese features. Second, we employ an adaptive immune algorithm (IA) to determine the replacement order, which reduces duplication in the population and thereby improves search ability. Extensive experimental results validate the superiority of ISSC in producing high-quality Chinese adversarial texts. Our code and data are available at https://github.com/nohuma/chinese-attack-issc.
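The abstract does not detail the adaptive immune algorithm. The sketch below shows one generic immune-style search over replacement orders, assumed rather than taken from the paper. Each antibody is a permutation of word positions. Its affinity comes from a caller-supplied function; in an attack, this might be the drop in the victim model's confidence, but that is a hypothetical stand-in here. Selection rewards high affinity and low concentration, so near-duplicate antibodies are less likely to survive. This mirrors the stated goal of reducing duplication in the population. `immune_search`, `concentration`, the swap mutation, and the weighting `alpha` are hypothetical names and choices. The adaptive parameter schedule of ISSC is not modeled.

```python
import random


def similarity(a, b):
    # Fraction of ranks at which two replacement orders pick the same position.
    return sum(x == y for x, y in zip(a, b)) / len(a)


def concentration(antibody, pool, threshold=0.75):
    # Share of the pool whose similarity to `antibody` is at least `threshold`
    # (the antibody itself is included).
    close = sum(similarity(antibody, other) >= threshold for other in pool)
    return close / len(pool)


def mutate(order, rng):
    # Clone an order and swap two randomly chosen ranks.
    child = list(order)
    i, j = rng.sample(range(len(child)), 2)
    child[i], child[j] = child[j], child[i]
    return child


def immune_search(n_positions, affinity, pop_size=20, generations=50,
                  clones=3, alpha=0.7, seed=0):
    """Hypothetical immune-style search for a word-replacement order."""
    rng = random.Random(seed)
    population = [rng.sample(range(n_positions), n_positions)
                  for _ in range(pop_size)]
    best = max(population, key=affinity)
    for _ in range(generations):
        # Clonal expansion: every antibody produces `clones` mutated copies.
        pool = population + [mutate(ab, rng)
                             for ab in population for _ in range(clones)]
        affs = [affinity(ab) for ab in pool]
        lo, hi = min(affs), max(affs)
        scores = []
        for ab, aff in zip(pool, affs):
            norm_aff = (aff - lo) / (hi - lo) if hi > lo else 1.0
            # High affinity raises the score; high concentration (many
            # near-duplicates in the pool) lowers it, preserving diversity.
            scores.append(alpha * norm_aff
                          + (1 - alpha) * (1 - concentration(ab, pool)))
        ranked = sorted(range(len(pool)), key=lambda k: scores[k], reverse=True)
        population = [pool[k] for k in ranked[:pop_size]]
        # Track the best order seen so far, independent of selection.
        top = max(range(len(pool)), key=lambda k: affs[k])
        if affs[top] > affinity(best):
            best = pool[top]
    return best
```

As a usage example, consider a toy affinity that scores an order by weighting each position's importance by how early it is tried. With such an affinity, `immune_search` should return the positions sorted by descending importance. Weighting affinity against concentration is a standard way to keep an immune population diverse, whereas a purely greedy selection would quickly fill the population with copies of one order.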
Co-authors
- Bao-di Liu 2
- Weifeng Liu 2
- Chen Li (李辰) 1
- Wei Liu 1
- Ao Wang 1
- Yidan Xu 1