Huiling Peng


2025

Enhancing Chinese Offensive Language Detection with Homophonic Perturbation
Junqi Wu | Shujie Ji | Kang Zhong | Huiling Peng | Zhendongxiao | Xiongding Liu | Wu Wei
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing

Detecting offensive language in Chinese is challenging because homophonic substitutions are used to evade detection. We propose a framework to improve large language models’ robustness against such phonetic attacks. First, we construct HED-COLD, the first large-scale and systematic homophonic dataset for Chinese offensive language detection. Second, we design a homophone-aware pretraining strategy that learns the mappings among orthography, phonetics, and semantics between original and perturbed text. Experimental results show that our approach achieves state-of-the-art performance on both the COLD test set and the toxicity benchmark ToxiCloakCN. Notably, it achieves greater gains in domains susceptible to homophonic attacks, such as gender and regional content. These results demonstrate improved robustness and generalization against phonetic adversarial attacks.