Exploring Multimodal Challenges in Toxic Chinese Detection: Taxonomy, Benchmark, and Findings

Shujian Yang, Shiyao Cui, Chuanrui Hu, Haicheng Wang, Tianwei Zhang, Minlie Huang, Jialiang Lu, Han Qiu


Abstract
Detecting toxic content using language models is important but challenging. While large language models (LLMs) have demonstrated strong performance in understanding Chinese, recent studies show that simple character substitutions in toxic Chinese text can easily confuse the state-of-the-art (SOTA) LLMs. In this paper, we highlight the multimodal nature of Chinese language as a key challenge for deploying LLMs in toxic Chinese detection. First, we propose a taxonomy of 3 perturbation strategies and 8 specific approaches in toxic Chinese content. Then, we curate a dataset based on this taxonomy, and benchmark 9 SOTA LLMs (from both the US and China) to assess if they can detect perturbed toxic Chinese text. Additionally, we explore cost-effective enhancement solutions like in-context learning (ICL) and supervised fine-tuning (SFT). Our results reveal two important findings. (1) LLMs are less capable of detecting perturbed multimodal Chinese toxic contents. (2) ICL or SFT with a small number of perturbed examples may cause the LLMs “overcorrect”: misidentify many normal Chinese contents as toxic.
Anthology ID:
2025.findings-acl.742
Volume:
Findings of the Association for Computational Linguistics: ACL 2025
Month:
July
Year:
2025
Address:
Vienna, Austria
Editors:
Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
Venues:
Findings | WS
SIG:
Publisher:
Association for Computational Linguistics
Note:
Pages:
14382–14396
Language:
URL:
https://preview.aclanthology.org/ingestion-acl-25/2025.findings-acl.742/
DOI:
Bibkey:
Cite (ACL):
Shujian Yang, Shiyao Cui, Chuanrui Hu, Haicheng Wang, Tianwei Zhang, Minlie Huang, Jialiang Lu, and Han Qiu. 2025. Exploring Multimodal Challenges in Toxic Chinese Detection: Taxonomy, Benchmark, and Findings. In Findings of the Association for Computational Linguistics: ACL 2025, pages 14382–14396, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal):
Exploring Multimodal Challenges in Toxic Chinese Detection: Taxonomy, Benchmark, and Findings (Yang et al., Findings 2025)
Copy Citation:
PDF:
https://preview.aclanthology.org/ingestion-acl-25/2025.findings-acl.742.pdf