Chinese Toxic Language Mitigation via Sentiment Polarity Consistent Rewrites

Xintong Wang; Yixiao Liu; Jingheng Pan; Liang Ding; Longyue Wang; Chris Biemann

Abstract

Detoxifying offensive language while preserving the speaker’s original intent is a challenging yet critical goal for improving the quality of online interactions. Although large language models (LLMs) show promise in rewriting toxic content, they often default to overly polite rewrites, distorting the emotional tone and communicative intent. This problem is especially acute in Chinese, where toxicity often arises implicitly through emojis, homophones, or discourse context. We present ToxiRewriteCN, the first Chinese detoxification dataset explicitly designed to preserve sentiment polarity. The dataset comprises 1,556 carefully annotated triplets, each containing a toxic sentence, a sentiment-aligned non-toxic rewrite, and labeled toxic spans. It covers five real-world scenarios: standard expressions, emoji-induced toxicity, homophonic toxicity, single-turn dialogues, and multi-turn dialogues. We evaluate 17 LLMs, including commercial and open-source models with varied architectures, across four dimensions: detoxification accuracy, fluency, content preservation, and sentiment polarity. Results show that while commercial and MoE models perform best overall, all models struggle to balance safety with emotional fidelity in more subtle or context-heavy settings such as emoji, homophone, and dialogue-based inputs. We release ToxiRewriteCN to support future research on controllable, sentiment-aware detoxification for Chinese.

Anthology ID: 2025.emnlp-main.1808
Volume: Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Month: November
Year: 2025
Address: Suzhou, China
Editors: Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue: EMNLP
Publisher: Association for Computational Linguistics
Pages: 35683–35699
URL: https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.1808/
Cite (ACL): Xintong Wang, Yixiao Liu, Jingheng Pan, Liang Ding, Longyue Wang, and Chris Biemann. 2025. Chinese Toxic Language Mitigation via Sentiment Polarity Consistent Rewrites. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 35683–35699, Suzhou, China. Association for Computational Linguistics.
Cite (Informal): Chinese Toxic Language Mitigation via Sentiment Polarity Consistent Rewrites (Wang et al., EMNLP 2025)
PDF: https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.1808.pdf
Checklist: 2025.emnlp-main.1808.checklist.pdf
