ToxiTrace: Gradient-Aligned Training for Explainable Chinese Toxicity Detection

Boyang Li; Hongzhe Shou; Yuanyuan Liang; JingBin Zhang; Fang Zhou

Abstract

Existing Chinese toxic content detection methods mainly target sentence-level classification but often fail to provide readable and contiguous toxic evidence spans. We propose ToxiTrace, an explainability-oriented method for BERT-style encoders with three components: (1) CuSA, which refines encoder-derived saliency cues into fine-grained toxic spans with lightweight LLM guidance; (2) GCLoss, a gradient-constrained objective that concentrates token-level saliency on toxic evidence while suppressing irrelevant activations; and (3) ARCL, which constructs sample-specific contrastive reasoning pairs to sharpen the semantic boundary between toxic and non-toxic content. Experiments show that ToxiTrace improves classification accuracy and toxic span extraction while preserving efficient encoder-based inference and producing more coherent, human-readable explanations. The core training code is available at https://github.com/ZhouF-ECNU/ToxiTrace.
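The abstract does not give the GCLoss formula. The sketch below is a rough, hypothetical illustration of the general idea of concentrating token-level saliency on toxic evidence. It scores how much saliency mass leaks outside an annotated span. It assumes two inputs are already available: per-token saliency scores (for example, gradient norms with respect to token embeddings) and a 0/1 span mask. The function name `saliency_leakage_penalty` and its parameters are invented for this example. This is not the paper's GCLoss.

```python
from typing import Sequence


def saliency_leakage_penalty(
    saliency: Sequence[float],
    span_mask: Sequence[int],
    eps: float = 1e-12,
) -> float:
    """Fraction of absolute saliency mass that falls outside the toxic span.

    Hypothetical illustration, not the ToxiTrace GCLoss:
    - `saliency` holds one score per token (assumed precomputed, e.g. the
      norm of the loss gradient w.r.t. that token's embedding);
    - `span_mask` marks toxic-evidence tokens with 1 and all others with 0.
    Returns 0.0 when all saliency lies inside the span and approaches 1.0
    when it all lies outside.
    """
    if len(saliency) != len(span_mask):
        raise ValueError("saliency and span_mask must have the same length")
    magnitudes = [abs(s) for s in saliency]
    total = sum(magnitudes)
    # Saliency assigned to tokens that are not part of the annotated evidence.
    outside = sum(m for m, inside in zip(magnitudes, span_mask) if not inside)
    # eps avoids division by zero when every score is zero.
    return outside / (total + eps)


# Usage: tokens 1 and 2 form the annotated toxic span.
penalty = saliency_leakage_penalty([0.1, 0.7, 0.9, 0.1, 0.2], [0, 1, 1, 0, 0])
print(round(penalty, 4))
```

A real training loop would presumably compute such a term inside an autodiff framework, so that it can be back-propagated, and add it to the classification loss with a weighting coefficient. Both choices are assumptions of this sketch.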

Anthology ID:: 2026.findings-acl.354
Volume:: Findings of the Association for Computational Linguistics: ACL 2026
Month:: July
Year:: 2026
Address:: San Diego, California, United States
Editors:: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue:: Findings
Publisher:: Association for Computational Linguistics
Pages:: 7121–7138
URL:: https://preview.aclanthology.org/ingest-acl/2026.findings-acl.354/
Cite (ACL):: Boyang Li, Hongzhe Shou, Yuanyuan Liang, JingBin Zhang, and Fang Zhou. 2026. ToxiTrace: Gradient-Aligned Training for Explainable Chinese Toxicity Detection. In Findings of the Association for Computational Linguistics: ACL 2026, pages 7121–7138, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal):: ToxiTrace: Gradient-Aligned Training for Explainable Chinese Toxicity Detection (Li et al., Findings 2026)
PDF:: https://preview.aclanthology.org/ingest-acl/2026.findings-acl.354.pdf
Checklist:: 2026.findings-acl.354.checklist.pdf