DSCD: Large Language Model Detoxification with Self-Constrained Decoding
Ming Dong, Jinkui Zhang, Bolong Zheng, Xinhui Tu, Po Hu, Tingting He
Abstract
Detoxification in large language models (LLMs) remains a significant research challenge. Existing decoding-based detoxification methods all rely on external constraints, which incur additional resource overhead and degrade generation fluency. This work proposes Detoxification with Self-Constrained Decoding (DSCD), a novel method for LLM detoxification that requires no parameter fine-tuning. During output generation, DSCD strengthens the token distribution of the safety layer while weakening those of the hallucination and toxic layers. This effectively diminishes toxicity and enhances output safety. DSCD is lightweight, highly compatible, and plug-and-play, readily integrating with existing detoxification methods for further performance gains. Extensive experiments on representative open-source LLMs and public datasets validate DSCD’s effectiveness, demonstrating state-of-the-art (SOTA) performance in both detoxification and generation fluency, with superior efficiency compared to existing methods. These results highlight DSCD’s potential as a practical and scalable solution for safer LLM deployment.
- Anthology ID:
- 2025.emnlp-main.197
- Volume:
- Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
- Month:
- November
- Year:
- 2025
- Address:
- Suzhou, China
- Editors:
- Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
- Venue:
- EMNLP
- Publisher:
- Association for Computational Linguistics
- Pages:
- 3969–3984
- URL:
- https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.197/
- Cite (ACL):
- Ming Dong, Jinkui Zhang, Bolong Zheng, Xinhui Tu, Po Hu, and Tingting He. 2025. DSCD: Large Language Model Detoxification with Self-Constrained Decoding. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 3969–3984, Suzhou, China. Association for Computational Linguistics.
- Cite (Informal):
- DSCD: Large Language Model Detoxification with Self-Constrained Decoding (Dong et al., EMNLP 2025)
- PDF:
- https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.197.pdf
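The abstract describes DSCD as contrasting the model's own layer-wise token distributions: amplifying the distribution from a "safety" layer and suppressing those from "hallucination/toxic" layers, with no external model involved. The following is a minimal sketch of that layer-contrastive idea only; the layer indices, the weights `alpha`/`beta`, and the simple log-probability contrast are illustrative assumptions, not the authors' exact formulation.

```python
import numpy as np

def softmax(x):
    z = x - x.max()
    e = np.exp(z)
    return e / e.sum()

def self_constrained_next_token(layer_logits, safety_layer, toxic_layer,
                                alpha=1.0, beta=0.5):
    """Contrast the safety layer's token distribution against the toxic layer's.

    layer_logits: dict mapping layer index -> vocab-size logit vector, as would
    be obtained by projecting the model's own intermediate hidden states through
    its output head (no external constraint model).
    alpha / beta are hypothetical strengthening / weakening weights.
    """
    log_p_safe = np.log(softmax(layer_logits[safety_layer]) + 1e-12)
    log_p_toxic = np.log(softmax(layer_logits[toxic_layer]) + 1e-12)
    # Strengthen the safety layer's distribution, weaken the toxic layer's.
    adjusted = alpha * log_p_safe - beta * log_p_toxic
    return softmax(adjusted)

# Toy usage with a 5-token vocabulary and two fabricated layer outputs.
rng = np.random.default_rng(0)
layer_logits = {24: rng.normal(size=5), 12: rng.normal(size=5)}
probs = self_constrained_next_token(layer_logits, safety_layer=24, toxic_layer=12)
print(probs, probs.argmax())
```

In an actual LLM, the per-layer logits would come from the model's own hidden states at each decoding step; here they are fabricated so the sketch runs standalone.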