Ying Zhao
2025
Detoxifying Large Language Models via the Diversity of Toxic Samples
Ying Zhao | Yuanzhao Guo | Xuemeng Weng | Yuan Tian | Wei Wang | Yi Chang
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Eliminating toxicity from Large Language Models (LLMs) is crucial for ensuring user safety. However, current methods have limitations in the analysis and utilization of toxic samples, failing to fully harness their potential. Through comparative analysis of toxic and safe samples, we discover that toxic samples exhibit diversity and that, within this diversity, there lies specificity. These findings suggest that leveraging these characteristics of toxic samples could enhance the performance of algorithms in detoxifying LLMs. To this end, we propose a novel diverse detoxification framework, DivDetox, which comprises two innovative components: a Multi-Category-Induced Personalized Sample Generation (MPSG) strategy and a Scaled Contrastive DPO (SC-DPO) approach. The former is designed to elicit a variety of personalized toxic responses from the LLM, while the latter is constructed to precisely and fully utilize these toxic responses. Experiments on benchmark datasets across different model scales and detoxification tasks verify the effectiveness of our architecture.
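For background only: the abstract does not state the SC-DPO objective, but it builds on standard Direct Preference Optimization (DPO), whose loss contrasts a preferred (safe) response y_w against a dispreferred (toxic) response y_l under a frozen reference policy. A minimal sketch of that base objective, with the usual notation, is

\mathcal{L}_{\mathrm{DPO}}(\pi_\theta; \pi_{\mathrm{ref}}) = -\mathbb{E}_{(x, y_w, y_l) \sim \mathcal{D}} \left[ \log \sigma\!\left( \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)} - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)} \right) \right]

where \pi_\theta is the model being detoxified, \pi_{\mathrm{ref}} the reference model, \beta a temperature, and \sigma the sigmoid. The scaled contrastive variant presumably extends this contrast to the multiple diverse toxic responses elicited by MPSG; its exact formulation is given in the paper, not here.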
2005
A Classification-based Algorithm for Consistency Check of Part-of-Speech Tagging for Chinese Corpora
Hu Zhang | Jia-heng Zheng | Ying Zhao
Companion Volume to the Proceedings of Conference including Posters/Demos and tutorial abstracts
1994
Is N-Best Dead?
Long Nguyen | Richard Schwartz | Ying Zhao | George Zavaliagkos
Human Language Technology: Proceedings of a Workshop held at Plainsboro, New Jersey, March 8-11, 1994