Ying Zhao


2025

Detoxifying Large Language Models via the Diversity of Toxic Samples
Ying Zhao | Yuanzhao Guo | Xuemeng Weng | Yuan Tian | Wei Wang | Yi Chang
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing

Eliminating toxicity from Large Language Models (LLMs) is crucial for ensuring user safety. However, current methods have limitations in the analysis and utilization of toxic samples, failing to fully harness their potential. Through comparative analysis of toxic and safe samples, we discover that toxic samples exhibit diversity and, within this diversity, there lies specificity. These findings suggest that leveraging these characteristics of toxic samples could enhance the performance of algorithms in detoxifying LLMs. To this end, we propose a novel diverse detoxification framework, DivDetox, which comprises two innovative components: a Multi-Category-Induced Personalized Sample Generation (MPSG) strategy and a Scaled Contrastive DPO (SC-DPO) approach. The former is designed to elicit a variety of personalized toxic responses from the LLM, while the latter is constructed to precisely and fully utilize these toxic responses. Experiments on benchmark datasets across different model scales and different detoxification tasks verify the effectiveness of our architecture.
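For readers unfamiliar with DPO-style objectives, the sketch below illustrates the general idea of a preference loss with a per-pair scaling weight, loosely in the spirit of the Scaled Contrastive DPO component described above. The paper does not give its formulation here, so the function name, the scale tensor, and the scaling scheme are illustrative assumptions, not the authors' SC-DPO.

```python
# Illustrative sketch only: a standard DPO-style preference loss with a
# per-pair scaling weight. The actual SC-DPO objective is defined in the
# paper; "scale" and "beta" below are assumptions for illustration.
import torch
import torch.nn.functional as F

def scaled_dpo_loss(policy_chosen_logps: torch.Tensor,
                    policy_rejected_logps: torch.Tensor,
                    ref_chosen_logps: torch.Tensor,
                    ref_rejected_logps: torch.Tensor,
                    scale: torch.Tensor,
                    beta: float = 0.1) -> torch.Tensor:
    """DPO loss where each (safe, toxic) pair carries its own weight,
    e.g. reflecting properties of the paired toxic sample (assumed)."""
    # Log-ratio of policy vs. reference model for the preferred (safe) responses.
    chosen_rewards = policy_chosen_logps - ref_chosen_logps
    # Log-ratio for the dispreferred (toxic) responses.
    rejected_rewards = policy_rejected_logps - ref_rejected_logps
    # Bradley-Terry style preference margin, weighted per pair.
    logits = beta * (chosen_rewards - rejected_rewards)
    return (scale * -F.logsigmoid(logits)).mean()
```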

2005

A Classification-based Algorithm for Consistency Check of Part-of-Speech Tagging for Chinese Corpora
Hu Zhang | Jia-heng Zheng | Ying Zhao
Companion Volume to the Proceedings of Conference including Posters/Demos and tutorial abstracts

1994

Is N-Best Dead?
Long Nguyen | Richard Schwartz | Ying Zhao | George Zavaliagkos
Human Language Technology: Proceedings of a Workshop held at Plainsboro, New Jersey, March 8-11, 1994