2025
pdf
bib
abs
Detoxifying Large Language Models via the Diversity of Toxic Samples
Ying Zhao
|
Yuanzhao Guo
|
Xuemeng Weng
|
Yuan Tian
|
Wei Wang
|
Yi Chang
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Eliminating toxicity from Large Language Models (LLMs) is crucial for ensuring user safety. However, current methods have limitations in the analysis and utilization of toxic samples, failing to fully harness their potential. Through comparative analysis of toxic and safe samples, we discover that toxic samples exhibit diversity and, within this diversity, there lies specificity. These findings suggest that leveraging these characteristics of toxic samples could enhance the performance of algorithms in detoxifying LLMs. To this end, we propose a novel diverse detoxification framework, DivDetox, which comprises two innovative components: a Multi-Category-Induced Personalized Sample Generation (MPSG) strategy and a Scaled Contrastive DPO (SC-DPO) approach. The former is designed to elicit a variety of personalized toxic responses from the LLM, while the latter is constructed to precisely and fully utilize these toxic responses. Experiments on benchmark datasets across different model scales and different detoxification tasks verify the effectiveness of our architecture.
pdf
bib
abs
Instruction-Tuning Data Synthesis from Scratch via Web Reconstruction
Yuxin Jiang
|
Yufei Wang
|
Chuhan Wu
|
Xinyi Dai
|
Yan Xu
|
Weinan Gan
|
Yasheng Wang
|
Xin Jiang
|
Lifeng Shang
|
Ruiming Tang
|
Wei Wang
Findings of the Association for Computational Linguistics: ACL 2025
The improvement of LLMs’ instruction-following capabilities depends critically on the availability of high-quality instruction-response pairs. While existing automatic data synthetic methods alleviate the burden of manual curation, they often rely heavily on either the quality of seed data or strong assumptions about the structure and content of web documents. To tackle these challenges, we propose Web Reconstruction (WebR), a fully automated framework for synthesizing high-quality instruction-tuning (IT) data directly from raw web documents with minimal assumptions. Leveraging the inherent diversity of raw web content, we conceptualize web reconstruction as an instruction-tuning data synthesis task via a novel dual-perspective paradigm—Web as Instruction and Web as Response—where each web document is designated as either the input or output role to trigger the reconstruction process. Comprehensive experiments show that datasets generated by WebR outperform state-of-the-art baselines by up to 16.65% across four instruction-following benchmarks. Notably, WebR demonstrates superior compatibility, data efficiency, and scalability, enabling enhanced domain adaptation with minimal effort.
pdf
bib
abs
When Inverse Data Outperforms: Exploring the Pitfalls of Mixed Data in Multi-Stage Fine-Tuning
Mengyi Deng
|
Xin Li
|
Tingyu Zhu
|
Zhicheng Yang
|
Zhijiang Guo
|
Wei Wang
Findings of the Association for Computational Linguistics: EMNLP 2025
Existing work has shown that o1-level performance can be achieved with limited data distillation, but most existing methods focus on unidirectional supervised fine-tuning (SFT), overlooking the intricate interplay between diverse reasoning patterns. In this paper, we construct r1k, a high-quality reverse reasoning dataset derived by inverting 1,000 forward examples from s1k, and examine how SFT and Direct Preference Optimization (DPO) affect alignment under bidirectional reasoning objectives. SFT on r1k yields a 1.6%–6.8% accuracy improvement over s1k across evaluated benchmarks. However, naively mixing forward and reverse data during SFT weakens the directional distinction. Although DPO can partially recover this distinction, it also suppresses less preferred reasoning paths by shifting the probability mass toward irrelevant outputs. These findings suggest that mixed reasoning data introduce conflicting supervision signals, underscoring the need for robust and direction-aware alignment strategies. Our code and data are available at: https://github.com/16demi/ReasonAlign-analysis.
pdf
bib
abs
PUER: Boosting Few-shot Positive-Unlabeled Entity Resolution with Reinforcement Learning
Yaoshu Wang
|
Mengyi Yan
|
Wei Wang
Findings of the Association for Computational Linguistics: EMNLP 2025
Entity resolution is a fundamental problem in data management that aims to identify all duplicate entries within collections of multi-attribute tuples. Most existing works focus on supervised learning, relying on large amounts of high-quality labeled data, including both positive and negative tuple pairs that are meticulously prepared. However, in reality, the manual annotation process is labor-intensive; in particular, selecting high-quality negative data for labeling is both important and challenging. In this paper, we propose an end-to-end ER solution, PUER, to address low-resource entity resolution (ER) by leveraging Large Language Models (LLMs) in a Positive-Unlabeled (PU) learning setting, where only a small number of positively labeled examples, e.g., 50, and unlabeled data are provided. Unlike directly fine-tuning LLMs in a supervised manner, we solve the entity matching task using reinforcement learning and propose a self-adaptive reward function in the process of RL. To enhance performance, we design an iterative workflow based on the co-training mechanism that fully utilizes entity blocking component to assist the entity matching. This workflow aims to improve the robustness and quality of pseudo-labels so that the performance of entity matching improves. Comprehensive experimental results on various benchmark datasets demonstrate the superiority of PUER. Full version and code are available.