Jae Hyeon Cho
2025
K/DA: Automated Data Generation Pipeline for Detoxifying Implicitly Offensive Language in Korean
Minkyeong Jeon | Hyemin Jeong | Yerang Kim | Jiyoung Kim | Jae Hyeon Cho | Byung-Jun Lee
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Language detoxification involves removing toxicity from offensive language. While a neutral-toxic paired dataset provides a straightforward approach for training detoxification models, creating such datasets presents several challenges: i) the need for human annotation to build paired data, and ii) the rapid evolution of offensive terms, rendering static datasets quickly outdated. To tackle these challenges, we introduce an automated paired data generation pipeline, called K/DA. This pipeline is designed to generate offensive language with implicit offensiveness and trend-aligned slang, making the resulting dataset suitable for detoxification model training. We demonstrate that the dataset generated by K/DA exhibits higher pair consistency and greater implicit offensiveness than existing Korean datasets, and that the pipeline is applicable to other languages. Furthermore, it enables effective training of a high-performing detoxification model with simple instruction fine-tuning.
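As a minimal sketch of the "simple instruction fine-tuning" the abstract mentions (not the authors' released pipeline): a causal LM is fine-tuned on (toxic, neutral) pairs, with the loss restricted to the detoxified answer. The model name, prompt template, and example pair below are placeholders.

```python
# Sketch: supervised instruction fine-tuning on (toxic, neutral) pairs.
# Model, prompt template, and data are placeholders, not the paper's setup.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder; substitute a Korean-capable LM in practice
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

pairs = [
    ("<implicitly offensive sentence>", "<neutral rewrite>"),  # placeholder pair
]

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
model.train()
for toxic, neutral in pairs:
    prompt = f"Rewrite the following sentence politely: {toxic}\nAnswer: "
    full = prompt + neutral + tokenizer.eos_token
    enc = tokenizer(full, return_tensors="pt")
    labels = enc["input_ids"].clone()
    # Mask prompt tokens so the loss covers only the detoxified answer.
    prompt_len = tokenizer(prompt, return_tensors="pt")["input_ids"].shape[1]
    labels[:, :prompt_len] = -100
    loss = model(**enc, labels=labels).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```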
Rethinking DPO: The Role of Rejected Responses in Preference Misalignment
Jae Hyeon Cho | JunHyeok Oh | Myunsoo Kim | Byung-Jun Lee
Findings of the Association for Computational Linguistics: EMNLP 2025
Direct Preference Optimization (DPO) is a simple and efficient framework that has attracted substantial attention. However, it often struggles to meet its primary objectives—increasing the generation probability of chosen responses while reducing that of rejected responses—due to the dominant influence of rejected responses on the loss function. This imbalance leads to suboptimal performance in promoting preferred responses. In this work, we systematically analyze the limitations of DPO and existing algorithms designed to achieve the objectives stated above. To address these limitations, we propose Bounded-DPO (BDPO), a novel method that bounds the influence of rejected responses while maintaining the original optimization structure of DPO. Through theoretical analysis and empirical evaluations, we demonstrate that BDPO achieves a balanced optimization of the chosen and rejected responses, outperforming existing algorithms.
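For reference, the standard DPO objective the abstract refers to is

$$
\mathcal{L}_{\mathrm{DPO}}(\theta) = -\,\mathbb{E}_{(x, y_w, y_l)\sim\mathcal{D}}\left[\log \sigma\!\left(\beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)} - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}\right)\right],
$$

where $y_w$ is the chosen response, $y_l$ the rejected response, and $\pi_{\mathrm{ref}}$ a frozen reference policy. Because the loss depends only on the difference of the two log-ratios, it can be reduced by driving the rejected-response term down without ever raising the probability of the chosen response, which is the imbalance the abstract describes; the specific bound BDPO places on the rejected term is defined in the paper and not reproduced here.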
Co-authors
- Byung-Jun Lee 2
- Minkyeong Jeon 1
- Hyemin Jeong 1
- Yerang Kim 1
- Jiyoung Kim 1
- JunHyeok Oh 1
- Myunsoo Kim 1