Jahyun Koo


2025

pdf bib
SWITCH: Studying with Teacher for Knowledge Distillation of Large Language Models
Jahyun Koo | Yerin Hwang | Yongil Kim | Taegwan Kang | Hyunkyung Bae | Kyomin Jung
Findings of the Association for Computational Linguistics: NAACL 2025

Despite the success of Large Language Models (LLMs), they still face challenges related to high inference costs and memory requirements. To address these issues, Knowledge Distillation (KD) has emerged as a popular method for model compression, with the use of student-generated outputs (SGOs) as training data being particularly notable for reducing the mismatch between training and inference. However, SGOs often produce noisy and biased sequences, which can lead to misguidance from the teacher model, especially in long sequences. To mitigate these challenges, we propose SWITCH (Studying With Teacher for Knowledge Distillation), a novel approach that strategically incorporates the teacher model during the student’s sequence generation. SWITCH identifies discrepancies between the token probabilities of the teacher and student models, allowing the teacher to intervene selectively, particularly in long sequences that are more prone to teacher misguidance. Extensive experimental results across three model families and five instruction-following datasets show that SWITCH surpasses traditional KD methods, particularly excelling in the generation of long sequential data.

2024

pdf bib
LifeTox: Unveiling Implicit Toxicity in Life Advice
Minbeom Kim | Jahyun Koo | Hwanhee Lee | Joonsuk Park | Hwaran Lee | Kyomin Jung
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 2: Short Papers)

As large language models become increasingly integrated into daily life, detecting implicit toxicity across diverse contexts is crucial. To this end, we introduce LifeTox, a dataset designed for identifying implicit toxicity within a broad range of advice-seeking scenarios. Unlike existing safety datasets, LifeTox comprises diverse contexts derived from personal experiences through open-ended questions. Our experiments demonstrate that RoBERTa fine-tuned on LifeTox matches or surpasses the zero-shot performance of large language models in toxicity classification tasks. These results underscore the efficacy of LifeTox in addressing the complex challenges inherent in implicit toxicity. We open-sourced the dataset and the LifeTox moderator family; 350M, 7B, and 13B.