Hyun-Je Song
2026
Selective Span-Level Unlearning for Large Language Models
Chaewon Yoon | Dongjun Kim | Hyun-Je Song
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)
Large language models (LLMs) trained on massive text corpora may inadvertently memorize sensitive or copyrighted content, motivating the need for more targeted unlearning. Selective LLM unlearning focuses on identifying token-level or span-level unlearning targets within a text, rather than treating entire sequences as unlearning targets. However, many existing selective approaches depend on external supervision to identify unlearning targets, which may misalign unlearning objectives with the model’s internal behavior. In this paper, we propose a selective span-level unlearning method that is grounded entirely in model-intrinsic information. Our method first estimates token-level importance scores by contrasting gradient information induced by forget and retain datasets, identifying tokens that disproportionately contribute to information targeted for unlearning. These token-level importance scores are then used as anchors to identify coherent span-level unlearning targets via a self-consistency–based generation process, allowing the model to determine stable spans based on its own predictions. Experiments on two LLM unlearning benchmarks show that our approach achieves comparable unlearning performance while substantially better preserving retained knowledge.
2023
Improving Multi-Stage Long Document Summarization with Enhanced Coarse Summarizer
Jinhyeong Lim | Hyun-Je Song
Proceedings of the 4th New Frontiers in Summarization Workshop
Multi-stage long document summarization is a flexible approach to capturing salient information from a long document: it splits the document into multiple segments, generates a coarse summary from each segment over multiple stages, and produces the final summary from the last coarse summary. Although the coarse summary affects the final summary, the coarse summarizer in existing multi-stage summarization is coarsely trained on data segments that are not useful for generating the final summary. In this paper, we propose a novel method for multi-stage long document summarization. The proposed method first generates new segment pairs, ensuring that all of them are relevant to generating the final summary. We then incorporate contrastive learning into the training of the coarse summarizer, which tries to maximize the similarities between source segments and the target summary during training. Through extensive experiments on six long document summarization datasets, we demonstrate that our proposed method not only enhances the existing multi-stage long document summarization approach, but also achieves performance comparable to state-of-the-art methods, including those utilizing large language models for long document summarization.
2019
Korean Morphological Analysis with Tied Sequence-to-Sequence Multi-Task Model
Hyun-Je Song | Seong-Bae Park
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)
Korean morphological analysis has been regarded as a sequence of morpheme processing and POS tagging, so previous studies have widely adopted a pipeline model of the two tasks. However, a pipeline model cannot exploit interactions between the tasks. This paper formulates Korean morphological analysis as a combination of the two tasks and presents a tied sequence-to-sequence multi-task model that trains them simultaneously without any explicit regularization. The experiments show that the proposed model achieves state-of-the-art performance.
2016
A Translation-Based Knowledge Graph Embedding Preserving Logical Property of Relations
Hee-Geun Yoon | Hyun-Je Song | Seong-Bae Park | Se-Young Park
Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
2014
Device-Dependent Readability for Improved Text Understanding
A-Yeong Kim | Hyun-Je Song | Seong-Bae Park | Sang-Jo Lee
Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)
2013
A Just-In-Time Keyword Extraction from Meeting Transcripts
Hyun-Je Song | Junho Go | Seong-Bae Park | Se-Young Park
Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies