Jingun Kwon
2025
Considering Length Diversity in Retrieval-Augmented Summarization
Juseon-Do | Jaesung Hwang | Jingun Kwon | Hidetaka Kamigaito | Manabu Okumura
Findings of the Association for Computational Linguistics: NAACL 2025
This study investigates retrieval-augmented summarization, focusing on the impact of exemplar summary lengths, which previous methods have not considered. We propose a Diverse Length-aware Maximal Marginal Relevance (DL-MMR) algorithm for better control of summary length. DL-MMR combines query relevance with diverse target lengths in retrieval-augmented summarization. Unlike previous MMR-based methods that require exhaustive exemplar-exemplar relevance comparisons, DL-MMR also considers each exemplar's target length and avoids comparing exemplars to one another, reducing computational cost and memory use during construction of the exemplar pool. Experimental results show the effectiveness of DL-MMR, which accounts for length diversity, over the original MMR algorithm. DL-MMR additionally reduced memory use by a factor of 781,513 and computational cost by a factor of 500,092 while maintaining the same level of informativeness.
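As a rough illustration of the idea, below is a minimal Python sketch of a length-aware MMR-style selection rule. The scoring function, the length-redundancy penalty, and all data are illustrative assumptions rather than the paper's exact formulation; the sketch only shows how a length-diversity term can replace exemplar-exemplar similarity comparisons in the greedy selection loop.

from dataclasses import dataclass

@dataclass
class Exemplar:
    text: str
    relevance: float  # query-exemplar relevance, e.g. cosine similarity
    length: int       # exemplar's target summary length in tokens

def dl_mmr_select(pool, k, lam=0.7):
    # Greedily pick k exemplars, trading query relevance against length
    # redundancy with already-selected exemplars. Unlike classic MMR, no
    # exemplar-exemplar text-similarity comparison is computed.
    selected = []
    remaining = list(pool)
    while remaining and len(selected) < k:
        def score(e):
            if not selected:
                return e.relevance
            # Hypothetical penalty: high when e's length is close to an
            # already-selected length, low when lengths are diverse.
            redundancy = max(1.0 / (1.0 + abs(e.length - s.length))
                             for s in selected)
            return lam * e.relevance - (1.0 - lam) * redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected

pool = [Exemplar("short summary", 0.90, 10),
        Exemplar("also short", 0.88, 11),
        Exemplar("long summary", 0.80, 30)]
print([e.text for e in dl_mmr_select(pool, k=2)])
# picks "short summary" first, then prefers the length-diverse "long summary"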
gMBA: Expression Semantic Guided Mixed Boolean-Arithmetic Deobfuscation Using Transformer Architectures
Youjeong Roh | Joon-Young Paik | Jingun Kwon | Eun-Sun Cho
Findings of the Association for Computational Linguistics: ACL 2025
Mixed Boolean-Arithmetic (MBA) obfuscation protects intellectual property by converting programs into forms that are harder to analyze. However, MBA has been increasingly exploited by malware developers to evade detection, causing significant real-world problems. Traditional MBA deobfuscation methods often treat these expressions as a black box and overlook their internal semantic information. To bridge this gap, we propose a truth table: an automatically constructed semantic representation of an expression's behavior that does not rely on external resources. The truth table is a mathematical form that records the expression's output for all possible combinations of inputs. We also propose gMBA, a general and extensible guided MBA deobfuscation framework that modifies a Transformer-based neural encoder-decoder Seq2Seq architecture to incorporate this semantic guidance. Experimental results and in-depth analysis show that integrating expression semantics significantly improves performance, highlighting the importance of internal semantics in recovering obfuscated code to its original form.
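To make the truth-table representation concrete, here is a minimal Python sketch that enumerates an expression's outputs over all 0/1 input combinations. The identity (x & y) + (x | y) = x + y is a classic MBA rewrite; the variable names, 1-bit inputs, and helper function are illustrative assumptions, not the paper's implementation.

from itertools import product

def truth_table(expr_fn, num_vars):
    # Evaluate the expression for every 0/1 assignment of its inputs.
    return [expr_fn(*bits) for bits in product((0, 1), repeat=num_vars)]

obfuscated = lambda x, y: (x & y) + (x | y)  # MBA-obfuscated form
original   = lambda x, y: x + y              # underlying expression

print(truth_table(obfuscated, 2))  # [0, 1, 1, 2]
print(truth_table(original, 2))    # [0, 1, 1, 2] -> identical semantics

Because both forms yield the same table, the table serves as a behavior signature that a guided Seq2Seq model can condition on.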
Length Representations in Large Language Models
Sangjun Moon | Dasom Choi | Jingun Kwon | Hidetaka Kamigaito | Manabu Okumura
Findings of the Association for Computational Linguistics: EMNLP 2025
Large language models (LLMs) have shown remarkable capabilities across various tasks, that are learned from massive amounts of text-based data. Although LLMs can control output sequence length, particularly in instruction-based settings, the internal mechanisms behind this control have been unexplored yet. In this study, we provide empirical evidence on how output sequence length information is encoded within the internal representations in LLMs. In particular, our findings show that multi-head attention mechanisms are critical in determining output sequence length, which can be adjusted in a disentangled manner. By scaling specific hidden units within the model, we can control the output sequence length without losing the informativeness of the generated text, thereby indicating that length information is partially disentangled from semantic information. Moreover, some hidden units become increasingly active as prompts become more length-specific, thus reflecting the model’s internal awareness of this attribute. Our findings suggest that LLMs have learned robust and adaptable internal mechanisms for controlling output length without any external control.
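The hidden-unit scaling described above can be sketched with a PyTorch forward hook. The toy linear layer, the unit indices, and the scale factor below are hypothetical placeholders standing in for a real LLM layer and the units identified by the paper's analysis.

import torch
import torch.nn as nn

torch.manual_seed(0)
layer = nn.Linear(16, 16)  # stand-in for one hidden layer of an LLM

LENGTH_UNITS = [3, 7]  # hypothetical hidden units that encode length
SCALE = 2.0            # illustrative: >1 to lengthen, <1 to shorten

def scale_length_units(module, inputs, output):
    # Rescale only the selected hidden dimensions; others are untouched,
    # so semantic content carried by the remaining units is preserved.
    out = output.clone()
    out[..., LENGTH_UNITS] = out[..., LENGTH_UNITS] * SCALE
    return out  # returned tensor replaces the layer's output

handle = layer.register_forward_hook(scale_length_units)
hidden = layer(torch.randn(1, 16))  # units 3 and 7 are scaled by 2.0
handle.remove()
print(hidden.shape)  # torch.Size([1, 16])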