Wonbin Kweon


2025

pdf bib
Topic Coverage-based Demonstration Retrieval for In-Context Learning
Wonbin Kweon | SeongKu Kang | Runchu Tian | Pengcheng Jiang | Jiawei Han | Hwanjo Yu
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing

The effectiveness of in-context learning relies heavily on selecting demonstrations that provide all the necessary information for a given test input.To achieve this, it is crucial to identify and cover fine-grained knowledge requirements. However, prior methods often retrieve demonstrations based solely on embedding similarity or generation probability, resulting in irrelevant or redundant examples.In this paper, we propose TopicK, a topic coverage-based retrieval framework that selects demonstrations to comprehensively cover topic-level knowledge relevant to both the test input and the model.Specifically, TopicK estimates the topics required by the input and assesses the model’s knowledge on those topics.TopicK then iteratively selects demonstrations that introduce previously uncovered required topics, in which the model exhibits low topical knowledge.We validate the effectiveness of TopicK through extensive experiments across various datasets and both open- and closed-source LLMs.Our source code is available at https://github.com/WonbinKweon/TopicK_EMNLP2025.

pdf bib
Verbosity-Aware Rationale Reduction: Sentence-Level Rationale Reduction for Efficient and Effective Reasoning
Joonwon Jang | Jaehee Kim | Wonbin Kweon | Seonghyeon Lee | Hwanjo Yu
Findings of the Association for Computational Linguistics: ACL 2025

Large Language Models (LLMs) rely on generating extensive intermediate reasoning units (e.g., tokens, sentences) to enhance final answer quality across a wide range of complex tasks. While this approach has proven effective, it inevitably increases substantial inference costs. Previous methods adopting token-level reduction without clear criteria result in poor performance compared to models trained with complete rationale. To address this challenge, we propose a novel sentence-level rationale reduction framework leveraging likelihood-based criteria, *verbosity*, to identify and remove redundant reasoning sentences. Unlike previous approaches, our method leverages *verbosity* to selectively remove redundant reasoning sentences while preserving reasoning capabilities. Our experimental results across various reasoning tasks demonstrate that our method improves performance by an average of 7.71% while reducing token generation by 19.87% compared to model trained with complete reasoning paths.

2024

pdf bib
Rectifying Demonstration Shortcut in In-Context Learning
Joonwon Jang | Sanghwan Jang | Wonbin Kweon | Minjin Jeon | Hwanjo Yu
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)

Large language models (LLMs) are able to solve various tasks with only a few demonstrations utilizing their in-context learning (ICL) abilities.However, LLMs often rely on their pre-trained semantic priors of demonstrations rather than on the input-label relationships to proceed with ICL prediction. In this work, we term this phenomenon as the ‘Demonstration Shortcut’.While previous works have primarily focused on improving ICL prediction results for predefined tasks, we aim to rectify the Demonstration Shortcut, thereby enabling the LLM to effectively learn new input-label relationships from demonstrations.To achieve this, we introduce In-Context Calibration, a demonstration-aware calibration method.We evaluate the effectiveness of the proposed method in two settings: (1) the Original ICL Task using the standard label space and (2) the Task Learning setting, where the label space is replaced with semantically unrelated tokens.In both settings, In-Context Calibration demonstrates substantial improvements, with results generalized across three LLM families (OPT, GPT, and Llama2) under various configurations.