Daehui Kim
2026
Exploring Iterative Controllable Summarization with Large Language Models
Sangwon Ryu | Heejin Do | Daehui Kim | Hwanjo Yu | Dongwoo Kim | Yunsu Kim | Gary Lee | Jungseul Ok
Findings of the Association for Computational Linguistics: EACL 2026
Large language models (LLMs) have demonstrated remarkable performance in abstractive summarization tasks. However, their ability to precisely control summary attributes (e.g., length or topic) remains underexplored, limiting their adaptability to specific user preferences. In this paper, we systematically explore the controllability of LLMs. To this end, we revisit summary attribute measurements and introduce iterative evaluation metrics, failure rate and average iteration count, to more precisely evaluate controllability beyond one-shot assessment of errors. Our findings show that LLMs struggle more with numerical attributes than with linguistic attributes. To address this challenge, we propose a guide-to-explain framework (GTE) for controllable summarization. GTE enables the model to identify misaligned attributes in the initial draft and guides it to self-explain errors in the previous output. By encouraging reflection on attribute misalignment, GTE generates well-adjusted summaries that satisfy the desired attributes with robust effectiveness, while requiring surprisingly few iterations compared to other iterative approaches.
2025
KoBLEX: Open Legal Question Answering with Multi-hop Reasoning
Jihyung Lee | Daehui Kim | Seonjeong Hwang | Hyounghun Kim | Gary Lee
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Large Language Models (LLMs) have achieved remarkable performance in general domains and are now extending into the expert domain of law. Several benchmarks have been proposed to evaluate LLMs’ legal capabilities. However, these benchmarks fail to evaluate open-ended and provision-grounded Question Answering (QA). To address this, we introduce a Korean Benchmark for Legal EXplainable QA (KoBLEX), designed to evaluate provision-grounded, multi-hop legal reasoning. KoBLEX includes 226 scenario-based QA instances and their supporting provisions, created using a hybrid LLM–human expert pipeline. We also propose a method called Parametric provision-guided Selection Retrieval (ParSeR), which uses LLM-generated parametric provisions to guide legally grounded and reliable answers. ParSeR facilitates multi-hop reasoning on complex legal questions by generating parametric provisions and employing a three-stage sequential retrieval process. Furthermore, to better evaluate the legal fidelity of the generated answers, we propose Legal Fidelity Evaluation (LF-Eval), an automatic metric that jointly considers the question, answer, and supporting provisions and shows a high correlation with human judgments. Experimental results show that ParSeR consistently outperforms strong baselines, achieving the best results across multiple LLMs. Notably, compared to standard retrieval with GPT-4o, ParSeR achieves +37.91 higher F1 and +30.81 higher LF-Eval. Further analyses reveal that ParSeR delivers consistent performance across reasoning depths, with ablations confirming the effectiveness of each component.
GuRE: Generative Query REwriter for Legal Passage Retrieval
Daehui Kim | Deokhyung Kang | Jonghwi Kim | Sangwon Ryu | Gary Lee
Proceedings of the Natural Legal Language Processing Workshop 2025
Legal Passage Retrieval (LPR) systems are crucial as they help practitioners save time when drafting legal arguments. However, LPR remains underexplored, largely due to the significant vocabulary mismatch between queries and target passages. To address this, we propose a simple yet effective method, the Generative query REwriter (GuRE). We leverage the generative capabilities of Large Language Models (LLMs) by training an LLM for query rewriting. Rewritten queries help retrievers find target passages by mitigating vocabulary mismatch. Experimental results show that GuRE significantly improves performance in a retriever-agnostic manner, outperforming all baseline methods. Further analysis reveals that different training objectives lead to distinct retrieval behaviors, making GuRE more suitable than direct retriever fine-tuning for real-world applications. Code is available at github.com/daehuikim/GuRE.