KiJung Seo

2026

ADVICE: Answer-Dependent Verbalized Confidence Estimation
KiJung Seo | Sehun Lim | Taeuk Kim
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Recent progress in large language models (LLMs) has enabled them to communicate their confidence in natural language, improving transparency and reliability. However, this expressiveness is often accompanied by systematic overconfidence, whose underlying causes remain poorly understood. In this work, we analyze the dynamics of verbalized confidence estimation and identify answer-independence (the failure to condition confidence on the model’s own answer) as a primary driver of this behavior. To address this, we introduce ADVICE (Answer-Dependent VerbalIzed Confidence Estimation), a fine-tuning framework that promotes answer-grounded confidence estimation. Extensive experiments show that ADVICE substantially improves confidence calibration, while exhibiting strong generalization to unseen settings without degrading task performance. We further demonstrate that these gains stem from enhanced answer dependence, shedding light on the origins of overconfidence and enabling trustworthy confidence verbalization.

2024


Revisiting the Impact of Pursuing Modularity for Code Generation
Deokyeong Kang | KiJung Seo | Taeuk Kim
Findings of the Association for Computational Linguistics: EMNLP 2024

Modular programming, which aims to construct the final program by integrating smaller, independent building blocks, has been regarded as a desirable practice in software development. However, with the rise of recent code generation agents built upon large language models (LLMs), a question emerges: is this traditional practice equally effective for these new tools? In this work, we assess the impact of modularity in code generation by introducing a novel metric for its quantitative measurement. Surprisingly, unlike conventional wisdom on the topic, we find that modularity is not a core factor for improving the performance of code generation models. We also explore potential explanations for why LLMs do not exhibit a preference for modular code compared to non-modular code.

Venues

ACL: 1
Findings: 1
