Seunghyun Park
2026
Where Reasoning Breaks: Logic-Aware Path Selection by Controlling Logical Connectives in LLMs Reasoning Chains
Seunghyun Park | Yuanyuan Lei
Findings of the Association for Computational Linguistics: ACL 2026
While LLMs demonstrate impressive reasoning capabilities, they remain fragile in multi-step logical deduction, where a single transition error can propagate through the entire reasoning chain, leading to unstable performance. In this work, we identify logical connectives as primary points of this structural fragility. Through empirical analysis, we show that logical connective tokens function as high-entropy forking points, at which models frequently struggle to determine the correct logical direction. Motivated by this observation, we hypothesize that intervening in logical connective selection can guide LLMs towards the correct logical direction, thereby improving the overall reasoning chain. To validate this hypothesis, we propose a multi-layered framework that intervenes specifically at these logic-critical junctions in the reasoning process. Specifically, we introduce (1) Gradient-based Logical Steering to guide LLMs' internal representations towards valid reasoning subspaces, (2) Localized Branching to resolve ambiguity via targeted look-ahead search, and (3) Targeted Transition Preference Optimization, a surgical reinforcement learning objective that selectively optimizes single-token preferences at logical pivots. Crucially, by concentrating intervention solely on logic-critical transitions, our framework achieves a favorable accuracy–efficiency trade-off compared to global inference-time scaling methods like beam search and self-consistency.
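The notion of a "high-entropy forking point" in the abstract above can be illustrated with a minimal sketch. It assumes the per-step next-token logits are already available as plain lists of floats; the function names (`token_entropy`, `forking_points`) and the threshold value are hypothetical and are not taken from the paper.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def token_entropy(logits):
    """Shannon entropy (in nats) of the next-token distribution."""
    probs = softmax(logits)
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def forking_points(step_logits, threshold):
    """Indices of decoding steps whose entropy exceeds `threshold`.

    `threshold` is an illustrative hyperparameter (an assumption),
    not a value reported in the abstract.
    """
    return [i for i, logits in enumerate(step_logits)
            if token_entropy(logits) > threshold]

# Step 0 is confidently peaked; step 1 is uniform over four candidates.
steps = [[10.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0]]
print(forking_points(steps, threshold=1.0))
```

In this toy input only the second step is flagged, because a uniform distribution over four tokens has entropy log 4, which is above the chosen threshold, while the peaked step is close to zero.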
2025
Watermarking for Factuality: Guiding Vision-Language Models Toward Truth via Tri-layer Contrastive Decoding
Kyungryul Back | Seongbeom Park | Milim Kim | Mincheol Kwon | SangHyeok Lee | Hyunyoung Lee | Junhee Cho | Seunghyun Park | Jinkyu Kim
Findings of the Association for Computational Linguistics: EMNLP 2025
Large Vision-Language Models (LVLMs) have recently shown promising results on various multimodal tasks, even achieving human-comparable performance in certain cases. Nevertheless, LVLMs remain prone to hallucinations: they often rely heavily on a single modality or memorize training data without properly grounding their outputs. To address this, we propose a training-free, tri-layer contrastive decoding method with watermarking, which proceeds in three steps: (1) select a mature layer and an amateur layer among the decoding layers, (2) identify a pivot layer using a watermark-related question to assess whether the layer is visually well-grounded, and (3) apply tri-layer contrastive decoding to generate the final output. Experiments on public benchmarks such as POPE, MME, and AMBER demonstrate that our method achieves state-of-the-art performance in reducing hallucinations in LVLMs and generates more visually grounded responses.
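As a rough illustration of the contrastive idea behind the abstract above, the sketch below contrasts a "mature" layer's logits against an "amateur" layer's logits. It is the generic two-distribution contrastive decoding rule with a plausibility cutoff, not the paper's tri-layer rule: the pivot layer and the watermark-based selection are omitted because the abstract does not specify them. The function name `contrastive_next_token` and the `alpha` parameter are assumptions made for this example.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over a list of raw scores."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]

def contrastive_next_token(mature_logits, amateur_logits, alpha=0.1):
    """Pick the token maximizing log p_mature - log p_amateur.

    Only tokens whose mature probability is at least `alpha` times the
    mature maximum are considered (a plausibility constraint). `alpha`
    is an illustrative hyperparameter, not a value from the paper.
    """
    lp_m = log_softmax(mature_logits)
    lp_a = log_softmax(amateur_logits)
    cutoff = max(lp_m) + math.log(alpha)
    best, best_score = None, -math.inf
    for i, (m, a) in enumerate(zip(lp_m, lp_a)):
        if m < cutoff:
            continue  # implausible under the mature layer; skip
        score = m - a
        if score > best_score:
            best, best_score = i, score
    return best

# Greedy decoding on the mature logits would pick token 0; the contrast
# prefers token 1, which the amateur layer rates much lower.
print(contrastive_next_token([2.0, 1.9, -5.0], [2.0, 0.0, 3.0]))
```

The plausibility cutoff keeps the contrast from promoting tokens that the mature layer itself considers very unlikely, which is why token 2 is never selected here despite the amateur layer's strong preference for it.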
2023
Visually-Situated Natural Language Understanding with Contrastive Reading Model and Frozen Large Language Models
Geewook Kim | Hodong Lee | Daehee Kim | Haeji Jung | Sanghee Park | Yoonsik Kim | Sangdoo Yun | Taeho Kil | Bado Lee | Seunghyun Park
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing
Recent advances in Large Language Models (LLMs) have stimulated a surge of research aimed at extending their applications to the visual domain. While these models exhibit promise in generating abstract image captions and facilitating natural conversations, their performance on text-rich images still requires improvement. In this paper, we introduce Contrastive Reading Model (Cream), a novel neural architecture designed to enhance the language-image understanding capability of LLMs by capturing intricate details that are often overlooked in existing methods. Cream combines vision and auxiliary encoders, fortified by a contrastive feature alignment technique, to achieve a more effective comprehension of language information in visually situated contexts within the images. Our approach bridges the gap between vision and language understanding, paving the way for the development of more sophisticated Document Intelligence Assistants. Through rigorous evaluations across diverse visually-situated language understanding tasks that demand reasoning capabilities, we demonstrate the compelling performance of Cream, positioning it as a prominent model in the field of visual document understanding. We provide our codebase and newly-generated datasets at https://github.com/naver-ai/cream.