Jeongwoo Lee
2026
Hospitality-VQA: Decision-Oriented Informativeness Evaluation for Vision–Language Models
Jeongwoo Lee | Baek Duhyeong | Eungyeol Han | Soyeon Shin | Gukin Han | Seungduk Kim | Jaehyun Jeon | Taewoo Jeong
Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 4: Student Research Workshop)
Recent advances in Vision–Language Models (VLMs) have demonstrated impressive multimodal understanding in general domains. However, their applicability to decision-oriented domains such as hospitality remains largely unexplored. In this work, we investigate how well VLMs can perform visual question answering (VQA) about hotel and facility images that are central to consumer decision-making. While many existing VQA benchmarks focus on factual correctness, they rarely capture what information users actually find useful. To address this, we first introduce Informativeness as a formal framework to quantify how much hospitality-relevant information an image–question pair provides. Guided by this framework, we construct a new hospitality-specific VQA dataset covering various facility types, with questions specifically designed to reflect key user information needs. Using this benchmark, we conduct experiments with several state-of-the-art VLMs, revealing that VLMs are not intrinsically decision-aware: key visual signals remain underutilized, and reliable informativeness reasoning emerges only after modest domain-specific fine-tuning.
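The paper's formal definition of Informativeness is not reproduced here. As a purely hypothetical illustration of how a decision-oriented score might be operationalized, the sketch below rates a model answer by its coverage of annotated decision-relevant attributes; the `VQAItem` fields and the coverage formula are assumptions for this example, not the paper's metric.

```python
# Hypothetical sketch: score an answer by coverage of decision-relevant
# attributes. The attribute annotations and the coverage formula are
# assumptions made for illustration, not the paper's Informativeness metric.
from dataclasses import dataclass


@dataclass
class VQAItem:
    question: str
    answer: str                     # model-generated answer
    decision_attributes: list[str]  # hypothetical gold annotations


def informativeness_coverage(item: VQAItem) -> float:
    """Fraction of decision-relevant attributes mentioned in the answer."""
    if not item.decision_attributes:
        return 0.0
    answer = item.answer.lower()
    hits = sum(attr.lower() in answer for attr in item.decision_attributes)
    return hits / len(item.decision_attributes)


item = VQAItem(
    question="Is the pool suitable for small children?",
    answer="Yes, there is a shallow kids' pool with a lifeguard on duty.",
    decision_attributes=["shallow", "kids", "lifeguard"],
)
print(informativeness_coverage(item))  # 1.0
```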
VisDoT: Enhancing Visual Reasoning through Human-Like Interpretation Grounding and Decomposition of Thought
Eunsoo Lee | Jeongwoo Lee | Minki Hong | Jangho Choi | Jihie Kim
Findings of the Association for Computational Linguistics: EACL 2026
Large vision-language models (LVLMs) struggle to reliably detect visual primitives in charts and align them with semantic representations, which severely limits their performance on complex visual reasoning. This lack of perceptual grounding constitutes a major bottleneck for chart-based reasoning. We propose VisDoT, a framework that enhances visual reasoning through human-like interpretation grounding. Drawing on the theory of graphical perception, we formalize four perceptual tasks, such as judging position and length. Building on this foundation, we introduce decomposition-of-thought (DoT) prompting, which sequentially decomposes questions into visual perception sub-questions and logic sub-questions. Fine-tuning InternVL with VisDoT achieves a +11.2% improvement on ChartQA and surpasses GPT-4o on the more challenging ChartQAPro benchmark. On the newly introduced VisDoTQA benchmark, the model improves by +33.2%. Furthermore, consistent zero-shot gains on diverse open-domain VQA benchmarks confirm that the perception-logic separation strategy generalizes to visual question answering at large. VisDoT leverages human-like perception to enhance visual grounding, achieving state-of-the-art chart understanding and interpretable visual reasoning.
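A minimal sketch of the two-stage perception-logic separation that the abstract describes, assuming a generic chat-style VLM client. The `call_vlm` stub and both prompt templates are placeholders invented for this example; the paper's four perceptual task templates and exact prompts are not reproduced here.

```python
# Sketch of decomposition-of-thought (DoT) prompting: first elicit perceptual
# facts from the chart, then reason over those facts. `call_vlm` is a
# hypothetical stand-in for any VLM API (e.g., an InternVL endpoint).

def call_vlm(image_path: str, prompt: str) -> str:
    """Placeholder for a real VLM call; plug in your model client here."""
    raise NotImplementedError


PERCEPTION_PROMPT = (
    "Before answering, list the visual facts needed from the chart "
    "(positions, lengths, colors, axis labels) as short sub-questions, "
    "then answer each sub-question from the image only.\nQuestion: {question}"
)
LOGIC_PROMPT = (
    "Using only the perceived facts below, reason step by step and give "
    "the final answer.\n\nPerceived facts:\n{facts}\n\nQuestion: {question}"
)


def dot_answer(image_path: str, question: str) -> str:
    # Stage 1: visual perception sub-questions grounded in the chart image.
    facts = call_vlm(image_path, PERCEPTION_PROMPT.format(question=question))
    # Stage 2: logic sub-questions answered over the extracted facts.
    return call_vlm(image_path, LOGIC_PROMPT.format(facts=facts, question=question))
```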
2023
Post-hoc Utterance Refining Method by Entity Mining for Faithful Knowledge Grounded Conversations
Yoonna Jang | Suhyune Son | Jeongwoo Lee | Junyoung Son | Yuna Hur | Jungwoo Lim | Hyeonseok Moon | Kisu Yang | Heuiseok Lim
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing
Despite the striking advances in recent language generation performance, model-generated responses have suffered from the chronic problem of hallucinations that are either untrue or unfaithful to a given source. Especially in the task of knowledge-grounded conversation, models are required to generate informative responses, but hallucinated utterances lead to miscommunication. In particular, entity-level hallucination, which causes critical misinformation and undesirable conversation, is one of the major concerns. To address this issue, we propose a post-hoc refinement method called REM. It aims to enhance the quality and faithfulness of hallucinated utterances by refining them based on the source knowledge. If a generated utterance has a low source-faithfulness score with respect to the given knowledge, REM mines the key entities in the knowledge and implicitly uses them to refine the utterance. We verify that our method reduces entity hallucination in the utterances. We also show the adaptability and efficacy of REM with extensive experiments and generation results. Our code is available at https://github.com/YOONNAJANG/REM.
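The released code at https://github.com/YOONNAJANG/REM implements the actual method; the sketch below only illustrates the gating logic the abstract describes, with a token-overlap proxy for the faithfulness score and naive capitalized-word extraction standing in for entity mining. The `rewrite_with_entities` refiner is a hypothetical placeholder.

```python
# Simplified REM-style post-hoc refinement: score faithfulness, and if it
# falls below a threshold, mine entities from the knowledge and hand them
# to a generative refiner. Both components here are stand-in proxies.
import re


def faithfulness(utterance: str, knowledge: str) -> float:
    """Proxy score: fraction of utterance tokens grounded in the knowledge."""
    utt = set(re.findall(r"\w+", utterance.lower()))
    src = set(re.findall(r"\w+", knowledge.lower()))
    return len(utt & src) / len(utt) if utt else 0.0


def mine_entities(knowledge: str) -> list[str]:
    """Naive entity mining: capitalized tokens (use an NER model in practice)."""
    return re.findall(r"\b[A-Z][a-z]+\b", knowledge)


def rewrite_with_entities(utterance: str, knowledge: str,
                          entities: list[str]) -> str:
    """Placeholder for a generative refinement model conditioned on entities."""
    raise NotImplementedError


def refine(utterance: str, knowledge: str, threshold: float = 0.5) -> str:
    if faithfulness(utterance, knowledge) >= threshold:
        return utterance  # faithful enough; leave untouched
    entities = mine_entities(knowledge)
    return rewrite_with_entities(utterance, knowledge, entities)
```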