Min Soo Kim
Yonsei
Other people with similar names: Min-Soo Kim
Unverified author pages with similar names: Min-Soo Kim
2026
Do MLLMs Capture How Interfaces Guide User Behavior? A Benchmark for Multimodal UI/UX Design Understanding
Jaehyun Jeon | Min Soo Kim | Janghan Yoon | Sumin Shim | Yejin Choi | Hanbin Kim | Dae Hyun Kim | Youngjae Yu
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
User interface (UI) design goes beyond visuals to shape user experience (UX), underscoring the shift toward UI/UX as a unified concept. While recent studies have explored UI evaluation using Multimodal Large Language Models (MLLMs), they largely focus on surface-level features, overlooking how design choices influence user behavior at scale. To fill this gap, we introduce WiserUI-Bench, a novel benchmark for multimodal understanding of how UI/UX design affects user behavior, built on 300 real-world UI image pairs from industry A/B tests, with empirically validated winners that induced more user actions. For future design progress in practice, post-hoc understanding of why such winners succeed with mass users is also required; we support this via expert-curated key interpretations for each instance. Experiments across multiple MLLMs on WiserUI-Bench for two main tasks, (1) predicting the more effective UI image between an A/B-tested pair, and (2) explaining it post-hoc in alignment with expert interpretations, show that models exhibit limited understanding of the behavioral impact of UI/UX design. We believe our work will foster research on leveraging MLLMs for visual design in user behavior contexts.
2025
EgoSpeak: Learning When to Speak for Egocentric Conversational Agents in the Wild
Junhyeok Kim | Min Soo Kim | Jiwan Chung | Jungbin Cho | Jisoo Kim | Sungwoong Kim | Gyeongbo Sim | Youngjae Yu
Findings of the Association for Computational Linguistics: NAACL 2025
Predicting when to initiate speech in real-world environments remains a fundamental challenge for conversational agents. We introduce EgoSpeak, a novel framework for real-time speech initiation prediction in egocentric streaming video. By modeling the conversation from the speaker’s first-person viewpoint, EgoSpeak is tailored for human-like interactions in which a conversational agent must continuously observe its environment and dynamically decide when to talk. Our approach bridges the gap between simplified experimental setups and complex natural conversations by integrating four key capabilities: (1) first-person perspective, (2) RGB processing, (3) online processing, and (4) untrimmed video processing. We also present YT-Conversation, a diverse collection of in-the-wild conversational videos from YouTube, as a resource for large-scale pretraining. Experiments on EasyCom and Ego4D demonstrate that EgoSpeak outperforms random and silence-based baselines in real time. Our results also highlight the importance of multimodal input and context length in effectively deciding when to speak. Code and data are available at website.