Kim Sung-Bin

2026

SMILE-Next: Teaching Large Language Models to Detect, Classify, and Reason about Laughter
Lee Jung-Mok | Kim Sung-Bin | Joohyun Chang | Lee Hyun | Tae-Hyun Oh
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Laughter is a complex social signal that conveys communicative intent beyond amusement. While prior work has focused on isolated laughter analysis tasks, a comprehensive understanding of laughter in real-world scenarios remains underexplored. We introduce SMILE-Next, a dataset for real-world laughter understanding with multimodal textual representations and question–answer annotations across three tasks: laughter detection, laughter type classification, and laughter reasoning. Building on this dataset, we propose a laughter expert LLM that leverages disentangled multimodal textual cues, together with a Mixture-of-Laugh-Experts framework and laughter-specific self-instruction for task-adaptive specialization. Experimental results show that the combination of our proposed components substantially outperforms multimodal LLM baselines, advancing robust real-world laughter understanding.

2024

SMILE: Multimodal Dataset for Understanding Laughter in Video with Language Models
Lee Hyun | Kim Sung-Bin | Seungju Han | Youngjae Yu | Tae-Hyun Oh
Findings of the Association for Computational Linguistics: NAACL 2024

Despite the recent advances in artificial intelligence, building social intelligence remains a challenge. Among social signals, laughter is one of the distinctive expressions that occurs during social interactions between humans. In this work, we tackle a new challenge for machines to understand the rationale behind laughter in video, Video Laugh Reasoning. We introduce this new task to explain why people laugh in a particular video and a dataset for this task. Our proposed dataset, SMILE, comprises video clips and language descriptions of why people laugh. We propose a baseline by leveraging the reasoning capacity of large language models (LLMs) with textual video representation. Experiments show that our baseline can generate plausible explanations for laughter. We further investigate the scalability of our baseline by probing other video understanding tasks and in-the-wild videos. We release our dataset, code, and model checkpoints on https://github.com/postech-ami/SMILE-Dataset.

Co-authors

Youngjae Yu 1

Venues

ACL 1
Findings 1
