Gyeong-Moon Park

2026

Open Your Model’s Eyes: Video and Context-Aware Multimodal Backchannel Prediction
Min-Jae Kim | Jun-Yeong Moon | Mujeen Sung | Gyeong-Moon Park
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Backchannels, which signal listener states like empathy and understanding, are fundamental to natural human interaction. However, current approaches rely solely on audio and text. This omits crucial visual cues, such as facial expressions and gestures, as well as broader conversational contexts, which are necessary for accurate prediction. In this paper, we introduce Context-Aware Multimodal Alignment for Backchannel Prediction (CAMA-BC), a novel framework that leverages visual information through Multi-layer Multimodal Alignment (MMA). Our alignment process comprises two stages. First, Context Alignment (MMA-CA) utilizes unlabeled dialogues with videos to capture conversational contexts. Next, Backchannel Alignment (MMA-BA) fine-tunes the representations specifically for backchannel prediction. Experimental results show that CAMA-BC significantly outperforms both existing methods and simple multimodal baselines, with particular effectiveness in recognizing complex backchannels such as empathy.
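The abstract does not specify which objective Multi-layer Multimodal Alignment uses. As a rough illustration only, and assuming a CLIP-style contrastive setup, the sketch below computes a symmetric InfoNCE loss that pulls paired video and text embeddings together and pushes unpaired ones apart. The function name `info_nce`, the `temperature` default, and the toy vectors are hypothetical and are not taken from CAMA-BC.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def _l2_normalize(v: Vector) -> List[float]:
    # Scale a vector to unit length so dot products become cosine similarities.
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0.0 else list(v)


def _mean_cross_entropy(logits: List[List[float]]) -> float:
    # Row i's correct "class" is column i (its paired sample).
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def info_nce(video: List[Vector], text: List[Vector], temperature: float = 0.07) -> float:
    """Hypothetical symmetric InfoNCE loss; video[i] and text[i] form a positive pair."""
    v = [_l2_normalize(x) for x in video]
    t = [_l2_normalize(x) for x in text]
    logits = [
        [sum(a * b for a, b in zip(vi, tj)) / temperature for tj in t]
        for vi in v
    ]
    # Transpose so the text-to-video direction reuses the same helper.
    transposed = [list(col) for col in zip(*logits)]
    return 0.5 * (_mean_cross_entropy(logits) + _mean_cross_entropy(transposed))


video = [[1.0, 0.0], [0.0, 1.0]]
print(info_nce(video, [[1.0, 0.0], [0.0, 1.0]]) < info_nce(video, [[0.0, 1.0], [1.0, 0.0]]))  # True
```

Correctly paired embeddings yield a near-zero loss, while swapped pairs are penalized heavily; this is the general behavior an alignment stage of this kind would exploit, not a description of the paper's actual training code.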

2024


Detecting out-of-distribution (OOD) samples is crucial for ensuring the safety and robustness of models deployed in real-world scenarios. While most studies on OOD detection focus on fine-tuned models trained on in-distribution (ID) data, detecting OOD samples with pre-trained models is also important due to computational limitations and the widespread use of open-source pre-trained models. However, in the same-domain shift setting, the OOD detection performance of pre-trained models is insufficient: because ID and OOD samples originate from the same domain, their embeddings overlap heavily. To address this issue, we introduce CED, a training-free OOD detection method designed to enhance the distinction between ID and OOD datasets. We theoretically validate that auxiliary and oracle samples satisfying certain conditions improve this distinction. Motivated by this analysis, CED enhances the differentiation by utilizing specially designed auxiliary and oracle samples. As a result, CED significantly improves the ability of pre-trained models to distinguish between ID and OOD samples in text classification and hallucination detection tasks. Furthermore, we verify that CED is a plug-and-play method compatible with various backbone networks, such as RoBERTa, Llama, and OpenAI Embedding.
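The abstract does not give CED's scoring rule. The following is a minimal, hypothetical sketch of a training-free, embedding-based OOD score in the same spirit, assuming a query embedding is compared by cosine similarity against in-distribution reference embeddings and against auxiliary OOD-like embeddings. The names `contrastive_ood_score` and `is_ood`, the zero threshold, and the toy 2-D vectors are illustrative choices, not CED itself.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    # Cosine similarity; returns 0.0 for a zero vector to avoid division by zero.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def max_similarity(query: Vector, refs: List[Vector]) -> float:
    # Nearest-neighbour similarity of the query to a reference set.
    return max(cosine(query, r) for r in refs)


def contrastive_ood_score(query: Vector, id_refs: List[Vector], aux_refs: List[Vector]) -> float:
    """Hypothetical score: positive when the query sits closer to the
    auxiliary (OOD-like) references than to the ID references."""
    return max_similarity(query, aux_refs) - max_similarity(query, id_refs)


def is_ood(query: Vector, id_refs: List[Vector], aux_refs: List[Vector],
           threshold: float = 0.0) -> bool:
    # Flag the query as OOD when its contrastive score exceeds the threshold.
    return contrastive_ood_score(query, id_refs, aux_refs) > threshold


id_refs = [[1.0, 0.0], [0.9, 0.1]]
aux_refs = [[0.0, 1.0]]
print(is_ood([0.1, 0.95], id_refs, aux_refs))   # True
print(is_ood([0.95, 0.05], id_refs, aux_refs))  # False
```

Subtracting the similarity to auxiliary samples is one way to sharpen separation when ID and OOD embeddings overlap, since raw similarity to ID data alone can be high for both kinds of input.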


Venues

ACL (1)
Findings (1)
