2025
Enabling Chatbots with Eyes and Ears: An Immersive Multimodal Conversation System for Dynamic Interactions
Jihyoung Jang | Minwook Bae | Minji Kim | Dilek Hakkani-Tür | Hyounghun Kim
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
As chatbots continue to evolve toward human-like, real-world interactions, multimodality remains an active area of research and exploration. So far, efforts to integrate multimodality into chatbots have primarily focused on image-centric tasks, such as visual dialogue and image-based instructions, emphasizing the “eyes” of human perception while neglecting the “ears”, namely auditory aspects. Moreover, these studies often center on static interactions that discuss the modality rather than naturally incorporating it into the conversation, which limits the richness of simultaneous, dynamic engagement. Furthermore, while multimodality has been explored in multi-party and multi-session conversations, task-specific constraints have hindered its seamless integration into dynamic, natural conversations. To address these challenges, this study aims to equip chatbots with “eyes and ears” capable of more immersive interactions with humans. As part of this effort, we introduce a new multimodal conversation dataset, Multimodal Multi-Session Multi-Party Conversation (M3C), and propose a novel multimodal conversation model featuring multimodal memory retrieval. Our model, trained on M3C, can seamlessly engage in long-term conversations with multiple speakers in complex, real-world-like settings, effectively processing visual and auditory inputs to understand and respond appropriately. Human evaluations highlight the model’s strong performance in maintaining coherent and dynamic interactions, demonstrating its potential for advanced multimodal conversational agents.
Enhancing Complex Reasoning in Knowledge Graph Question Answering through Query Graph Approximation
Hongjun Jeong | Minji Kim | Heesoo Jung | Ko Keun Kim | Hogun Park
Findings of the Association for Computational Linguistics: ACL 2025
Knowledge-grounded Question Answering (QA) aims to answer structured queries or natural language questions by leveraging Knowledge Graphs (KGs). Existing approaches fall mainly into Knowledge Graph Question Answering (KGQA) and Complex Query Answering (CQA). Both have limitations: the former struggles to utilize KG context effectively when essential triplets related to a question are missing from the given KG, while the latter depends on structured first-order logic queries. To overcome these limitations, we propose a novel framework termed Aqua-QA. Aqua-QA approximates query graphs from natural language questions, enabling reasoning over KGs. We evaluate Aqua-QA on challenging QA tasks where KGs are incomplete and complex logical reasoning is required to answer natural language questions. Experimental results on these datasets demonstrate that Aqua-QA outperforms existing methods, showcasing its effectiveness in handling complex reasoning tasks in knowledge-grounded QA settings.
2023
Bidirectional Masked Self-attention and N-gram Span Attention for Constituency Parsing
Soohyeong Kim | Whanhee Cho | Minji Kim | Yong Choi
Findings of the Association for Computational Linguistics: EMNLP 2023
Attention mechanisms have become a crucial aspect of deep learning, particularly in natural language processing (NLP) tasks. However, in tasks such as constituency parsing, attention mechanisms can lack the directional information needed to form sentence spans. To address this issue, we propose a Bidirectional masked and N-gram span Attention (BNA) model, which modifies the attention mechanism to capture explicit dependencies between words and enhance the representation of the output span vectors. The proposed model achieves state-of-the-art performance on the Penn Treebank and Chinese Penn Treebank datasets, with F1 scores of 96.47 and 94.15, respectively. Ablation studies and analysis show that our BNA model effectively captures sentence structure by contextualizing each word in a sentence through bidirectional dependencies and by enhancing span representations.
2022
Detecting Suicidality with a Contextual Graph Neural Network
Daeun Lee | Migyeong Kang | Minji Kim | Jinyoung Han
Proceedings of the Eighth Workshop on Computational Linguistics and Clinical Psychology
Discovering individuals’ suicidality on social media has become increasingly important. Many researchers have attempted to detect suicidality using a suicide dictionary. However, prior work focused on matching words in a post against a suicide dictionary without considering context; little attention has been paid to how those words relate to the surrounding suicide-related context. To address this problem, we propose a suicidality detection model based on a graph neural network that grasps the dynamic semantic information of the suicide vocabulary by learning the relations between a given post and its words. Extensive evaluation demonstrates that the proposed model achieves higher performance than state-of-the-art methods. We believe the proposed model has great utility in identifying individuals’ suicidality and hence protecting individuals from potential suicide risks at an early stage.