Manh-Cuong Phan

2026

CMTD: Cognitive Modeling with Traits and Distortions for Multimodal Emotion Recognition in Conversations
Minh-Tien Nguyen | Huu-Loi Le | Manh-Cuong Phan | Hajime Hotta
Findings of the Association for Computational Linguistics: ACL 2026

This paper introduces a new multi-agent framework, CMTD (Cognitive Modeling with Traits and Distortions), for multimodal emotion recognition in conversations (MERC). Instead of relying on shallow analysis of emotions, CMTD reconstructs a cognitive model by taking advantage of stable personality traits, dynamic cognitive distortions, and visual and acoustic features of interlocutors to enhance the emotional intelligence of LLMs. CMTD includes trait, distortion detection, vision, and speech agents that provide psychological and multimodal indicators for the fusion agent to make the final prediction. Experimental results on MELD and IEMOCAP show that traits temper negativity bias from distortions, and that cognitive modeling with psychological, visual, and acoustic information can improve the performance of MERC. CMTD is flexible and easy to adapt to advanced emotional AI systems (GitHub link: https://github.com/Shaun-le/CMTD.git).

2025

From Span Extraction to Classification: A Multi-step Framework for Cognitive Distortion Analysis
Manh-Cuong Phan | Thi-Ngoc-Phuong Nguyen | Huu-Loi Le | Huy-The Vu | Hajime Hotta | Minh-Tien Nguyen
Proceedings of the 39th Pacific Asia Conference on Language, Information and Computation

Metamo: Empowering Large Language Models with Psychological Distortion Detection for Cognition-aware Coaching
Hajime Hotta | Huu-Loi Le | Manh-Cuong Phan | Minh-Tien Nguyen
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: System Demonstrations

We demonstrate Metamo, a browser-based dialogue system that transforms an off-the-shelf large language model into an empathetic coach for everyday workplace concerns. Metamo introduces a light, single-pass wrapper that first identifies the cognitive distortion behind an emotion, then recognizes the user’s emotion, and finally produces a question-centered reply that invites reflection, all within one model call. The wrapper keeps the response time below two seconds in the API, yet enriches the feedback with cognitively grounded insight. A front-end web interface renders the detected emotion as an animated avatar and shows distortion badges in real time, whereas a safety layer blocks medical advice and redirects crisis language to human hotlines. Empirical tests on public corpora confirmed that the proposed design improved emotion-recognition quality and response diversity without sacrificing latency. A small user study with company staff reported higher perceived empathy and usability than a latency-matched baseline. Metamo is model-agnostic, illustrating a practical path toward cognition-aware coaching tools.
