Ryuichi Sumida
2026
MMAC: A Multilingual, Multimodal Alignment Framework for Cultural Grounding Evaluation
Weihua Zheng | Zhengyuan Liu | Tanmoy Chakraborty | Weiwen Xu | Xiaoxue Gao | Bryan Chen Zhengyu Tan | Bowei Zou | Chang Liu | Yujia Hu | Xing Xie | Xiaoyuan Yi | Jing Yao | Chaojun Wang | Long Li | Rui Liu | Huiyao Liu | Koji Inoue | Ryuichi Sumida | Tatsuya Kawahara | Fan Xu | Lingyu Ye | Wei Tian | Dongjun Kim | Jimin Jung | Jaehyung Seo | Nadya Yuki Wangsajaya | Pham Minh Duc | Ojasva Saxena | Palash Nandi | Xiyan Tao | Wiwik Karlina | Tuan Luong | Keertana Arun Vasan | Roy Ka-Wei Lee | Nancy F. Chen
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
The global deployment of Large Language Models (LLMs) underscores the urgent need to evaluate their cultural alignment. However, assessing genuine "cultural awareness" across modalities (text, vision, speech) and languages remains a significant challenge. To comprehensively investigate this domain, we propose MMAC, a systematic framework that encompasses a tri-modally aligned cultural benchmark creation pipeline and a five-dimensional evaluation protocol to assess cross-country awareness disparities, evaluate cross-lingual and cross-modal consistency, and verify cultural knowledge generalization and grounding validity. Given the prevailing Western cultural bias in current models, we focus on 8 Asian countries as our dataset foundation to more acutely reveal potential cultural deficiencies in LLMs. Our dataset, MMAC-bench, features 27,000 human-curated questions across 10 languages. Crucially, it is the first dataset aligned at the input level across text, image, and speech, enabling direct cross-modal transfer tests. Each question consists of multiple-choice options accompanied by open-ended generated explanations, where 79% require multi-step reasoning grounded in cultural context, moving beyond simple memorization. We probe the causes of modal divergence, offering insights into fostering culturally robust MLLMs.
2025
Enhancing Long-term RAG Chatbots with Psychological Models of Memory Importance and Forgetting
Ryuichi Sumida | Koji Inoue | Tatsuya Kawahara
Dialogue & Discourse Volume 16
This study addresses the issue of what a Retrieval-Augmented Generation (RAG) chatbot should remember and what it should forget, based on findings from psychology. RAG retrieves relevant memories from past interactions to generate responses, and its effectiveness has been demonstrated. As conversations continue, however, the amount of stored memory keeps growing, which not only requires large storage capacity but also risks retaining unnecessary information, potentially reducing retrieval efficiency. To tackle this problem, we propose LUFY (Long-term Understanding and identiFYing key exchanges), a RAG chatbot that evaluates six distinct memory-related metrics derived from psychological models and real-world data. Instead of simply summing these metrics, it uses learned weights to account for the importance of each one. By using these weighted scores, the system can prioritize and retain relevant memories while gradually forgetting less important ones during both retrieval and memory management. To evaluate the effectiveness of LUFY in long-term conversations, we conducted experiments with human participants, who engaged in text-based conversations with three types of chatbots, each using different forgetting mechanisms, for at least two hours. The length of these conversations was more than 4.5 times longer than the longest conversations reported in previous studies. The results showed that prioritizing emotionally engaging memories while forgetting most of the conversation significantly enhanced user satisfaction.
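The weighted scoring idea in the abstract above can be illustrated with a minimal sketch. This is not the LUFY implementation: the abstract does not name the six metrics, give the learned weights, or specify the forgetting rule, so the `Memory` class, the `WEIGHTS` values, the `importance` and `retain_top` functions, and the `keep_fraction` parameter are all hypothetical placeholders chosen for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Memory:
    text: str
    # Six metric values per memory (hypothetical; assumed to lie in [0, 1]).
    metrics: List[float]


# Placeholder weights. The abstract says the weights are learned,
# but reports no values, so these numbers are invented for the sketch.
WEIGHTS = [0.3, 0.2, 0.2, 0.1, 0.1, 0.1]


def importance(memory: Memory, weights: List[float] = WEIGHTS) -> float:
    """Weighted sum of the metrics, rather than a plain unweighted sum."""
    if len(memory.metrics) != len(weights):
        raise ValueError("expected one weight per metric")
    return sum(w * m for w, m in zip(weights, memory.metrics))


def retain_top(memories: List[Memory], keep_fraction: float = 0.1) -> List[Memory]:
    """Keep the highest-scoring memories and drop ("forget") the rest.

    keep_fraction is an assumed knob, not a value taken from the paper.
    """
    keep = max(1, round(len(memories) * keep_fraction))
    return sorted(memories, key=importance, reverse=True)[:keep]
```

Under these assumptions, calling `retain_top(memories)` on a conversation's stored exchanges would return only the small subset with the highest weighted scores, which is one simple way to realize "forgetting most of the conversation".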
Co-authors
- Koji Inoue 2
- Tatsuya Kawahara 2
- Tanmoy Chakraborty 1
- Nancy Chen 1
- Pham Minh Duc 1
- Xiaoxue Gao 1
- Yujia Hu 1
- Jimin Jung 1
- Wiwik Karlina 1
- Dongjun Kim 1
- Roy Ka-Wei Lee 1
- Long Li 1
- Zhengyuan Liu 1
- Chang Liu 1
- Rui Liu 1
- Huiyao Liu 1
- Tuan Luong 1
- Palash Nandi 1
- Ojasva Saxena 1
- Jaehyung Seo 1
- Bryan Chen Zhengyu Tan 1
- Xiyan Tao 1
- Wei Tian 1
- Keertana Arun Vasan 1
- Chaojun Wang 1
- Nadya Yuki Wangsajaya 1
- Xing Xie 1
- Weiwen Xu 1
- Fan Xu (徐凡) 1
- Jing Yao 1
- Lingyu Ye 1
- Xiaoyuan Yi 1
- Weihua Zheng 1
- Bowei Zou (邹博伟) 1