Yuanchen Bei
2026
Mem-Gallery: Benchmarking Multimodal Long-Term Conversational Memory for MLLM Agents
Yuanchen Bei | Tianxin Wei | Xuying Ning | Yanjun Zhao | Zhining Liu | Xiao Lin | Yada Zhu | Hendrik Hamann | Jingrui He | Hanghang Tong
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Long-term memory is a critical capability for multimodal large language model (MLLM) agents, particularly in conversational settings where information accumulates and evolves over time. However, existing benchmarks either evaluate multi-session memory in text-only conversations or assess multimodal understanding within localized contexts, failing to evaluate how multimodal memory is preserved, organized, and evolved across long-term conversational trajectories. Thus, we introduce Mem-Gallery, a new benchmark for evaluating multimodal long-term conversational memory in MLLM agents. Mem-Gallery features high-quality multi-session conversations grounded in both visual and textual information, with long interaction horizons and rich multimodal dependencies. Building on this dataset, we propose a systematic evaluation framework that assesses key memory capabilities along three functional dimensions: memory extraction and test-time adaptation, memory reasoning, and memory knowledge management. Extensive benchmarking across twelve memory systems reveals several key findings, highlighting the necessity of explicit multimodal information retention and memory organization, the persistent limitations in memory reasoning and knowledge management, as well as the efficiency bottleneck of current models. Our benchmark and dataset are available at https://github.com/YuanchenBei/Mem-Gallery.
PAPERMIND: Benchmarking Agentic Reasoning and Critique over Scientific Papers in Multimodal LLMs
Yanjun Zhao | Tianxin Wei | Jiaru Zou | Xuying Ning | Yuanchen Bei | Lingjie Chen | Simmi Rana | Wendy H. Yang | Hanghang Tong | Jingrui He
Findings of the Association for Computational Linguistics: ACL 2026
Understanding scientific papers requires more than answering isolated questions or summarizing content. It involves an integrated reasoning process that grounds textual and visual information, interprets experimental evidence, synthesizes information across sources, and critically evaluates scientific claims. However, existing benchmarks typically assess these abilities in isolation, making it difficult to evaluate scientific paper understanding as a unified set of interacting cognitive abilities. In this work, we introduce PaperMind, a benchmark designed to evaluate integrated and agent-oriented scientific reasoning over research papers. PaperMind is constructed from real scientific papers across seven domains, including agriculture, biology, chemistry, computer science, medicine, physics, and economics. It comprises four complementary task families that collectively operationalize distinct cognitive facets of scientific paper reasoning, including multimodal grounding, experimental interpretation, cross-source evidence reasoning, and critical assessment. By analyzing model behavior across multiple tasks, PaperMind enables a diagnostic evaluation of integrated scientific reasoning behaviors that are difficult to assess through isolated task evaluations. Extensive experiments on both open-source and closed-source multimodal LLMs reveal consistent performance gaps across tasks, highlighting persistent challenges in integrated scientific reasoning and critique. Our benchmark and dataset are available at https://github.com/Yanjun-Zhao/PaperMind.
2025
A Survey of RAG-Reasoning Systems in Large Language Models
Yangning Li | Weizhi Zhang | Yuyao Yang | Wei-Chieh Huang | Yaozu Wu | Junyu Luo | Yuanchen Bei | Henry Peng Zou | Xiao Luo | Yusheng Zhao | Chunkit Chan | Yankai Chen | Zhongfen Deng | Yinghui Li | Hai-Tao Zheng | Dongyuan Li | Renhe Jiang | Ming Zhang | Yangqiu Song | Philip S. Yu
Findings of the Association for Computational Linguistics: EMNLP 2025
Retrieval-Augmented Generation (RAG) lifts the factuality of Large Language Models (LLMs) by injecting external knowledge, yet it falls short on problems that demand multi-step inference; conversely, purely reasoning-oriented approaches often hallucinate or mis-ground facts. This survey synthesizes both strands under a unified reasoning-search perspective. We first map how advanced reasoning optimizes each stage of RAG (Reasoning-Enhanced RAG). Then, we show how different types of retrieved knowledge supply missing premises and expand context for complex inference (RAG-Enhanced Reasoning). Finally, we spotlight emerging Synergized RAG-Reasoning frameworks, where (agentic) LLMs iteratively interleave search and thought to achieve state-of-the-art performance across knowledge-intensive benchmarks. We categorize methods, datasets, and open challenges, and outline research avenues toward deeper RAG-Reasoning systems that are more effective, multimodally-adaptive, trustworthy, and human-centric.
Co-authors
- Jingrui He 2
- Xuying Ning 2
- Hanghang Tong 2
- Tianxin Wei 2
- Yanjun Zhao 2
- Chunkit Chan 1
- Lingjie Chen 1
- Yankai Chen 1
- Zhongfen Deng 1
- Hendrik Hamann 1
- Wei-Chieh Huang 1
- Renhe Jiang 1
- Yangning Li 1
- Yinghui Li 1
- Dongyuan Li 1
- Xiao Lin 1
- Zhining Liu 1
- Junyu Luo 1
- Xiao Luo 1
- Simmi Rana 1
- Yangqiu Song 1
- Yaozu Wu 1
- Wendy H. Yang 1
- Yuyao Yang 1
- Philip S. Yu 1
- Weizhi Zhang 1
- Ming Zhang 1
- Yusheng Zhao 1
- Hai-Tao Zheng 1
- Yada Zhu 1
- Jiaru Zou 1
- Henry Peng Zou 1