Changyu Zeng
2026
MEUR: A Benchmark for Evaluating Vision-Language Models on Multimodal Event Understanding and Reasoning
Zimu Wang | Yuqi Wang | Tong Chen | Changyu Zeng | Hongbin Na | Nijia Han | Fuyu Xing | Qi Chen | Qiufeng Wang | Anh Nguyen | Shuihua Wang | Ling Chen | Jionglong Su | Haiyang Zhang | Wei Wang
Proceedings of the Fifteenth Language Resources and Evaluation Conference
Event understanding and reasoning play critical roles in thoroughly evaluating the capabilities of Vision-Language Models (VLMs); however, existing Visual Question Answering (VQA) datasets predominantly focus on entity-centric questions, while event- or action-related questions are limited in scale and suffer from significant shortcut issues. We introduce MEUR, the first Multimodal Event Understanding and Reasoning dataset consisting of 1,200 images and 4,217 questions, necessitating VLMs with a diverse range of multimodal understanding and reasoning capabilities to answer, ranging from basic event recognition to more complex tasks such as counting and comparison. To streamline the annotation process, we propose a novel semi-automated pipeline that combines advanced VLMs with human annotators, achieving high quality and efficiency. We conduct extensive experiments on state-of-the-art non-thinking and thinking VLMs to demonstrate their capabilities and limitations in multimodal event understanding and reasoning. Furthermore, we provide a detailed error analysis that points out promising directions for future research.
2025
NUMINA: A Natural Understanding Benchmark for Multi-dimensional Intelligence and Numerical Reasoning Abilities
Changyu Zeng | Yifan Wang | Zimu Wang | Wei Wang | Zhengni Yang | Muyi Bao | Jimin Xiao | Anh Nguyen | Yutao Yue
Findings of the Association for Computational Linguistics: EMNLP 2025
Recent advancements in 2D multimodal large language models (MLLMs) have significantly improved performance in vision-language tasks. However, extending these capabilities to 3D environments remains a distinct challenge due to the complexity of spatial reasoning. Moreover, existing 3D benchmarks often lack fine-grained numerical reasoning task annotations, limiting MLLMs’ ability to perform precise spatial measurements and complex numerical reasoning. To address this gap, we introduce NUMINA, the first Natural Understanding benchmark for Multi-dimensional Intelligence and Numerical reasoning Abilities to enhance multimodal indoor perceptual understanding. NUMINA features multi-scale annotations and various question-answer pairs, generated using NUMINA-Flow, an automated annotation pipeline that integrates LLM rewriting and rule-based self-verification. We evaluate the performance of various state-of-the-art LLMs on NUMINA following the Chat-Scene framework, demonstrating that current LLMs struggle with multimodal numerical reasoning, particularly in performing precise computations such as distance and volume estimation, highlighting the need for further advancements in 3D models. The dataset and source code can be obtained from https://github.com/fengshun124/NUMINA.
FinDebate: Multi-Agent Collaborative Intelligence for Financial Analysis
Tianshi Cai | Guanxu Li | Nijia Han | Ce Huang | Zimu Wang | Changyu Zeng | Yuqi Wang | Jingshi Zhou | Haiyang Zhang | Qi Chen | Yushan Pan | Shuihua Wang | Wei Wang
Proceedings of The 10th Workshop on Financial Technology and Natural Language Processing