Yanghai Wang
2026
M3TQA: Massively Multilingual Multitask Table Question Answering
Daixin Shu | Jian Yang | Zhenhe Wu | Xianjie Wu | Xianfu Cheng | Guan Xiangyuan | Yanghai Wang | Pengfei Wu | Tingyang Yang | Hualei Zhu | Wei Zhang | Ge Zhang | Jiaheng Liu | Zhoujun Li
Findings of the Association for Computational Linguistics: ACL 2026
Tabular data is a fundamental component of real-world information systems. However, existing multilingual table benchmarks suffer from geolinguistic imbalance: they overrepresent certain languages and lack sufficient scale for rigorous cross-lingual analysis. To address these limitations, we introduce M3TQA, a comprehensive framework for massively multilingual multitask table question answering, along with the datasets M3TQA-BENCH and M3TQA-INSTRUCT, which feature tables expanded to 97 languages from Chinese and English sources. M3TQA-BENCH includes 6,606 professionally annotated question-answering pairs across four tasks designed to evaluate nuanced table reasoning capabilities. Additionally, we synthesize the training set M3TQA-INSTRUCT in 97 languages using a large language model (LLM). Experiments on state-of-the-art LLMs reveal critical insights into cross-lingual generalization, demonstrating that synthetically generated, unannotated training data can significantly boost performance, particularly for low-resource languages. M3TQA establishes a new standard for multilingual table understanding, providing both a challenging evaluation platform and a scalable methodology for future research.
MT-Video-Bench: A Holistic Video Understanding Benchmark for Evaluating Multimodal LLMs in Multi-Turn Dialogues
Yaning Pan | Qianqian Xie | Guohui Zhang | Zekun Moore Wang | Yongqian Wen | Yuanxing Zhang | Haoxuan Hu | Zhiyu Pan | Yibing Huang | Zhidong Gan | Yonghong Lin | An Ping | Shihao Li | Yanghai Wang | Tianhao Peng | Jiaheng Liu
Findings of the Association for Computational Linguistics: ACL 2026
The recent development of Multimodal Large Language Models (MLLMs) has significantly advanced AI’s ability to understand visual modalities. However, existing evaluation benchmarks remain limited to single-turn question answering, overlooking the complexity of multi-turn dialogues in real-world scenarios. To bridge this gap, we introduce MT-Video-Bench, a holistic video understanding benchmark for evaluating MLLMs in multi-turn dialogues. Specifically, MT-Video-Bench mainly assesses six core competencies that focus on perceptivity and interactivity, and it comprises 1,000 meticulously curated multi-turn dialogues from diverse domains. These competencies are closely aligned with real-world applications, such as interactive sports analysis and multi-turn video-based intelligent tutoring. With MT-Video-Bench, we extensively evaluate various state-of-the-art open-source and closed-source MLLMs, revealing significant performance discrepancies among them and limitations in handling multi-turn video dialogues. The benchmark will be publicly available to foster future research.