Cui Yakun
2026
Perception, Understanding and Reasoning: A Multimodal Benchmark for Video Fake News Detection
Cui Yakun | Peng Qi | Fushuo Huo | Hang Du | Weijie Shi | Juntao Dai | Zhenghao Zhu | Sirui Han
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
The advent of multi-modal large language models (MLLMs) has greatly advanced research on video fake news detection (VFND). Existing benchmarks typically focus on detection accuracy and fail to provide fine-grained assessments of the entire detection process. To address these limitations, we introduce POVFNDB (Process-oriented Video Fake News Detection Benchmark), a process-oriented benchmark comprising 10 tasks designed to systematically evaluate MLLMs’ perception, understanding, and reasoning capabilities in VFND. The benchmark contains 36,240 human-annotated question-answer (QA) pairs in structured or open-ended formats, spanning 15 distinct evaluation dimensions that characterize different aspects of the video fake news detection process. Using POVFNDB, we conduct comprehensive evaluations of both proprietary and open-source MLLMs. Moreover, we fine-tune Qwen2.5VL-7B-Instruct on a reasoning dataset generated by our proposed POVFND-CoT, a chain-of-thought method that utilizes rationales from evaluation results and rationale validation. The resulting model achieves state-of-the-art performance on VFND.
InsightEval: An Expert-Curated Benchmark for Assessing Insight Discovery in LLM-Driven Data Agents
Zhenghao Zhu | Yuanfeng Song | Xing Chen | Chengzhong Liu | Cui Yakun | Caleb Chen Cao | Sirui Han | Yike Guo
Findings of the Association for Computational Linguistics: ACL 2026
Data analysis has become an indispensable part of scientific research. Realizing the full value of massive datasets requires deep exploratory analysis to uncover the latent knowledge and insights hidden within them. With the advent of large language models (LLMs) and multi-agent systems, a growing number of researchers are using these technologies for insight discovery. However, there are few benchmarks for evaluating insight discovery capabilities. Even InsightBench, one of the most comprehensive existing frameworks, suffers from several critical flaws: format inconsistencies, poorly conceived objectives, and redundant insights. These issues may significantly affect both the quality of the data and the evaluation of agents. To address them, we thoroughly investigate the shortcomings of InsightBench and propose essential criteria for a high-quality insight benchmark. Based on these criteria, we develop a data-curation pipeline to construct a new dataset named InsightEval. We further introduce a novel metric to measure the exploratory performance of agents. Through extensive experiments on InsightEval, we highlight prevailing challenges in automated insight discovery and present key findings to guide future research in this promising direction.