Perception, Understanding and Reasoning: A Multimodal Benchmark for Video Fake News Detection

Cui Yakun; Peng Qi; Fushuo Huo; Hang Du; Weijie Shi; Juntao Dai; Zhenghao Zhu; Sirui Han

Abstract

The advent of multimodal large language models (MLLMs) has greatly advanced research on video fake news detection (VFND). Existing benchmarks typically focus on detection accuracy and fail to provide fine-grained assessments of the entire detection process. To address these limitations, we introduce POVFNDB (Process-oriented Video Fake News Detection Benchmark), a process-oriented benchmark comprising 10 tasks designed to systematically evaluate MLLMs' perception, understanding, and reasoning capabilities in VFND. The benchmark contains 36,240 human-annotated question-answer (QA) pairs in structured or open-ended formats, spanning 15 distinct evaluation dimensions that characterize different aspects of the video fake news detection process. Using POVFNDB, we conduct comprehensive evaluations of both proprietary and open-source MLLMs. Moreover, we fine-tune Qwen2.5VL-7B-Instruct on a reasoning dataset generated by our proposed POVFND-CoT, a chain-of-thought method that utilizes rationales from evaluation results and rationale validation. The resulting model achieves state-of-the-art (SOTA) performance on VFND.

Anthology ID: 2026.acl-long.2103
Volume: Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 45332–45363
URL: https://preview.aclanthology.org/ingest-acl/2026.acl-long.2103/
Cite (ACL): Cui Yakun, Peng Qi, Fushuo Huo, Hang Du, Weijie Shi, Juntao Dai, Zhenghao Zhu, and Sirui Han. 2026. Perception, Understanding and Reasoning: A Multimodal Benchmark for Video Fake News Detection. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 45332–45363, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): Perception, Understanding and Reasoning: A Multimodal Benchmark for Video Fake News Detection (Yakun et al., ACL 2026)
PDF: https://preview.aclanthology.org/ingest-acl/2026.acl-long.2103.pdf
Checklist: 2026.acl-long.2103.checklist.pdf
