From What Is Said to Why It Is Framed: Intent-Aware News Video Understanding
Xiangzheng Kong, Minnan Luo, Wenya Wang, Jiaying Wu, Zhi Zeng, Guang Dai
Abstract
Short-form news videos increasingly shape public perception through strategic framing, yet existing verification methods largely overlook the communicative intent underlying such content. By emphasizing surface semantics, current models struggle to separate stylistic presentation from factual evidence, which leads to shortcut learning and brittle generalization. To address this limitation, we propose the Origin–Objective–Means (OOM) framework, a theory-grounded representation of communicative intent that captures creator stance, audience need activation, and communication strategy. We validate OOM through large-scale human annotation, revealing distinct and consistent lexical and structural patterns across intent dimensions. Building on this representation, we operationalize intent as an explicit semantic condition rather than a prediction target. Concretely, we introduce Intent-Guided Prompting (IGP), which conditions LLM reasoning on intent, and an Intent-Conditioned Multimodal Detection framework (ICMD), which injects intent into multimodal detectors via feature-wise modulation. Experiments on FakeSV and FakeTT show that modeling intent as an intermediate condition consistently improves accuracy and robustness across diverse vision–language backbones, while substantially reducing reliance on spurious stylistic correlations.
- Anthology ID:
- 2026.findings-acl.1945
- Volume:
- Findings of the Association for Computational Linguistics: ACL 2026
- Month:
- July
- Year:
- 2026
- Address:
- San Diego, California, United States
- Editors:
- Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
- Venue:
- Findings
- Publisher:
- Association for Computational Linguistics
- Pages:
- 39039–39050
- URL:
- https://preview.aclanthology.org/ingest-acl/2026.findings-acl.1945/
- Cite (ACL):
- Xiangzheng Kong, Minnan Luo, Wenya Wang, Jiaying Wu, Zhi Zeng, and Guang Dai. 2026. From What Is Said to Why It Is Framed: Intent-Aware News Video Understanding. In Findings of the Association for Computational Linguistics: ACL 2026, pages 39039–39050, San Diego, California, United States. Association for Computational Linguistics.
- Cite (Informal):
- From What Is Said to Why It Is Framed: Intent-Aware News Video Understanding (Kong et al., Findings 2026)
- PDF:
- https://preview.aclanthology.org/ingest-acl/2026.findings-acl.1945.pdf
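
The abstract states that ICMD injects intent into multimodal detectors via feature-wise modulation. As a rough illustration only, the sketch below shows the generic form of feature-wise modulation, in which a conditioning vector produces a per-channel scale and shift applied to another feature vector. It is not the authors' implementation: the function name `film_modulate`, the linear projections `w_gamma` and `w_beta`, the identity-centred scale, and all dimensions are assumptions chosen for the example.

```python
import numpy as np

def film_modulate(features, intent, w_gamma, w_beta):
    """Scale and shift each feature channel using an intent embedding.

    Hypothetical helper: the paper's actual ICMD layer may differ.
    """
    gamma = 1.0 + intent @ w_gamma  # per-channel scale, centred at identity
    beta = intent @ w_beta          # per-channel shift
    return gamma * features + beta

# Toy shapes (assumed): 4 videos, 8-dim fused features, 3-dim intent embedding.
rng = np.random.default_rng(0)
batch, feat_dim, intent_dim = 4, 8, 3
features = rng.normal(size=(batch, feat_dim))
intent = rng.normal(size=(batch, intent_dim))
w_gamma = 0.1 * rng.normal(size=(intent_dim, feat_dim))
w_beta = 0.1 * rng.normal(size=(intent_dim, feat_dim))

modulated = film_modulate(features, intent, w_gamma, w_beta)
```

In this sketch the intent embedding never enters the detector as a prediction target; it only rescales and shifts the existing features, which matches the abstract's description of intent as an intermediate condition. With zero projection weights the layer reduces to the identity.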