VideoEvent: Leveraging Relevance and LLMs for Video Question Answering

Chen-Chen Lin; Ming-Han Lee; KunRu Wu; Yu-Chee Tseng

Abstract

We propose VideoEvent, a lightweight and efficient training-free framework for Video Question Answering (VQA) with large language models (LLMs). Existing training-free VQA methods often neglect the temporal dependencies between frames or clips, treating them as isolated units, and rely on complex or resource-intensive components. To address these limitations while preserving performance and simplicity, VideoEvent segments an input video into question-relevant temporal events and selectively supplements them with low-level visual cues such as background and object layout. Specifically, it selects semantically relevant time spans and retrieves one representative background frame to enrich the prompt given to the LLM. This design minimizes reliance on additional tools and reduces inference cost, making it well suited to practical deployment. Experimental results on EgoSchema and NExT-QA show that VideoEvent reduces inference cost by up to 30% while maintaining state-of-the-art accuracy, and that its background module improves accuracy by 1–3% across multiple frameworks.

Anthology ID: 2026.lrec-main.395
Volume: Proceedings of the Fifteenth Language Resources and Evaluation Conference
Month: May
Year: 2026
Address: Palma de Mallorca, Spain
Editors: Stelios Piperidis, Núria Bel, Henk van den Heuvel, Nancy Ide, Simon Krek, Antonio Toral
Venue: LREC
Publisher: ELRA Language Resource Association
Pages: 5024–5034
URL: https://preview.aclanthology.org/ingest-lrec/2026.lrec-main.395/
Cite (ACL): Chen-Chen Lin, Ming-Han Lee, KunRu Wu, and Yu-Chee Tseng. 2026. VideoEvent: Leveraging Relevance and LLMs for Video Question Answering. International Conference on Language Resources and Evaluation, main:5024–5034.
Cite (Informal): VideoEvent: Leveraging Relevance and LLMs for Video Question Answering (Lin et al., LREC 2026)
PDF: https://preview.aclanthology.org/ingest-lrec/2026.lrec-main.395.pdf
