Do We Need Large VLMs for Spotting Soccer Actions?

Ritabrata Chakraborty; Rajatsubhra Chakraborty; Avijit Dasgupta; Sandeep Chaurasia

Abstract

Traditional video-based tasks like soccer action spotting rely heavily on visual inputs, often requiring complex and computationally expensive models to process dense video data. We propose a shift from this video-centric approach to a text-based task, making it lightweight and scalable by utilizing Large Language Models (LLMs) instead of Vision-Language Models (VLMs). We posit that expert commentary, which provides rich descriptions and contextual cues, contains sufficient information to reliably spot key actions in a match. To demonstrate this, we employ a system of three LLMs acting as judges, specializing in outcome, excitement, and tactics respectively, to spot actions in soccer matches. Our experiments show that this language-centric approach detects critical match events effectively, coming close to state-of-the-art video-based spotters while using zero video-processing compute and a similar amount of time to process the entire match.

Anthology ID: 2025.ijcnlp-srw.6
Volume: The 14th International Joint Conference on Natural Language Processing and The 4th Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics
Month: December
Year: 2025
Address: Mumbai, India
Editors: Santosh T.y.s.s, Shuichiro Shimizu, Yifan Gong
Venue: IJCNLP
Publisher: Association for Computational Linguistics
Pages: 59–65
URL: https://preview.aclanthology.org/ingest-ijcnlp-aacl/2025.ijcnlp-srw.6/
Cite (ACL): Ritabrata Chakraborty, Rajatsubhra Chakraborty, Avijit Dasgupta, and Sandeep Chaurasia. 2025. Do We Need Large VLMs for Spotting Soccer Actions?. In The 14th International Joint Conference on Natural Language Processing and The 4th Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics, pages 59–65, Mumbai, India. Association for Computational Linguistics.
Cite (Informal): Do We Need Large VLMs for Spotting Soccer Actions? (Chakraborty et al., IJCNLP 2025)
PDF: https://preview.aclanthology.org/ingest-ijcnlp-aacl/2025.ijcnlp-srw.6.pdf
