Do We Need Large VLMs for Spotting Soccer Actions?
Ritabrata Chakraborty, Rajatsubhra Chakraborty, Avijit Dasgupta, Sandeep Chaurasia
Abstract
Traditional video-based tasks like soccer action spotting rely heavily on visual inputs, often requiring complex and computationally expensive models to process dense video data. We propose a shift from this video-centric approach to a text-based task, making it lightweight and scalable by using Large Language Models (LLMs) instead of Vision-Language Models (VLMs). We posit that expert commentary, which provides rich descriptions and contextual cues, contains sufficient information to reliably spot key actions in a match. To demonstrate this, we employ a system of three LLMs acting as judges, specializing in outcome, excitement, and tactics, to spot actions in soccer matches. Our experiments show that this language-centric approach detects critical match events effectively, coming close to state-of-the-art video-based spotters while using zero video-processing compute and a similar amount of time to process the entire match.
- Anthology ID:
- 2025.ijcnlp-srw.6
- Volume:
- The 14th International Joint Conference on Natural Language Processing and The 4th Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics
- Month:
- December
- Year:
- 2025
- Address:
- Mumbai, India
- Editors:
- Santosh T.Y.S.S, Shuichiro Shimizu, Yifan Gong
- Venue:
- IJCNLP
- Publisher:
- Association for Computational Linguistics
- Pages:
- 59–65
- URL:
- https://preview.aclanthology.org/ingest-ijcnlp-aacl/2025.ijcnlp-srw.6/
- Cite (ACL):
- Ritabrata Chakraborty, Rajatsubhra Chakraborty, Avijit Dasgupta, and Sandeep Chaurasia. 2025. Do We Need Large VLMs for Spotting Soccer Actions? In The 14th International Joint Conference on Natural Language Processing and The 4th Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics, pages 59–65, Mumbai, India. Association for Computational Linguistics.
- Cite (Informal):
- Do We Need Large VLMs for Spotting Soccer Actions? (Chakraborty et al., IJCNLP 2025)
- PDF:
- https://preview.aclanthology.org/ingest-ijcnlp-aacl/2025.ijcnlp-srw.6.pdf