Real-time Commentator Assistant for Photo Editing Live Streaming

Matīss Rikters, Goran Topić


Abstract
Live commentary has the potential to make specific broadcasts, such as sports or video games, more engaging and interesting for spectators. With the recent rise in popularity of online live streaming, many new categories have entered the space, such as art in its many forms or even software development. However, not all live streamers are able to engage naturally with their audience. We introduce a live commentator assistant system that can discuss what is visible on screen in real time. Our experimental setting focuses on the use case of a photo editing live stream. We compare several recent vision language models for commentary generation and text-to-speech models for spoken output, all on relatively modest consumer hardware configurations.
Anthology ID:
2025.ijcnlp-demo.3
Volume:
Proceedings of The 14th International Joint Conference on Natural Language Processing and The 4th Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics: System Demonstrations
Month:
December
Year:
2025
Address:
Mumbai, India
Editors:
Xuebo Liu, Ayu Purwarianti
Venue:
IJCNLP
Publisher:
Association for Computational Linguistics
Pages:
17–24
URL:
https://preview.aclanthology.org/ingest-ijcnlp-aacl/2025.ijcnlp-demo.3/
Cite (ACL):
Matīss Rikters and Goran Topić. 2025. Real-time Commentator Assistant for Photo Editing Live Streaming. In Proceedings of The 14th International Joint Conference on Natural Language Processing and The 4th Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics: System Demonstrations, pages 17–24, Mumbai, India. Association for Computational Linguistics.
Cite (Informal):
Real-time Commentator Assistant for Photo Editing Live Streaming (Rikters & Topić, IJCNLP 2025)
PDF:
https://preview.aclanthology.org/ingest-ijcnlp-aacl/2025.ijcnlp-demo.3.pdf