Sentiment Analysis on Video Transcripts: Comparing the Value of Textual and Multimodal Annotations

Quanqi Du, Loic De Langhe, Els Lefever, Veronique Hoste

Abstract
This study explores the differences between textual and multimodal sentiment annotations on videos and their impact on transcript-based sentiment modelling. Using the UniC and CH-SIMS datasets, which are annotated at both the unimodal and multimodal levels, we conducted a statistical analysis and sentiment modelling experiments. Results reveal significant differences between the two annotation types, with textual annotations yielding better performance in sentiment modelling and demonstrating superior generalization ability. These findings highlight the challenges of cross-modality generalization and provide insights for advancing sentiment analysis.
Anthology ID:
2025.wnut-1.2
Volume:
Proceedings of the Tenth Workshop on Noisy and User-generated Text
Month:
May
Year:
2025
Address:
Albuquerque, New Mexico, USA
Editors:
JinYeong Bak, Rob van der Goot, Hyeju Jang, Weerayut Buaphet, Alan Ramponi, Wei Xu, Alan Ritter
Venues:
WNUT | WS
Publisher:
Association for Computational Linguistics
Pages:
10–15
URL:
https://preview.aclanthology.org/Ingest-2025-COMPUTEL/2025.wnut-1.2/
Cite (ACL):
Quanqi Du, Loic De Langhe, Els Lefever, and Veronique Hoste. 2025. Sentiment Analysis on Video Transcripts: Comparing the Value of Textual and Multimodal Annotations. In Proceedings of the Tenth Workshop on Noisy and User-generated Text, pages 10–15, Albuquerque, New Mexico, USA. Association for Computational Linguistics.
Cite (Informal):
Sentiment Analysis on Video Transcripts: Comparing the Value of Textual and Multimodal Annotations (Du et al., WNUT 2025)
PDF:
https://preview.aclanthology.org/Ingest-2025-COMPUTEL/2025.wnut-1.2.pdf