Video Highlight Prediction Using Audience Chat Reactions

Cheng-Yang Fu, Joon Lee, Mohit Bansal, Alexander Berg


Abstract
Sports channel video portals offer an exciting domain for research on multimodal, multilingual analysis. We present methods for automatic video highlight prediction based on joint visual features and textual analysis of real-world audience discourse with complex slang, in both English and Traditional Chinese. We present a novel dataset based on League of Legends championships recorded from North American and Taiwanese Twitch.tv channels (to be released for further research), and demonstrate strong results on it using multimodal, character-level CNN-RNN model architectures.
Anthology ID:
D17-1102
Volume:
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing
Month:
September
Year:
2017
Address:
Copenhagen, Denmark
Venue:
EMNLP
SIG:
SIGDAT
Publisher:
Association for Computational Linguistics
Pages:
972–978
URL:
https://aclanthology.org/D17-1102
DOI:
10.18653/v1/D17-1102
Cite (ACL):
Cheng-Yang Fu, Joon Lee, Mohit Bansal, and Alexander Berg. 2017. Video Highlight Prediction Using Audience Chat Reactions. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 972–978, Copenhagen, Denmark. Association for Computational Linguistics.
Cite (Informal):
Video Highlight Prediction Using Audience Chat Reactions (Fu et al., EMNLP 2017)
PDF:
https://preview.aclanthology.org/ingestion-script-update/D17-1102.pdf
Attachment:
 D17-1102.Attachment.zip