Abstract
Video content on social media platforms constitutes a major part of the communication between people, as it allows everyone to share their stories. However, if someone is unable to consume video, whether due to a disability or limited network bandwidth, their participation and communication are severely limited. Automatically telling these stories through multi-sentence descriptions of videos would help bridge this gap. To learn and evaluate such models, we introduce VideoStory, a new large-scale dataset for video description, posing a new challenge of multi-sentence video description. Our VideoStory captions dataset is complementary to prior work and contains 20k videos posted publicly on a social media platform, amounting to 396 hours of video with 123k sentences temporally aligned to the video.
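To make the notion of temporally aligned, multi-sentence descriptions concrete, below is a minimal, hypothetical sketch of what one aligned caption entry might look like. The field names, types, and example values are illustrative assumptions, not the dataset's published schema.

```python
# Hypothetical sketch: one temporally aligned caption entry.
# Field names and values are assumptions for illustration only.
from dataclasses import dataclass

@dataclass
class AlignedCaption:
    video_id: str      # identifier of the source video
    start_sec: float   # start of the described segment, in seconds
    end_sec: float     # end of the described segment, in seconds
    sentence: str      # one sentence of the multi-sentence story

# A multi-sentence description is then a time-ordered list of such
# entries that jointly tells the story of the video.
story = [
    AlignedCaption("vid_00001", 0.0, 12.5,
                   "A group of friends sets up a picnic in the park."),
    AlignedCaption("vid_00001", 12.5, 30.0,
                   "They play frisbee while the food is being prepared."),
]
```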
- Anthology ID: D18-1117
- Volume: Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing
- Month: October-November
- Year: 2018
- Address: Brussels, Belgium
- Editors: Ellen Riloff, David Chiang, Julia Hockenmaier, Jun’ichi Tsujii
- Venue: EMNLP
- SIG: SIGDAT
- Publisher: Association for Computational Linguistics
- Pages: 968–974
- URL: https://aclanthology.org/D18-1117
- DOI: 10.18653/v1/D18-1117
- Cite (ACL): Spandana Gella, Mike Lewis, and Marcus Rohrbach. 2018. A Dataset for Telling the Stories of Social Media Videos. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 968–974, Brussels, Belgium. Association for Computational Linguistics.
- Cite (Informal): A Dataset for Telling the Stories of Social Media Videos (Gella et al., EMNLP 2018)
- PDF: https://preview.aclanthology.org/nschneid-patch-3/D18-1117.pdf
- Data: ActivityNet Captions, YouCook