Generating Video Description using Sequence-to-sequence Model with Temporal Attention
Natsuda Laokulrat, Sang Phan, Noriki Nishida, Raphael Shu, Yo Ehara, Naoaki Okazaki, Yusuke Miyao, Hideki Nakayama
Abstract
Automatic video description generation has recently been attracting attention following rapid advances in image caption generation. Automatically generating a description for a video is more challenging than for an image because of the temporal dynamics across frames. Most prior work relies on Recurrent Neural Networks (RNNs), and attention mechanisms have recently been applied so that the model learns to focus on particular frames of the video while generating each word of the describing sentence. In this paper, we focus on a sequence-to-sequence approach with a temporal attention mechanism. We analyze and compare the results of different attention model configurations. By applying the temporal attention mechanism, we achieve a METEOR score of 0.310 on the Microsoft Video Description dataset, outperforming the state-of-the-art systems so far.
- Anthology ID:
- C16-1005
- Volume:
- Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers
- Month:
- December
- Year:
- 2016
- Address:
- Osaka, Japan
- Venue:
- COLING
- Publisher:
- The COLING 2016 Organizing Committee
- Pages:
- 44–52
- URL:
- https://aclanthology.org/C16-1005
- Cite (ACL):
- Natsuda Laokulrat, Sang Phan, Noriki Nishida, Raphael Shu, Yo Ehara, Naoaki Okazaki, Yusuke Miyao, and Hideki Nakayama. 2016. Generating Video Description using Sequence-to-sequence Model with Temporal Attention. In Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers, pages 44–52, Osaka, Japan. The COLING 2016 Organizing Committee.
- Cite (Informal):
- Generating Video Description using Sequence-to-sequence Model with Temporal Attention (Laokulrat et al., COLING 2016)
- PDF:
- https://preview.aclanthology.org/ingestion-script-update/C16-1005.pdf
- Data
- MSVD
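The temporal attention the abstract describes computes, for each decoding step, a weight over the encoded video frames and uses the weighted sum of frame features as context for generating the next word. A minimal NumPy sketch of one such attention step is below; the parameter names (`W_f`, `W_h`, `w`) and the additive scoring form are illustrative assumptions, not the paper's exact notation or configuration.

```python
import numpy as np

def temporal_attention(frame_features, decoder_state, W_f, W_h, w):
    """One step of additive temporal attention (illustrative sketch).

    frame_features: (T, d_f) encoded features, one row per video frame
    decoder_state:  (d_h,)   current RNN decoder hidden state
    W_f, W_h, w:    learned projections (assumed shapes (d_a, d_f),
                    (d_a, d_h), and (d_a,))
    Returns the context vector (d_f,) and attention weights (T,).
    """
    # Unnormalized score per frame: e_i = w^T tanh(W_f f_i + W_h h)
    scores = np.tanh(frame_features @ W_f.T + decoder_state @ W_h.T) @ w
    # Softmax over the temporal axis so weights sum to 1
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    # Context = attention-weighted average of frame features
    context = weights @ frame_features
    return context, weights
```

At each word-generation step the decoder would recompute these weights, letting the model attend to different frames for different words.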