Referring to Objects in Videos Using Spatio-Temporal Identifying Descriptions
Peratham Wiriyathammabhum, Abhinav Shrivastava, Vlad Morariu, Larry Davis
Abstract
This paper presents a new task, the grounding of spatio-temporal identifying descriptions in videos. Previous work suggests potential bias in existing datasets and emphasizes the need for a new data creation schema to better model linguistic structure. We introduce a new data collection scheme based on grammatical constraints for surface realization to enable us to investigate the problem of grounding spatio-temporal identifying descriptions in videos. We then propose a two-stream modular attention network that learns and grounds spatio-temporal identifying descriptions based on appearance and motion. We show that motion modules help to ground motion-related words and also help to learn in appearance modules because modular neural networks resolve task interference between modules. Finally, we propose a future challenge and a need for a robust system arising from replacing ground truth visual annotations with automatic video object detector and temporal event localization.
- Anthology ID:
- W19-1802
- Volume:
- Proceedings of the Second Workshop on Shortcomings in Vision and Language
- Month:
- June
- Year:
- 2019
- Address:
- Minneapolis, Minnesota
- Venue:
- NAACL
- Publisher:
- Association for Computational Linguistics
- Pages:
- 14–25
- URL:
- https://aclanthology.org/W19-1802
- DOI:
- 10.18653/v1/W19-1802
- Cite (ACL):
- Peratham Wiriyathammabhum, Abhinav Shrivastava, Vlad Morariu, and Larry Davis. 2019. Referring to Objects in Videos Using Spatio-Temporal Identifying Descriptions. In Proceedings of the Second Workshop on Shortcomings in Vision and Language, pages 14–25, Minneapolis, Minnesota. Association for Computational Linguistics.
- Cite (Informal):
- Referring to Objects in Videos Using Spatio-Temporal Identifying Descriptions (Wiriyathammabhum et al., NAACL 2019)
- PDF:
- https://preview.aclanthology.org/nodalida-main-page/W19-1802.pdf
- Data
- TEMPO