Yiannis Aloimonos
2024
Diving Deep into the Motion Representation of Video-Text Models
Chinmaya Devaraj
|
Cornelia Fermüller
|
Yiannis Aloimonos
Findings of the Association for Computational Linguistics: ACL 2024
Videos are more informative than images because they capture the dynamics of the scene. By representing motion in videos, we can capture dynamic activities. In this work, we introduce GPT-4 generated motion descriptions that capture fine-grained motion descriptions of activities and apply them to three action datasets. We evaluated several video-text models on the task of retrieval of motion descriptions. We found that they fall far behind human expert performance on two action datasets, raising the question of whether video-text models understand motion in videos. To address it, we introduce a method of improving motion understanding in video-text models by utilizing motion descriptions. This method proves to be effective on two action datasets for the motion description retrieval task. The results draw attention to the need for quality captions involving fine-grained motion information in existing datasets and demonstrate the effectiveness of the proposed pipeline in understanding fine-grained motion during video-text retrieval.
2015
Learning the Semantics of Manipulation Action
Yezhou Yang
|
Yiannis Aloimonos
|
Cornelia Fermüller
|
Eren Erdal Aksoy
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)
2011
Corpus-Guided Sentence Generation of Natural Images
Yezhou Yang
|
Ching Teo
|
Hal Daumé III
|
Yiannis Aloimonos
Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing
Co-authors
- Yezhou Yang 2
- Cornelia Fermüller 2
- Ching Teo 1
- Hal Daumé III 1
- Chinmaya Devaraj 1