An Evaluation of Image-Based Verb Prediction Models against Human Eye-Tracking Data

Spandana Gella, Frank Keller


Abstract
Recent research in language and vision has developed models for predicting and disambiguating verbs from images. Here, we ask whether the predictions made by such models correspond to human intuitions about visual verbs. We show that the image regions a verb prediction model identifies as salient for a given verb correlate with the regions fixated by human observers performing a verb classification task.
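The evaluation described in the abstract compares model-derived salient regions with human fixation data. Below is a minimal, hypothetical sketch of one way such a comparison could be computed, assuming the model saliency map and the human fixation density map are 2D arrays over the same image grid and using rank correlation as the agreement measure; the paper's actual data format and evaluation metrics may differ.

import numpy as np
from scipy.stats import spearmanr

def saliency_fixation_correlation(model_saliency, fixation_map):
    """Rank-correlate a model saliency map with a human fixation density map.

    Both inputs are assumed to be 2D arrays over the same image grid
    (an illustrative setup, not the paper's reported protocol).
    """
    assert model_saliency.shape == fixation_map.shape
    rho, _ = spearmanr(model_saliency.ravel(), fixation_map.ravel())
    return rho

# Toy example with random maps standing in for real data.
rng = np.random.default_rng(0)
model_map = rng.random((48, 64))   # e.g. saliency from a verb prediction model
human_map = rng.random((48, 64))   # e.g. smoothed fixation counts from eye tracking
print(saliency_fixation_correlation(model_map, human_map))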
Anthology ID:
N18-2119
Volume:
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)
Month:
June
Year:
2018
Address:
New Orleans, Louisiana
Editors:
Marilyn Walker, Heng Ji, Amanda Stent
Venue:
NAACL
Publisher:
Association for Computational Linguistics
Pages:
758–763
URL:
https://aclanthology.org/N18-2119
DOI:
10.18653/v1/N18-2119
Cite (ACL):
Spandana Gella and Frank Keller. 2018. An Evaluation of Image-Based Verb Prediction Models against Human Eye-Tracking Data. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers), pages 758–763, New Orleans, Louisiana. Association for Computational Linguistics.
Cite (Informal):
An Evaluation of Image-Based Verb Prediction Models against Human Eye-Tracking Data (Gella & Keller, NAACL 2018)
PDF:
https://preview.aclanthology.org/nschneid-patch-4/N18-2119.pdf
Data
SALICON
VQA-HAT