Cross-linguistic differences and similarities in image descriptions

Emiel van Miltenburg, Desmond Elliott, Piek Vossen


Abstract
Automatic image description systems are commonly trained and evaluated on large image description datasets. Recently, researchers have started to collect such datasets for languages other than English. An unexplored question is how different these datasets are from English and, if there are any differences, what causes them to differ. This paper provides a cross-linguistic comparison of Dutch, English, and German image descriptions. We find that these descriptions are similar in many respects, but the familiarity of crowd workers with the subjects of the images has a noticeable influence on the specificity of the descriptions.
Anthology ID:
W17-3503
Volume:
Proceedings of the 10th International Conference on Natural Language Generation
Month:
September
Year:
2017
Address:
Santiago de Compostela, Spain
Editors:
Jose M. Alonso, Alberto Bugarín, Ehud Reiter
Venue:
INLG
SIG:
SIGGEN
Publisher:
Association for Computational Linguistics
Pages:
21–30
URL:
https://aclanthology.org/W17-3503
DOI:
10.18653/v1/W17-3503
Cite (ACL):
Emiel van Miltenburg, Desmond Elliott, and Piek Vossen. 2017. Cross-linguistic differences and similarities in image descriptions. In Proceedings of the 10th International Conference on Natural Language Generation, pages 21–30, Santiago de Compostela, Spain. Association for Computational Linguistics.
Cite (Informal):
Cross-linguistic differences and similarities in image descriptions (van Miltenburg et al., INLG 2017)
PDF:
https://preview.aclanthology.org/ml4al-ingestion/W17-3503.pdf
Code:
cltl/DutchDescriptions
Data:
Flickr30k, MS COCO, Multi30K