Abstract
In primary school, in children’s books, and in modern language-learning apps, multi-modal learning strategies such as illustrations of terms and phrases are used to support reading comprehension. Several studies in educational psychology also suggest that integrating cross-modal information improves reading comprehension. We claim that state-of-the-art multi-modal transformers, which could be used in a language-learner context to improve human reading, will perform poorly because of the short and relatively simple textual data those models are trained on. To test our hypothesis, we collected a new multi-modal image-retrieval dataset based on data from Wikipedia. In an in-depth data analysis, we highlight the differences between our dataset and other popular datasets. Additionally, we evaluate several state-of-the-art multi-modal transformers on text-image retrieval on our dataset and analyze their meager results, which verify our claims.
- Anthology ID:
- 2021.naacl-srw.21
- Volume:
- Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Student Research Workshop
- Month:
- June
- Year:
- 2021
- Address:
- Online
- Editors:
- Esin Durmus, Vivek Gupta, Nelson Liu, Nanyun Peng, Yu Su
- Venue:
- NAACL
- Publisher:
- Association for Computational Linguistics
- URL:
- https://aclanthology.org/2021.naacl-srw.21
- DOI:
- Cite (ACL):
- Florian Schneider, Özge Alaçam, Xintong Wang, and Chris Biemann. 2021. Towards Multi-Modal Text-Image Retrieval to improve Human Reading. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Student Research Workshop. Online. Association for Computational Linguistics.
- Cite (Informal):
- Towards Multi-Modal Text-Image Retrieval to improve Human Reading (Schneider et al., NAACL 2021)
- PDF:
- https://preview.aclanthology.org/fix-dup-bibkey/2021.naacl-srw.21.pdf
- Data
- Conceptual Captions, Flickr30k, MS COCO, WikiCaps