Abstract
In primary school, in children’s books, and in modern language-learning apps, multi-modal learning strategies such as illustrations of terms and phrases are used to support reading comprehension. Several studies in educational psychology also suggest that integrating cross-modal information improves reading comprehension. We claim that state-of-the-art multi-modal transformers, which could be used in a language-learner context to improve human reading, will perform poorly because of the short and relatively simple textual data those models are trained on. To test our hypothesis, we collected a new multi-modal image-retrieval dataset based on data from Wikipedia. In an in-depth data analysis, we highlight the differences between our dataset and other popular datasets. Additionally, we evaluate several state-of-the-art multi-modal transformers on text-image retrieval on our dataset and analyze their meager results, which verify our claims.
- Anthology ID:
- 2021.naacl-srw.21
- Volume:
- Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Student Research Workshop
- Month:
- June
- Year:
- 2021
- Address:
- Online
- Editors:
- Esin Durmus, Vivek Gupta, Nelson Liu, Nanyun Peng, Yu Su
- Venue:
- NAACL
- Publisher:
- Association for Computational Linguistics
- URL:
- https://aclanthology.org/2021.naacl-srw.21
- DOI:
- Cite (ACL):
- Florian Schneider, Özge Alaçam, Xintong Wang, and Chris Biemann. 2021. Towards Multi-Modal Text-Image Retrieval to improve Human Reading. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Student Research Workshop. Online. Association for Computational Linguistics.
- Cite (Informal):
- Towards Multi-Modal Text-Image Retrieval to improve Human Reading (Schneider et al., NAACL 2021)
- PDF:
- https://preview.aclanthology.org/fix-dup-bibkey/2021.naacl-srw.21.pdf
- Data
- Conceptual Captions, Flickr30k, MS COCO, WikiCaps