Visual Question Answering Dataset for Bilingual Image Understanding: A Study of Cross-Lingual Transfer Using Attention Maps

Nobuyuki Shimizu; Na Rong; Takashi Miyazaki

Abstract

Visual question answering (VQA) is a challenging task that requires a computer system to understand both a question and an image. While there is much research on VQA in English, there is a lack of datasets for other languages, and English annotation is not directly applicable in those languages. To deal with this, we have created a Japanese VQA dataset by using crowdsourced annotation with images from the Visual Genome dataset. This is the first such dataset in Japanese. As another contribution, we propose a cross-lingual method for making use of English annotation to improve a Japanese VQA system. The proposed method is based on a popular VQA method that uses an attention mechanism. We use attention maps generated from English questions to help improve the Japanese VQA task. The proposed method experimentally performed better than simply using a monolingual corpus, which demonstrates the effectiveness of using attention maps to transfer cross-lingual information.
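As a rough illustration of the idea of comparing attention maps across languages, the sketch below computes a dot-product attention map over image regions for an English and a Japanese question and measures how far apart the two maps are. Everything here is an assumption made for illustration: the toy vectors, the function names `attention_map` and `kl_divergence`, and the use of KL divergence as a transfer signal are hypothetical, and the abstract does not state that the proposed method works this way.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_map(question_vec, region_feats):
    """Dot-product attention of one question vector over image regions.

    Hypothetical helper: real VQA models use learned projections.
    """
    scores = [
        sum(q * r for q, r in zip(question_vec, feat))
        for feat in region_feats
    ]
    return softmax(scores)


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two attention maps of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


# Toy data (assumed): three image regions with 2-dimensional features.
regions = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
en_question = [2.0, 0.0]  # stand-in encoding of an English question
ja_question = [1.5, 0.5]  # stand-in encoding of the Japanese counterpart

en_attn = attention_map(en_question, regions)
ja_attn = attention_map(ja_question, regions)

# One possible (assumed) transfer signal: penalize the Japanese map
# for diverging from the English one.
transfer_loss = kl_divergence(en_attn, ja_attn)
print(en_attn, ja_attn, transfer_loss)
```

In a trainable model, a term like `transfer_loss` could be added to the answer-prediction loss so that the Japanese attention is nudged toward the English attention on the same image; whether the paper does this is not stated in the abstract.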

Anthology ID: C18-1163
Volume: Proceedings of the 27th International Conference on Computational Linguistics
Month: August
Year: 2018
Address: Santa Fe, New Mexico, USA
Editors: Emily M. Bender, Leon Derczynski, Pierre Isabelle
Venue: COLING
Publisher: Association for Computational Linguistics
Pages: 1918–1928
URL: https://aclanthology.org/C18-1163
Cite (ACL): Nobuyuki Shimizu, Na Rong, and Takashi Miyazaki. 2018. Visual Question Answering Dataset for Bilingual Image Understanding: A Study of Cross-Lingual Transfer Using Attention Maps. In Proceedings of the 27th International Conference on Computational Linguistics, pages 1918–1928, Santa Fe, New Mexico, USA. Association for Computational Linguistics.
Cite (Informal): Visual Question Answering Dataset for Bilingual Image Understanding: A Study of Cross-Lingual Transfer Using Attention Maps (Shimizu et al., COLING 2018)
PDF: https://preview.aclanthology.org/proper-vol2-ingestion/C18-1163.pdf
Data: COCO-QA, DAQUAR, MS COCO, Visual Genome, Visual Question Answering, Visual7W
