A Unified Framework for Multilingual and Code-Mixed Visual Question Answering
Deepak Gupta, Pabitra Lenka, Asif Ekbal, Pushpak Bhattacharyya
Abstract
In this paper, we propose an effective deep learning framework for multilingual and code-mixed visual question answering. The proposed model is capable of predicting answers from questions in Hindi, English or Code-mixed (Hinglish: Hindi-English) languages. The majority of the existing techniques on Visual Question Answering (VQA) focus on English questions only. However, many applications, such as medical imaging, tourism and visual assistants, require a multilinguality-enabled module for widespread use. As there is no available dataset for English-Hindi VQA, we first create Hindi and Code-mixed VQA datasets by exploiting the linguistic properties of these languages. We propose a robust technique capable of handling multilingual and code-mixed questions to provide the answer against the visual information (image). To better encode the multilingual and code-mixed questions, we introduce a hierarchy of shared layers. We control the behaviour of these shared layers by an attention-based soft layer sharing mechanism, which learns how shared layers are applied in different ways for the different languages of the question. Further, our model uses bi-linear attention with a residual connection to fuse the language and image features. We perform extensive evaluation and ablation studies for English, Hindi and Code-mixed VQA. The evaluation shows that the proposed multilingual model achieves state-of-the-art performance in all these settings.
- Anthology ID:
- 2020.aacl-main.90
- Volume:
- Proceedings of the 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 10th International Joint Conference on Natural Language Processing
- Month:
- December
- Year:
- 2020
- Address:
- Suzhou, China
- Venue:
- AACL
- Publisher:
- Association for Computational Linguistics
- Pages:
- 900–913
- URL:
- https://aclanthology.org/2020.aacl-main.90
- Cite (ACL):
- Deepak Gupta, Pabitra Lenka, Asif Ekbal, and Pushpak Bhattacharyya. 2020. A Unified Framework for Multilingual and Code-Mixed Visual Question Answering. In Proceedings of the 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 10th International Joint Conference on Natural Language Processing, pages 900–913, Suzhou, China. Association for Computational Linguistics.
- Cite (Informal):
- A Unified Framework for Multilingual and Code-Mixed Visual Question Answering (Gupta et al., AACL 2020)
- PDF:
- https://preview.aclanthology.org/ingestion-script-update/2020.aacl-main.90.pdf
- Data:
- MCVQA, COCO, Visual Question Answering
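
The abstract mentions an attention-based soft layer sharing mechanism that decides how shared layers are applied for each question language. The following is only a minimal sketch of that general idea, not the paper's architecture: it assumes each shared layer has already produced a feature vector, and that each language has its own attention logits over those layers. The names `soft_share`, `layer_outputs` and `language_logits` are hypothetical, and the logits are fixed by hand here, whereas a real model would learn them.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def soft_share(layer_outputs, logits):
    """Mix shared-layer outputs using attention weights derived from logits.

    layer_outputs: one feature vector per shared layer (all the same length).
    logits: one score per shared layer for the current question language.
    """
    if len(layer_outputs) != len(logits):
        raise ValueError("need exactly one logit per shared layer")
    weights = softmax(logits)
    dim = len(layer_outputs[0])
    return [
        sum(w * out[i] for w, out in zip(weights, layer_outputs))
        for i in range(dim)
    ]


# Hypothetical outputs of three shared layers for one question.
layer_outputs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

# Hypothetical per-language logits (assumed values, learned in practice).
language_logits = {
    "english": [2.0, 0.0, 0.0],
    "hindi": [0.0, 2.0, 0.0],
    "code_mixed": [0.0, 0.0, 2.0],
}

for language, logits in language_logits.items():
    print(language, soft_share(layer_outputs, logits))
```

Because the weights come from a softmax, every language uses all shared layers to some degree, and only the mixing proportions differ. That is the sense in which the sharing is "soft" in this sketch.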