Abstract
Visual Question Answering (VQA) has recently become a popular research area. The VQA problem lies at the boundary of the Computer Vision and Natural Language Processing research domains. In VQA research, the dataset is a very important aspect because of its variety in image types, i.e. natural and synthetic, and in question-answer source, i.e. human-authored or computer-generated question-answer pairs. Various details about each dataset are given in this paper, which can help future researchers to a great extent. In this paper, we discuss and compare the experimental performance of the Stacked Attention Network Model (SANM), bidirectional LSTM, and MUTAN-based fusion models. As per the experimental results, MUTAN accuracy and loss are 29% and 3.5 respectively. The SANM model gives 55% accuracy and a loss of 2.2, whereas the VQA model gives 59% accuracy and a loss of 1.9.
- Anthology ID:
- 2021.icon-main.67
- Volume:
- Proceedings of the 18th International Conference on Natural Language Processing (ICON)
- Month:
- December
- Year:
- 2021
- Address:
- National Institute of Technology Silchar, Silchar, India
- Editors:
- Sivaji Bandyopadhyay, Sobha Lalitha Devi, Pushpak Bhattacharyya
- Venue:
- ICON
- Publisher:
- NLP Association of India (NLPAI)
- Pages:
- 550–554
- URL:
- https://aclanthology.org/2021.icon-main.67
- Cite (ACL):
- Souvik Chowdhury and Badal Soni. 2021. eaVQA: An Experimental Analysis on Visual Question Answering Models. In Proceedings of the 18th International Conference on Natural Language Processing (ICON), pages 550–554, National Institute of Technology Silchar, Silchar, India. NLP Association of India (NLPAI).
- Cite (Informal):
- eaVQA: An Experimental Analysis on Visual Question Answering Models (Chowdhury & Soni, ICON 2021)
- PDF:
- https://aclanthology.org/2021.icon-main.67.pdf
- Data
- Visual Question Answering