ConceptBert: Concept-Aware Representation for Visual Question Answering
François Gardères, Maryam Ziaeefard, Baptiste Abeloos, Freddy Lecue
Abstract
Visual Question Answering (VQA) is a challenging task that has received increasing attention from both the computer vision and the natural language processing communities. A VQA model combines visual and textual features in order to answer questions grounded in an image. Current works in VQA focus on questions which are answerable by direct analysis of the question and image alone. We present a concept-aware algorithm, ConceptBert, for questions which require common sense, or basic factual knowledge from external structured content. Given an image and a question in natural language, ConceptBert requires visual elements of the image and a Knowledge Graph (KG) to infer the correct answer. We introduce a multi-modal representation which learns a joint Concept-Vision-Language embedding inspired by the popular BERT architecture. We exploit ConceptNet KG for encoding the common sense knowledge and evaluate our methodology on the Outside Knowledge-VQA (OK-VQA) and VQA datasets.
- Anthology ID:
- 2020.findings-emnlp.44
- Volume:
- Findings of the Association for Computational Linguistics: EMNLP 2020
- Month:
- November
- Year:
- 2020
- Address:
- Online
- Editors:
- Trevor Cohn, Yulan He, Yang Liu
- Venue:
- Findings
- Publisher:
- Association for Computational Linguistics
- Pages:
- 489–498
- URL:
- https://aclanthology.org/2020.findings-emnlp.44
- DOI:
- 10.18653/v1/2020.findings-emnlp.44
- Cite (ACL):
- François Gardères, Maryam Ziaeefard, Baptiste Abeloos, and Freddy Lecue. 2020. ConceptBert: Concept-Aware Representation for Visual Question Answering. In Findings of the Association for Computational Linguistics: EMNLP 2020, pages 489–498, Online. Association for Computational Linguistics.
- Cite (Informal):
- ConceptBert: Concept-Aware Representation for Visual Question Answering (Gardères et al., Findings 2020)
- PDF:
- https://preview.aclanthology.org/naacl24-info/2020.findings-emnlp.44.pdf
- Code:
- ThalesGroup/ConceptBERT
- Data:
- ConceptNet, OK-VQA, Visual Question Answering