LRRA: A Transparent Neural-Symbolic Reasoning Framework for Real-World Visual Question Answering

Wan Zhang, Chen Keming, Zhang Yujie, Xu Jinan, Chen Yufeng


Abstract
The predominant approach to visual question answering (VQA) relies on encoding the image and question with a "black box" neural encoder and decoding a single token into answers such as "yes" or "no". Despite this approach's strong quantitative results, it struggles to produce human-readable justifications for the prediction process. To address this shortcoming, we propose LRRA (Look, Read, Reasoning, Answer), a transparent neural-symbolic framework for visual question answering that solves complicated real-world problems step by step, as humans do, and provides a human-readable justification at each step. Specifically, LRRA learns to first convert an image into a scene graph and parse a question into multiple reasoning instructions. It then executes the reasoning instructions one at a time by traversing the scene graph using a recurrent neural-symbolic execution module. Finally, it generates answers to the given questions and makes corresponding marks on the image. Furthermore, we believe that the relations between objects in the question are of great significance for obtaining the correct answer, so we create a perturbed GQA test set by removing linguistic cues (attributes and relations) from the questions to analyze which part of the question contributes more to the answer. Our experiments on the GQA dataset show that LRRA is significantly better than the existing representative model (57.12% vs. 56.39%). Our experiments on the perturbed GQA test set show that the relations between objects are more important for answering complicated questions than the attributes of objects.
Keywords: Visual Question Answering, Relations Between Objects, Neural-Symbolic Reasoning.
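The execution step described in the abstract (running reasoning instructions one at a time over a scene graph while keeping a readable record of each step) can be illustrated with a small toy sketch. This is not the authors' implementation: the instruction names (`select`, `relate`, `filter`, `exist`), the `SceneGraph` class, and the example scene are all hypothetical, and the scene graph and instructions are written by hand here, whereas the paper learns them from the image and the question.

```python
from dataclasses import dataclass


@dataclass
class SceneGraph:
    """Hand-written stand-in for a scene graph (hypothetical structure)."""
    objects: dict    # object id -> {"name": str, "attributes": set of str}
    relations: list  # (subject id, relation name, object id) triples


def execute(graph, instructions):
    """Run instructions one at a time and record every intermediate result."""
    current = set()
    trace = []
    for op, arg in instructions:
        if op == "select":
            # Pick every object in the scene with the given name.
            current = {o for o, d in graph.objects.items() if d["name"] == arg}
        elif op == "relate":
            # Move to the subjects that stand in relation `arg` to the current objects.
            current = {s for (s, r, t) in graph.relations if r == arg and t in current}
        elif op == "filter":
            # Keep only the current objects that carry the given attribute.
            current = {o for o in current if arg in graph.objects[o]["attributes"]}
        elif op == "exist":
            answer = "yes" if current else "no"
            trace.append((op, arg, answer))
            return answer, trace
        else:
            raise ValueError(f"unknown instruction: {op}")
        trace.append((op, arg, sorted(current)))
    # No terminal instruction: report the names of the objects reached.
    return sorted(graph.objects[o]["name"] for o in current), trace


# Toy scene: a red cup and a blue book on a wooden table.
graph = SceneGraph(
    objects={
        0: {"name": "table", "attributes": {"wooden"}},
        1: {"name": "cup", "attributes": {"red"}},
        2: {"name": "book", "attributes": {"blue"}},
    },
    relations=[(1, "on", 0), (2, "on", 0)],
)

# "Is there something red on the table?" as a hand-written instruction list.
program = [("select", "table"), ("relate", "on"), ("filter", "red"), ("exist", None)]
answer, trace = execute(graph, program)
for step in trace:
    print(step)
print("answer:", answer)
```

Each trace entry lists the object ids selected at that step, which is the kind of intermediate result that could be mapped back to regions of the image to justify the final answer.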
Anthology ID:
2021.ccl-1.92
Volume:
Proceedings of the 20th Chinese National Conference on Computational Linguistics
Month:
August
Year:
2021
Address:
Huhhot, China
Venue:
CCL
SIG:
Publisher:
Chinese Information Processing Society of China
Note:
Pages:
1037–1045
Language:
English
URL:
https://aclanthology.org/2021.ccl-1.92
DOI:
Bibkey:
Cite (ACL):
Wan Zhang, Chen Keming, Zhang Yujie, Xu Jinan, and Chen Yufeng. 2021. LRRA: A Transparent Neural-Symbolic Reasoning Framework for Real-World Visual Question Answering. In Proceedings of the 20th Chinese National Conference on Computational Linguistics, pages 1037–1045, Huhhot, China. Chinese Information Processing Society of China.
Cite (Informal):
LRRA: A Transparent Neural-Symbolic Reasoning Framework for Real-World Visual Question Answering (Zhang et al., CCL 2021)
PDF:
https://preview.aclanthology.org/ingestion-script-update/2021.ccl-1.92.pdf
Data
CLEVR, GQA, Visual Question Answering