Textbook Question Answering with Multi-modal Context Graph Understanding and Self-supervised Open-set Comprehension

Daesik Kim, Seonhoon Kim, Nojun Kwak


Abstract
In this work, we introduce a novel algorithm for solving the textbook question answering (TQA) task, which describes more realistic QA problems than other recent tasks. We focus on two related issues identified through analysis of the TQA dataset. First, solving TQA problems requires comprehending multi-modal contexts in complicated input data. To tackle this issue of extracting knowledge features from long text lessons and merging them with visual features, we establish a context graph from texts and images, and propose a new module, f-GCN, based on graph convolutional networks (GCN). Second, scientific terms are not spread over the chapters and subjects are split in the TQA dataset. To overcome this so-called ‘out-of-domain’ issue, we introduce a novel self-supervised open-set learning process, applied before learning QA problems, that requires no annotations. The experimental results show that our model significantly outperforms prior state-of-the-art methods. Moreover, ablation studies validate that both incorporating f-GCN to extract knowledge from multi-modal contexts and our newly proposed self-supervised learning process are effective for TQA problems.
Anthology ID:
P19-1347
Volume:
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics
Month:
July
Year:
2019
Address:
Florence, Italy
Editors:
Anna Korhonen, David Traum, Lluís Màrquez
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
3568–3584
URL:
https://aclanthology.org/P19-1347
DOI:
10.18653/v1/P19-1347
Cite (ACL):
Daesik Kim, Seonhoon Kim, and Nojun Kwak. 2019. Textbook Question Answering with Multi-modal Context Graph Understanding and Self-supervised Open-set Comprehension. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 3568–3584, Florence, Italy. Association for Computational Linguistics.
Cite (Informal):
Textbook Question Answering with Multi-modal Context Graph Understanding and Self-supervised Open-set Comprehension (Kim et al., ACL 2019)
PDF:
https://preview.aclanthology.org/ingest-2024-clasp/P19-1347.pdf
Video:
https://preview.aclanthology.org/ingest-2024-clasp/P19-1347.mp4
Data:
SQuAD, TQA