Jianguo Mao


2022

pdf
Hierarchical Representation-based Dynamic Reasoning Network for Biomedical Question Answering
Jianguo Mao | Jiyuan Zhang | Zengfeng Zeng | Weihua Peng | Wenbin Jiang | Xiangdong Wang | Hong Liu | Yajuan Lyu
Proceedings of the 29th International Conference on Computational Linguistics

Recently, Biomedical Question Answering (BQA) has attracted growing attention due to its application value and technical challenges. Most existing works treat it as a semantic matching task that predicts answers by computing confidence among questions, options and evidence sentences, which is insufficient for scenarios that require complex reasoning based on a deep understanding of biomedical evidences. We propose a novel model termed Hierarchical Representation-based Dynamic Reasoning Network (HDRN) to tackle this problem. It first constructs the hierarchical representations for biomedical evidences to learn semantics within and among evidences. It then performs dynamic reasoning based on the hierarchical representations of evidences to solve complex biomedical problems. Against the existing state-of-the-art model, the proposed model significantly improves more than 4.5%, 3% and 1.3% on three mainstream BQA datasets, PubMedQA, MedQA-USMLE and NLPEC. The ablation study demonstrates the superiority of each improvement of our model. The code will be released after the paper is published.

pdf
Dynamic Multistep Reasoning based on Video Scene Graph for Video Question Answering
Jianguo Mao | Wenbin Jiang | Xiangdong Wang | Zhifan Feng | Yajuan Lyu | Hong Liu | Yong Zhu
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Existing video question answering (video QA) models lack the capacity for deep video understanding and flexible multistep reasoning. We propose for video QA a novel model which performs dynamic multistep reasoning between questions and videos. It creates video semantic representation based on the video scene graph composed of semantic elements of the video and semantic relations among these elements. Then, it performs multistep reasoning for better answer decision between the representations of the question and the video, and dynamically integrate the reasoning results. Experiments show the significant advantage of the proposed model against previous methods in accuracy and interpretability. Against the existing state-of-the-art model, the proposed model dramatically improves more than 4%/3.1%/2% on the three widely used video QA datasets, MSRVTT-QA, MSRVTT multi-choice, and TGIF-QA, and displays better interpretability by backtracing along with the attention mechanisms to the video scene graphs.