Constructing A Multi-hop QA Dataset for Comprehensive Evaluation of Reasoning Steps

Xanh Ho, Anh-Khoa Duong Nguyen, Saku Sugawara, Akiko Aizawa


Abstract
A multi-hop question answering (QA) dataset aims to test reasoning and inference skills by requiring a model to read multiple paragraphs to answer a given question. However, current datasets do not provide a complete explanation for the reasoning process from the question to the answer. Further, previous studies revealed that many examples in existing multi-hop datasets do not require multi-hop reasoning to answer a question. In this study, we present a new multi-hop QA dataset, called 2WikiMultiHopQA, which uses structured and unstructured data. In our dataset, we introduce the evidence information containing a reasoning path for multi-hop questions. The evidence information has two benefits: (i) providing a comprehensive explanation for predictions and (ii) evaluating the reasoning skills of a model. We carefully design a pipeline and a set of templates when generating a question-answer pair that guarantees the multi-hop steps and the quality of the questions. We also exploit the structured format in Wikidata and use logical rules to create questions that are natural but still require multi-hop reasoning. Through experiments, we demonstrate that our dataset is challenging for multi-hop models and it ensures that multi-hop reasoning is required.
Anthology ID:
2020.coling-main.580
Volume:
Proceedings of the 28th International Conference on Computational Linguistics
Month:
December
Year:
2020
Address:
Barcelona, Spain (Online)
Editors:
Donia Scott, Nuria Bel, Chengqing Zong
Venue:
COLING
SIG:
Publisher:
International Committee on Computational Linguistics
Note:
Pages:
6609–6625
Language:
URL:
https://aclanthology.org/2020.coling-main.580
DOI:
10.18653/v1/2020.coling-main.580
Bibkey:
Cite (ACL):
Xanh Ho, Anh-Khoa Duong Nguyen, Saku Sugawara, and Akiko Aizawa. 2020. Constructing A Multi-hop QA Dataset for Comprehensive Evaluation of Reasoning Steps. In Proceedings of the 28th International Conference on Computational Linguistics, pages 6609–6625, Barcelona, Spain (Online). International Committee on Computational Linguistics.
Cite (Informal):
Constructing A Multi-hop QA Dataset for Comprehensive Evaluation of Reasoning Steps (Ho et al., COLING 2020)
Copy Citation:
PDF:
https://preview.aclanthology.org/nschneid-patch-4/2020.coling-main.580.pdf
Code
 Alab-NII/2wikimultihop
Data
2WikiMultiHopQAComplexWebQuestionsHotpotQA