2022
SYGMA: A System for Generalizable and Modular Question Answering Over Knowledge Bases
Sumit Neelam
|
Udit Sharma
|
Hima Karanam
|
Shajith Ikbal
|
Pavan Kapanipathi
|
Ibrahim Abdelaziz
|
Nandana Mihindukulasooriya
|
Young-Suk Lee
|
Santosh Srivastava
|
Cezar Pendus
|
Saswati Dana
|
Dinesh Garg
|
Achille Fokoue
|
G P Shrivatsa Bhargav
|
Dinesh Khandelwal
|
Srinivas Ravishankar
|
Sairam Gurajada
|
Maria Chang
|
Rosario Uceda-Sosa
|
Salim Roukos
|
Alexander Gray
|
Guilherme Lima
|
Ryan Riegel
|
Francois Luus
|
L V Subramaniam
Findings of the Association for Computational Linguistics: EMNLP 2022
Knowledge Base Question Answering (KBQA) involving complex reasoning is emerging as an important research direction. However, most KBQA systems struggle with generalizability, particularly along two dimensions: (a) across multiple knowledge bases, where existing KBQA approaches are typically tuned to a single knowledge base, and (b) across multiple reasoning types, where the majority of datasets and systems have primarily focused on multi-hop reasoning. In this paper, we present SYGMA, a modular KBQA approach developed with the goal of generalization across multiple knowledge bases and multiple reasoning types. To facilitate this, SYGMA is designed as two high-level modules: 1) a KB-agnostic question understanding module that remains common across KBs and generates a logical representation of the question with high-level, extensible reasoning constructs, and 2) a KB-specific question mapping and answering module that addresses the KB-specific aspects of answer extraction. We evaluated SYGMA on multiple datasets belonging to distinct knowledge bases (DBpedia and Wikidata) and distinct reasoning types (multi-hop and temporal). The state-of-the-art or competitive performance achieved on those datasets demonstrates its generalization capability.
A Two-Stage Approach towards Generalization in Knowledge Base Question Answering
Srinivas Ravishankar
|
Dung Thai
|
Ibrahim Abdelaziz
|
Nandana Mihindukulasooriya
|
Tahira Naseem
|
Pavan Kapanipathi
|
Gaetano Rossiello
|
Achille Fokoue
Findings of the Association for Computational Linguistics: EMNLP 2022
Most existing approaches for Knowledge Base Question Answering (KBQA) focus on a specific underlying knowledge base either because of inherent assumptions in the approach, or because evaluating it on a different knowledge base requires non-trivial changes. However, many popular knowledge bases share similarities in their underlying schemas that can be leveraged to facilitate generalization across knowledge bases. To achieve this generalization, we introduce a KBQA framework based on a 2-stage architecture that explicitly separates semantic parsing from the knowledge base interaction, facilitating transfer learning across datasets and knowledge graphs. We show that pretraining on datasets with a different underlying knowledge base can nevertheless provide significant performance gains and reduce sample complexity. Our approach achieves comparable or state-of-the-art performance for LC-QuAD (DBpedia), WebQSP (Freebase), SimpleQuestions (Wikidata) and MetaQA (Wikimovies-KG).
2021
Leveraging Abstract Meaning Representation for Knowledge Base Question Answering
Pavan Kapanipathi
|
Ibrahim Abdelaziz
|
Srinivas Ravishankar
|
Salim Roukos
|
Alexander Gray
|
Ramón Fernandez Astudillo
|
Maria Chang
|
Cristina Cornelio
|
Saswati Dana
|
Achille Fokoue
|
Dinesh Garg
|
Alfio Gliozzo
|
Sairam Gurajada
|
Hima Karanam
|
Naweed Khan
|
Dinesh Khandelwal
|
Young-Suk Lee
|
Yunyao Li
|
Francois Luus
|
Ndivhuwo Makondo
|
Nandana Mihindukulasooriya
|
Tahira Naseem
|
Sumit Neelam
|
Lucian Popa
|
Revanth Gangi Reddy
|
Ryan Riegel
|
Gaetano Rossiello
|
Udit Sharma
|
G P Shrivatsa Bhargav
|
Mo Yu
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021
2018
An Interface for Annotating Science Questions
Michael Boratko
|
Harshit Padigela
|
Divyendra Mikkilineni
|
Pritish Yuvraj
|
Rajarshi Das
|
Andrew McCallum
|
Maria Chang
|
Achille Fokoue
|
Pavan Kapanipathi
|
Nicholas Mattei
|
Ryan Musa
|
Kartik Talamadupula
|
Michael Witbrock
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing: System Demonstrations
Recent work introduces the AI2 Reasoning Challenge (ARC) and the associated ARC dataset, which partitions open-domain, complex science questions into an Easy Set and a Challenge Set. That work includes an analysis of 100 questions with respect to the types of knowledge and reasoning required to answer them. However, it does not include clear definitions of these types, nor does it offer information about the quality of the labels or the annotation process used. In this paper, we introduce a novel interface for human annotation of science question-answer pairs with their respective knowledge and reasoning types, so that the classification of new questions may be improved. We build on the classification schema proposed by prior work on the ARC dataset, and evaluate the effectiveness of our interface with a preliminary study involving 10 participants.
A Systematic Classification of Knowledge, Reasoning, and Context within the ARC Dataset
Michael Boratko
|
Harshit Padigela
|
Divyendra Mikkilineni
|
Pritish Yuvraj
|
Rajarshi Das
|
Andrew McCallum
|
Maria Chang
|
Achille Fokoue-Nkoutche
|
Pavan Kapanipathi
|
Nicholas Mattei
|
Ryan Musa
|
Kartik Talamadupula
|
Michael Witbrock
Proceedings of the Workshop on Machine Reading for Question Answering
The recent work of Clark et al. (2018) introduces the AI2 Reasoning Challenge (ARC) and the associated ARC dataset, which partitions open-domain, complex science questions into easy and challenge sets. That paper includes an analysis of 100 questions with respect to the types of knowledge and reasoning required to answer them; however, it does not include clear definitions of these types, nor does it offer information about the quality of the labels. We propose a comprehensive set of definitions of the knowledge and reasoning types necessary for answering the questions in the ARC dataset. Using ten annotators and a sophisticated annotation interface, we analyze the distribution of labels across the challenge set and related statistics. Additionally, we demonstrate that although naive information retrieval methods return sentences that are irrelevant to answering the query, sufficient supporting text is often present in the ARC corpus. Evaluating with human-selected relevant sentences improves the performance of a neural machine comprehension model by 42 points.