Riyaz Bhat


2023

pdf
PrimeQA: The Prime Repository for State-of-the-Art Multilingual Question Answering Research and Development
Avi Sil | Jaydeep Sen | Bhavani Iyer | Martin Franz | Kshitij Fadnis | Mihaela Bornea | Sara Rosenthal | Scott McCarley | Rong Zhang | Vishwajeet Kumar | Yulong Li | Md Arafat Sultan | Riyaz Bhat | Juergen Bross | Radu Florian | Salim Roukos
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations)

The field of Question Answering (QA) has made remarkable progress in recent years, thanks to the advent of large pre-trained language models, newer realistic benchmark datasets with leaderboards, and novel algorithms for key components such as retrievers and readers. In this paper, we introduce PrimeQA: a one-stop and open-source QA repository with an aim to democratize QA research and facilitate easy replication of state-of-the-art (SOTA) QA methods. PrimeQA supports core QA functionalities like retrieval and reading comprehension as well as auxiliary capabilities such as question generation. It has been designed as an end-to-end toolkit for various use cases: building front-end applications, replicating SOTA methods on public benchmarks, and expanding pre-existing methods. PrimeQA is available at: https://github.com/primeqa.

2022

pdf
DocInfer: Document-level Natural Language Inference using Optimal Evidence Selection
Puneet Mathur | Gautam Kunapuli | Riyaz Bhat | Manish Shrivastava | Dinesh Manocha | Maneesh Singh
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing

We present DocInfer - a novel, end-to-end Document-level Natural Language Inference model that builds a hierarchical document graph enriched through inter-sentence relations (topical, entity-based, concept-based), performs paragraph pruning using the novel SubGraph Pooling layer, followed by optimal evidence selection based on REINFORCE algorithm to identify the most important context sentences for a given hypothesis. Our evidence selection mechanism allows it to transcend the input length limitation of modern BERT-like Transformer models while presenting the entire evidence together for inferential reasoning. We show this is an important property needed to reason on large documents where the evidence may be fragmented and located arbitrarily far from each other. Extensive experiments on popular corpora - DocNLI, ContractNLI, and ConTRoL datasets, and our new proposed dataset called CaseHoldNLI on the task of legal judicial reasoning, demonstrate significant performance gains of 8-12% over SOTA methods. Our ablation studies validate the impact of our model. Performance improvement of 3-6% on annotation-scarce downstream tasks of fact verification, multiple-choice QA, and contract clause retrieval demonstrates the usefulness of DocInfer beyond primary NLI tasks.