Jaydeep Sen


2022

AIT-QA: Question Answering Dataset over Complex Tables in the Airline Industry
Yannis Katsis | Saneem Chemmengath | Vishwajeet Kumar | Samarth Bharadwaj | Mustafa Canim | Michael Glass | Alfio Gliozzo | Feifei Pan | Jaydeep Sen | Karthik Sankaranarayanan | Soumen Chakrabarti
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Industry Track

Table Question Answering (Table QA) systems have been shown to be highly accurate when trained and tested on open-domain datasets built on top of Wikipedia tables. However, it is not clear whether their performance remains the same when applied to the domain-specific scientific and business documents encountered in industrial settings, which exhibit some unique characteristics: (a) they contain tables with a much more complex layout than Wikipedia tables (including hierarchical row and column headers), (b) they contain domain-specific terms, and (c) they are typically not accompanied by domain-specific labeled data that can be used to train Table QA models. To understand the performance of Table QA approaches in this setting, we introduce AIT-QA, a domain-specific Table QA test dataset. While focusing on the airline industry, AIT-QA reflects the challenges, outlined above, that domain-specific documents pose to Table QA. In this work, we describe the creation of the dataset and report zero-shot experimental results of three SOTA Table QA methods. The results clearly expose the limitations of current methods, with a best accuracy of just 51.8%. We also present pragmatic table pre-processing steps to pivot and project complex tables into a layout suitable for the SOTA Table QA models. Finally, we provide data-driven insights on how different aspects of this setting (including hierarchical headers, domain-specific terminology, and paraphrasing) affect Table QA methods, in order to help the community develop improved methods for domain-specific Table QA.
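To make the pre-processing idea concrete, here is a minimal sketch (an illustration, not the paper's code) of how a table with hierarchical column headers can be pivoted into the flat, single-header layout that current Table QA models expect. The airline-style data, column names, and the helper flatten_for_tableqa are all invented for the example.

```python
import pandas as pd

# Toy airline-style table with hierarchical (two-level) column headers,
# illustrative of the complex layouts the abstract describes.
cols = pd.MultiIndex.from_tuples(
    [("Revenue", "Q1"), ("Revenue", "Q2"), ("Expenses", "Q1"), ("Expenses", "Q2")],
    names=["metric", "quarter"],
)
table = pd.DataFrame(
    [[100, 120, 80, 90], [200, 210, 150, 160]],
    index=pd.Index(["Domestic", "International"], name="segment"),
    columns=cols,
)

def flatten_for_tableqa(df: pd.DataFrame) -> pd.DataFrame:
    """Pivot every cell into its own row, so the full header path of each
    value becomes an explicit, flat set of columns a Table QA model can read."""
    return df.stack(level=["metric", "quarter"]).rename("value").reset_index()

print(flatten_for_tableqa(table))
# One row per original cell, e.g. segment=Domestic, metric=Revenue,
# quarter=Q1, value=100 — a flat layout with a single header row.
```

Each row of the flattened output spells out the complete header path of one cell, which is the kind of "pivot and project" layout the abstract refers to.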

2021

Topic Transferable Table Question Answering
Saneem Chemmengath | Vishwajeet Kumar | Samarth Bharadwaj | Jaydeep Sen | Mustafa Canim | Soumen Chakrabarti | Alfio Gliozzo | Karthik Sankaranarayanan
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Weakly-supervised table question-answering (TableQA) models have achieved state-of-the-art performance by using a pre-trained BERT transformer to jointly encode a question and a table and produce a structured query for the question. However, in practical settings TableQA systems are deployed over table corpora having topic and word distributions quite distinct from BERT’s pre-training corpus. In this work we simulate the practical topic-shift scenario by designing novel challenge benchmarks, WikiSQL-TS and WikiTable-TS, consisting of train-dev-test splits in five distinct topic groups, based on the popular WikiSQL and WikiTableQuestions datasets. We empirically show that, despite pre-training on large open-domain text, the performance of models degrades significantly when they are evaluated on unseen topics. In response, we propose T3QA (Topic Transferable Table Question Answering), a pragmatic adaptation framework for TableQA comprising: (1) topic-specific vocabulary injection into BERT, (2) a novel natural language question generation pipeline, based on text-to-text transformer generators (such as T5 or GPT2), focused on generating topic-specific training data, and (3) a logical form re-ranker. We show that T3QA provides a reasonably good baseline for our topic-shift benchmarks. We believe our topic-split benchmarks will lead to robust TableQA solutions that are better suited for practical deployment.
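As an illustration of component (1), the following minimal sketch assumes the standard Hugging Face transformers API rather than the authors' implementation: topic-specific terms are registered with BERT's tokenizer and the embedding matrix is resized to match. The example terms are hypothetical.

```python
from transformers import BertModel, BertTokenizerFast

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")

# Hypothetical topic-specific terms that BERT's default vocabulary would
# otherwise split into poorly-modelled subword pieces.
topic_terms = ["hat-trick", "powerplay", "touchline"]
num_added = tokenizer.add_tokens(topic_terms)

# Grow the embedding table to cover the new tokens; the new rows are
# randomly initialised and would be tuned on topic text before use.
model.resize_token_embeddings(len(tokenizer))
print(f"added {num_added} tokens; vocab size is now {len(tokenizer)}")
```

After this step, the injected terms are encoded as single tokens, which is the intuition behind adapting BERT's vocabulary to an unseen topic before generating topic-specific training data.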

2020

Schema Aware Semantic Reasoning for Interpreting Natural Language Queries in Enterprise Settings
Jaydeep Sen | Tanaya Babtiwale | Kanishk Saxena | Yash Butala | Sumit Bhatia | Karthik Sankaranarayanan
Proceedings of the 28th International Conference on Computational Linguistics

Natural Language Query interfaces allow end-users to access the desired information without the need to know any specialized query language, data storage, or schema details. Even with the recent advances in the NLP research space, state-of-the-art QA systems fall short of understanding the implicit intents of real-world Business Intelligence (BI) queries in enterprise systems, since Natural Language Understanding still remains an AI-hard problem. We posit that deploying ontology reasoning over domain semantics can help in achieving better natural language understanding for QA systems. In this paper, we specifically focus on building a Schema Aware Semantic Reasoning Framework that translates natural language interpretation into a sequence of tasks solvable by an ontology reasoner. We apply our framework on top of ATHENA, an ontology-based, state-of-the-art natural language question-answering system, and experiment with four benchmarks focused on BI queries. Our experimental numbers empirically show that Schema Aware Semantic Reasoning indeed helps in achieving significantly better results for handling BI queries, with an average accuracy improvement of ~30%.
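To give a feel for the idea of translating interpretation into solvable tasks, here is a self-contained toy sketch (not the paper's framework; the schema, lexicon, and interpret function are all invented): query words are grounded in ontology concepts, and the ontology is then searched for a path connecting them, yielding a joinable query skeleton.

```python
from collections import deque

# Hypothetical enterprise schema expressed as a tiny ontology:
# nodes are concepts, edges are relations between them.
ontology = {
    "Customer": ["Order"],            # Customer places Order
    "Order":    ["Product", "Region"],
    "Product":  [],
    "Region":   [],
}
# Lexicon capturing domain semantics: query words -> schema concepts.
lexicon = {"clients": "Customer", "sales": "Order", "area": "Region"}

def interpret(query_tokens):
    """Translate an NL query into a sequence of solvable tasks:
    (1) ground tokens in ontology concepts, then (2) search the
    ontology for a path connecting them (the query skeleton)."""
    concepts = [lexicon[t] for t in query_tokens if t in lexicon]
    start, goal = concepts[0], concepts[-1]
    frontier, seen = deque([[start]]), {start}
    while frontier:
        path = frontier.popleft()
        if path[-1] == goal:
            return concepts, path
        for nxt in ontology.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(path + [nxt])
    return concepts, None

print(interpret(["clients", "by", "area"]))
# (['Customer', 'Region'], ['Customer', 'Order', 'Region'])
```

The recovered path Customer -> Order -> Region makes the implicit join through Order explicit, which is the flavour of implicit-intent resolution the abstract argues an ontology reasoner can provide.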