Sumit Shekhar


2023

pdf
A Neural CRF-based Hierarchical Approach for Linear Text Segmentation
Inderjeet Nair | Aparna Garimella | Balaji Vasan Srinivasan | Natwar Modani | Niyati Chhaya | Srikrishna Karanam | Sumit Shekhar
Findings of the Association for Computational Linguistics: EACL 2023

We consider the problem of segmenting unformatted text and transcripts linearly based on their topical structure. While prior approaches explicitly train to predict segment boundaries, our proposed approach solves this task by inferring the hierarchical segmentation structure associated with the input text fragment. Given the lack of a large annotated dataset for this task, we propose a data curation strategy and create a corpus of over 700K Wikipedia articles with their hierarchical structures. We then propose the first supervised approach to generating hierarchical segmentation structures based on these annotations. Our method, in particular, is based on a neural conditional random field (CRF), which explicitly models the statistical dependency between a node and its constituent child nodes. We introduce a new data augmentation scheme as part of our model training strategy, which involves sampling a variety of node aggregations, permutations, and removals, all of which help capture fine-grained and coarse topical shifts in the data and improve model performance. Extensive experiments show that our model outperforms or achieves competitive performance when compared to previous state-of-the-art algorithms in the following settings: rich-resource, cross-domain transferability, few-shot supervision, and segmentation when topic label annotations are provided.

2022

pdf
DynamicTOC: Persona-based Table of Contents for Consumption of Long Documents
Himanshu Maheshwari | Nethraa Sivakumar | Shelly Jain | Tanvi Karandikar | Vinay Aggarwal | Navita Goyal | Sumit Shekhar
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Long documents like contracts, financial documents, etc., are often tedious to read through. Linearly consuming (via scrolling or navigation through default table of content) these documents is time-consuming and challenging. These documents are also authored to be consumed by varied entities (referred to as persona in the paper) interested in only certain parts of the document. In this work, we describe DynamicToC, a dynamic table of content-based navigator, to aid in the task of non-linear, persona-based document consumption. DynamicToC highlights sections of interest in the document as per the aspects relevant to different personas. DynamicToC is augmented with short questions to assist the users in understanding underlying content. This uses a novel deep-reinforcement learning technique to generate questions on these persona-clustered paragraphs. Human and automatic evaluations suggest the efficacy of both end-to-end pipeline and different components of DynamicToC.

2020

pdf
STL-CQA: Structure-based Transformers with Localization and Encoding for Chart Question Answering
Hrituraj Singh | Sumit Shekhar
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Chart Question Answering (CQA) is the task of answering natural language questions about visualisations in the chart image. Recent solutions, inspired by VQA approaches, rely on image-based attention for question/answering while ignoring the inherent chart structure. We propose STL-CQA which improves the question/answering through sequential elements localization, question encoding and then, a structural transformer-based learning approach. We conduct extensive experiments while proposing pre-training tasks, methodology and also an improved dataset with more complex and balanced questions of different types. The proposed methodology shows a significant accuracy improvement compared to the state-of-the-art approaches on various chart Q/A datasets, while outperforming even human baseline on the DVQA Dataset. We also demonstrate interpretability while examining different components in the inference pipeline.