Debarshi Kumar Sanyal


2022

Named Entity Recognition Based Automatic Generation of Research Highlights
Tohida Rehman | Debarshi Kumar Sanyal | Prasenjit Majumder | Samiran Chattopadhyay
Proceedings of the Third Workshop on Scholarly Document Processing

A scientific paper is traditionally prefaced by an abstract that summarizes the paper. Recently, research highlights that focus on the main findings of the paper have emerged as a complement to the abstract. However, highlights are not yet as common as abstracts, and are absent in many papers. In this paper, we aim to automatically generate research highlights using different sections of a research paper as input. We investigate whether the use of named entity recognition on the input improves the quality of the generated highlights. In particular, we use two deep learning-based models: the first is a pointer-generator network, and the second augments the first model with a coverage mechanism. We then augment each of these models with named entity recognition features. The proposed method can be used to produce highlights for papers that lack them. Our experiments show that adding named entity information improves the performance of the deep learning-based summarizers in terms of the ROUGE, METEOR and BERTScore measures.
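The abstract above does not describe the coverage mechanism in detail. As an illustration only, the sketch below follows the formulation commonly used with pointer-generator networks (See et al., 2017), which is an assumption about the setup and not the authors' code: a coverage vector accumulates the attention paid to each source token in earlier decoding steps, and the per-step coverage loss sums the element-wise minimum of the current attention and that vector, penalizing repeated attention to the same tokens. The function name `coverage_loss` and the toy attention values are hypothetical.

```python
from typing import List


def coverage_loss(attention_steps: List[List[float]]) -> List[float]:
    """Return the coverage loss at each decoding step.

    attention_steps[t][i] is the attention weight on source token i at
    decoder step t. The coverage vector at step t is the sum of the
    attention distributions from all earlier steps.
    """
    if not attention_steps:
        return []
    n_source = len(attention_steps[0])
    coverage = [0.0] * n_source  # nothing has been attended to before step 0
    losses = []
    for attn in attention_steps:
        if len(attn) != n_source:
            raise ValueError("all attention distributions must have equal length")
        # Penalize overlap between current attention and accumulated coverage.
        losses.append(sum(min(a, c) for a, c in zip(attn, coverage)))
        # Update coverage with the attention just used.
        coverage = [c + a for c, a in zip(coverage, attn)]
    return losses


# Toy example (hypothetical values): step 2 re-attends to the tokens
# already covered in step 1, so only step 2 incurs a loss.
steps = [[0.5, 0.5, 0.0], [0.5, 0.5, 0.0], [0.0, 0.0, 1.0]]
print(coverage_loss(steps))  # [0.0, 1.0, 0.0]
```

In a trained summarizer this term would typically be added, with a weighting factor, to the main generation loss; the weighting used in the paper is not stated in the abstract.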

What Does the Indian Parliament Discuss? An Exploratory Analysis of the Question Hour in the Lok Sabha
Suman Adhya | Debarshi Kumar Sanyal
Proceedings of the LREC 2022 workshop on Natural Language Processing for Political Sciences

The TCPD-IPD dataset is a collection of questions and answers discussed in the Lower House of the Parliament of India during the Question Hour between 1999 and 2019. Although it is difficult to analyze such a huge collection manually, modern text analysis tools can provide a powerful means to navigate it. In this paper, we perform an exploratory analysis of the dataset. In particular, we present insightful corpus-level statistics and perform a more detailed analysis of three subsets of the dataset. In the latter analysis, the focus is on understanding the temporal evolution of topics using a dynamic topic model. We observe that the parliamentary conversation indeed mirrors the political and socio-economic tensions of each period.

2020

SaSAKE: Syntax and Semantics Aware Keyphrase Extraction from Research Papers
Santosh Tokala | Debarshi Kumar Sanyal | Plaban Kumar Bhowmick | Partha Pratim Das
Proceedings of the 28th International Conference on Computational Linguistics

Keyphrases in a research paper succinctly capture the primary content of the paper and also assist in indexing the paper at a concept level. Given the huge rate at which scientific papers are published today, it is important to have effective ways of automatically extracting keyphrases from a research paper. In this paper, we present a novel method, Syntax and Semantics Aware Keyphrase Extraction (SaSAKE), to extract keyphrases from research papers. It uses a transformer architecture, stacking up sentence encoders to incorporate sequential information, and graph encoders to incorporate syntactic and semantic dependency graph information. Incorporation of these dependency graphs helps to alleviate long-range dependency problems and identify the boundaries of multi-word keyphrases effectively. Experimental results on three benchmark datasets show that our proposed method SaSAKE achieves state-of-the-art performance in keyphrase extraction from scientific papers.