Sujatha Das Gollapalli

Also published as: Sujatha Das, Sujatha Das Gollapalli


2021

NUS-IDS at CASE 2021 Task 1: Improving Multilingual Event Sentence Coreference Identification With Linguistic Information
Fiona Anting Tan | Sujatha Das Gollapalli | See-Kiong Ng
Proceedings of the 4th Workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Text (CASE 2021)

Event Sentence Coreference Identification (ESCI) aims to cluster event sentences that refer to the same event together for information extraction. We describe our ESCI solution developed for the ACL-CASE 2021 shared tasks on the detection and classification of socio-political and crisis event information in a multilingual setting. For a given article, our proposed pipeline comprises an accurate sentence pair classifier that identifies coreferent sentence pairs and subsequently uses these predicted probabilities to cluster sentences into groups. Sentence pair representations are constructed from fine-tuned BERT embeddings plus POS embeddings fed through a BiLSTM model, and combined with linguistic-based lexical and semantic similarities between sentences. Our best models ranked 2nd, 1st and 2nd and obtained CoNLL F1 scores of 81.20%, 93.03%, 83.15% for the English, Portuguese and Spanish test sets respectively in the ACL-CASE 2021 competition.
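As an illustration of the second stage described above (turning pairwise coreference probabilities into sentence groups), the following is a minimal sketch only. The abstract does not say which clustering procedure the authors used, so the thresholded connected-components approach, the function name `cluster_sentences`, the `threshold` parameter, and the toy probabilities are all assumptions made for this example.

```python
def cluster_sentences(n_sentences, pair_probs, threshold=0.5):
    """Group sentence indices into clusters from pairwise scores.

    pair_probs maps (i, j) index pairs to a predicted coreference
    probability. Pairs scoring at or above `threshold` are linked, and
    each connected component becomes one cluster (an assumed strategy,
    not necessarily the one used in the paper).
    """
    parent = list(range(n_sentences))

    def find(x):
        # Follow parent links to the root, shortening the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (i, j), prob in pair_probs.items():
        if prob >= threshold:
            root_i, root_j = find(i), find(j)
            if root_i != root_j:
                # Merge the two components under the smaller root index.
                parent[max(root_i, root_j)] = min(root_i, root_j)

    clusters = {}
    for s in range(n_sentences):
        clusters.setdefault(find(s), []).append(s)
    return sorted(clusters.values())


# Hypothetical classifier outputs for a four-sentence article.
probs = {(0, 1): 0.92, (0, 2): 0.10, (1, 2): 0.15, (2, 3): 0.81}
print(cluster_sentences(4, probs))
```

With these toy scores, sentences 0 and 1 fall into one group and sentences 2 and 3 into another. A single threshold is the simplest choice; it can over-merge when one spurious high-probability pair bridges two otherwise separate events.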

On Generating Fact-Infused Question Variations
Arthur Deschamps | Sujatha Das Gollapalli | See-Kiong Ng
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021)

To fully model the human-like ability to ask questions, automatic question generation (QG) models must be able to produce multiple expressions of the same question with different levels of detail. Unfortunately, existing datasets available for learning QG do not include paraphrases or question variations, which limits a model’s ability to learn this capability. We present FIRS, a dataset containing human-generated fact-infused rewrites of questions from the widely-used SQuAD dataset to address this limitation. Questions in FIRS were obtained by combining a given question with facts of entities referenced in the question. We study a double encoder-decoder model, Fact-Infused Question Generator (FIQG), for learning to generate fact-infused questions from a given question. Experimental results show that FIQG effectively incorporates information from facts to add more detail to a given question. To the best of our knowledge, ours is the first study to present fact-infusion as a novel form of question paraphrasing.

Suicide Risk Prediction by Tracking Self-Harm Aspects in Tweets: NUS-IDS at the CLPsych 2021 Shared Task
Sujatha Das Gollapalli | Guilherme Augusto Zagatti | See-Kiong Ng
Proceedings of the Seventh Workshop on Computational Linguistics and Clinical Psychology: Improving Access

We describe our system, developed for the CLPsych 2021 Shared Task, for identifying users at risk of suicide based on their tweets. Based on research in mental health studies linking self-harm tendencies with suicide, in our system, we attempt to characterize self-harm aspects expressed in user tweets over a period of time. To this end, we design SHTM, a Self-Harm Topic Model that combines Latent Dirichlet Allocation with a self-harm dictionary for modeling daily tweets of users. Next, differences in moods and topics over time are captured as features to train a deep learning model for suicide prediction.

2020

ESTeR: Combining Word Co-occurrences and Word Associations for Unsupervised Emotion Detection
Sujatha Das Gollapalli | Polina Rozenshtein | See-Kiong Ng
Findings of the Association for Computational Linguistics: EMNLP 2020

Accurate detection of emotions in user-generated text has been shown to have several applications for e-commerce, public well-being, and disaster management. Currently, the state-of-the-art performance for emotion detection in text is obtained using complex, deep learning models trained on domain-specific, labeled data. In this paper, we propose ESTeR, an unsupervised model for identifying emotions using a novel similarity function based on random walks on graphs. Our model combines large-scale word co-occurrence information with word-associations from lexicons, avoiding not only the dependence on labeled datasets, but also an explicit mapping of words to latent spaces used in emotion-enriched word embeddings. Our similarity function can also be computed efficiently. We study a range of datasets including recent tweets related to COVID-19 to illustrate the superior performance of our model and report insights on public emotions during the ongoing pandemic.
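To make the idea of a random-walk-based similarity on a word graph concrete, here is a minimal sketch of a generic random walk with restart. It is not ESTeR's similarity function, which the abstract does not specify; the function name `random_walk_with_restart`, the `restart` and `iters` parameters, and the toy co-occurrence weights are all assumptions chosen for illustration. The sketch also assumes every neighbor appears as a key in the graph.

```python
def random_walk_with_restart(graph, start, restart=0.15, iters=50):
    """Score every node by its proximity to `start`.

    graph maps each node to a dict of {neighbor: edge weight}. At each
    step the walker jumps back to `start` with probability `restart`,
    otherwise it follows an outgoing edge chosen in proportion to its
    weight. The returned scores sum to 1.
    """
    nodes = sorted(graph)
    score = {n: 0.0 for n in nodes}
    score[start] = 1.0
    for _ in range(iters):
        nxt = {n: 0.0 for n in nodes}
        for n in nodes:
            total = sum(graph[n].values())
            if total == 0:
                # A node with no outgoing edges sends its mass back to start.
                nxt[start] += (1 - restart) * score[n]
                continue
            for m, w in graph[n].items():
                nxt[m] += (1 - restart) * score[n] * w / total
        nxt[start] += restart
        score = nxt
    return score


# Hypothetical co-occurrence counts between a few words.
toy_graph = {
    "joy": {"happy": 3, "smile": 1},
    "happy": {"joy": 3, "smile": 2},
    "smile": {"joy": 1, "happy": 2},
    "storm": {"fear": 1},
    "fear": {"storm": 1},
}
scores = random_walk_with_restart(toy_graph, "joy")
print(sorted(scores, key=scores.get, reverse=True))
```

Starting from "joy", the walk assigns more mass to "happy" than to "smile" because of the heavier edge, and none to "storm" or "fear", which are unreachable in this toy graph.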

On the Use of Web Search to Improve Scientific Collections
Krutarth Patel | Cornelia Caragea | Sujatha Das Gollapalli
Proceedings of the First Workshop on Scholarly Document Processing

Despite the advancements in search engine features, ranking methods, technologies, and the availability of programmable APIs, current-day open-access digital libraries still rely on crawl-based approaches for acquiring their underlying document collections. In this paper, we propose a novel search-driven framework for acquiring documents for such scientific portals. Within our framework, publicly-available research paper titles and author names are used as queries to a Web search engine. We were able to obtain ~267,000 unique research papers through our fully-automated framework using ~76,000 queries, resulting in almost 200,000 more papers than the number of queries. Moreover, through a combination of title and author name search, we were able to recover 78% of the original searched titles.

2015

Proceedings of the ACL 2015 Workshop on Novel Computational Approaches to Keyphrase Extraction
Sujatha Das Gollapalli | Cornelia Caragea | Xiaoli Li | C. Lee Giles
Proceedings of the ACL 2015 Workshop on Novel Computational Approaches to Keyphrase Extraction

EMNLP versus ACL: Analyzing NLP research over time
Sujatha Das Gollapalli | Xiaoli Li
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

2014

Citation-Enhanced Keyphrase Extraction from Research Papers: A Supervised Approach
Cornelia Caragea | Florin Adrian Bulgarov | Andreea Godea | Sujatha Das Gollapalli
Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)

2005

Enhanced Answer Type Inference from Questions using Sequential Models
Vijay Krishnan | Sujatha Das | Soumen Chakrabarti
Proceedings of Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing