Krishnapriya Vishnubhotla


2022

Tweet Emotion Dynamics: Emotion Word Usage in Tweets from US and Canada
Krishnapriya Vishnubhotla | Saif M. Mohammad
Proceedings of the Thirteenth Language Resources and Evaluation Conference

Over the last decade, Twitter has emerged as one of the most influential forums for social, political, and health discourse. In this paper, we introduce a massive dataset of more than 45 million geo-located tweets posted between 2015 and 2021 from the US and Canada (TUSC), specially curated for natural language analysis. We also introduce Tweet Emotion Dynamics (TED), a set of metrics that capture patterns of emotions associated with tweets over time. We use TED and TUSC to explore the use of emotion-associated words across the US and Canada; across 2019 (pre-pandemic), 2020 (the year the pandemic hit), and 2021 (the second year of the pandemic); and across individual tweeters. We show that Canadian tweets tend to have higher valence, lower arousal, and higher dominance than US tweets. Further, we show that the COVID-19 pandemic had a marked impact on the emotional signature of tweets posted in 2020, when compared to the adjoining years. Finally, we determine metrics of TED for 170,000 tweeters to benchmark characteristics of TED metrics at an aggregate level. TUSC and the metrics for TED will enable a wide variety of research on how we use language to express ourselves, persuade, communicate, and influence, with particularly promising applications in public health, affective science, social science, and psychology.

The Project Dialogism Novel Corpus: A Dataset for Quotation Attribution in Literary Texts
Krishnapriya Vishnubhotla | Adam Hammond | Graeme Hirst
Proceedings of the Thirteenth Language Resources and Evaluation Conference

We present the Project Dialogism Novel Corpus, or PDNC, an annotated dataset of quotations for English literary texts. PDNC contains annotations for 35,978 quotations across 22 full-length novels, and is by an order of magnitude the largest corpus of its kind. Each quotation is annotated for the speaker, addressees, type of quotation, referring expression, and character mentions within the quotation text. The annotated attributes allow for a comprehensive evaluation of models of quotation attribution and coreference for literary texts.

2021

An Evaluation of Disentangled Representation Learning for Texts
Krishnapriya Vishnubhotla | Graeme Hirst | Frank Rudzicz
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

2019

Are Fictional Voices Distinguishable? Classifying Character Voices in Modern Drama
Krishnapriya Vishnubhotla | Adam Hammond | Graeme Hirst
Proceedings of the 3rd Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature

According to the literary theory of Mikhail Bakhtin, a dialogic novel is one in which characters speak in their own distinct voices, rather than serving as mouthpieces for their authors. We use text classification to determine which authors best achieve dialogism, looking at a corpus of plays from the late nineteenth and early twentieth centuries. We find that the SAGE model of text generation, which highlights deviations from a background lexical distribution, is an effective method of weighting the words of characters’ utterances. Our results show that it is indeed possible to distinguish characters by their speech in the plays of canonical writers such as George Bernard Shaw, whereas characters are clustered more closely in the works of lesser-known playwrights.

Generative Adversarial Networks for Text Using Word2vec Intermediaries
Akshay Budhkar | Krishnapriya Vishnubhotla | Safwan Hossain | Frank Rudzicz
Proceedings of the 4th Workshop on Representation Learning for NLP (RepL4NLP-2019)

Generative adversarial networks (GANs) have shown considerable success, especially in the realistic generation of images. In this work, we apply similar techniques to the generation of text. We propose a novel approach that uses word embeddings to handle the discrete nature of text during training. Our method is agnostic to vocabulary size and achieves competitive results relative to methods with various discrete gradient estimators.