2025
pdf
bib
abs
Beyond Context to Cognitive Appraisal: Emotion Reasoning as a Theory of Mind Benchmark for Large Language Models
Gerard Christopher Yeo
|
Kokil Jaidka
Findings of the Association for Computational Linguistics: ACL 2025
Datasets used for emotion recognition tasks typically contain overt cues that can be used in predicting the emotions expressed in a text. However, one challenge is that texts sometimes contain covert contextual cues that are rich in affective semantics, which warrant higher-order reasoning abilities to infer emotional states, not simply the emotions conveyed. This study advances beyond surface-level perceptual features to investigate how large language models (LLMs) reason about others’ emotional states using contextual information, within a Theory-of-Mind (ToM) framework. Grounded in Cognitive Appraisal Theory, we curate a specialized ToM evaluation dataset to assess both forward reasoning—from context to emotion—and backward reasoning—from emotion to inferred context. We showed that LLMs can reason to a certain extent, although they are poor at associating situational outcomes and appraisals with specific emotions. Our work highlights the need for psychological theories in the training and evaluation of LLMs in the context of emotion reasoning.
2024
pdf
bib
abs
“Thinking” Fair and Slow: On the Efficacy of Structured Prompts for Debiasing Language Models
Shaz Furniturewala
|
Surgan Jandial
|
Abhinav Java
|
Pragyan Banerjee
|
Simra Shahid
|
Sumit Bhatia
|
Kokil Jaidka
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
Existing debiasing techniques are typically training-based or require access to the model’s internals and output distributions, so they are inaccessible to end-users looking to adapt LLM outputs for their particular needs. In this study, we examine whether structured prompting techniques can offer opportunities for fair text generation. We evaluate a comprehensive end-user-focused iterative framework of debiasing that applies System 2 thinking processes for prompts to induce logical, reflective, and critical text generation, with single, multi-step, instruction, and role-based variants. By systematically evaluating many LLMs across many datasets and different prompting strategies, we show that the more complex System 2-based Implicative Prompts significantly improve over other techniques demonstrating lower mean bias in the outputs with competitive performance on the downstream tasks. Our work offers research directions for the design and the potential of end-user-focused evaluative frameworks for LLM use.
pdf
bib
abs
Beyond Text: Leveraging Multi-Task Learning and Cognitive Appraisal Theory for Post-Purchase Intention Analysis
Gerard Yeo
|
Shaz Furniturewala
|
Kokil Jaidka
Findings of the Association for Computational Linguistics: ACL 2024
Supervised machine-learning models for predicting user behavior offer a challenging classification problem with lower average prediction performance scores than other text classification tasks. This study evaluates multi-task learning frameworks grounded in Cognitive Appraisal Theory to predict user behavior as a function of users’ self-expression and psychological attributes. Our experiments show that users’ language and traits improve predictions above and beyond models predicting only from text. Our findings highlight the importance of integrating psychological constructs into NLP to enhance the understanding and prediction of user actions. We close with a discussion of the implications for future applications of large language models for computational psychology.
pdf
bib
abs
Impact of Decoding Methods on Human Alignment of Conversational LLMs
Shaz Furniturewala
|
Kokil Jaidka
|
Yashvardhan Sharma
Proceedings of the 14th Workshop on Computational Approaches to Subjectivity, Sentiment, & Social Media Analysis
To be included into chatbot systems, Large language models (LLMs) must be aligned with human conversational conventions. However, being trained mainly on web-scraped data gives existing LLMs a voice closer to informational text than actual human speech. In this paper, we examine the effect of decoding methods on the alignment between LLM-generated and human conversations, including Beam Search, Top K Sampling, and Nucleus Sampling. We present new measures of alignment in substance, style, and psychometric orientation, and experiment with two conversation datasets. Our results provide subtle insights: better alignment is attributed to fewer beams in Beam Search and lower values of P in Nucleus Sampling. We also find that task-oriented and open-ended datasets perform differently in terms of alignment, indicating the significance of taking into account the context of the interaction.
pdf
bib
abs
Empaths at WASSA 2024 Empathy and Personality Shared Task: Turn-Level Empathy Prediction Using Psychological Indicators
Shaz Furniturewala
|
Kokil Jaidka
Proceedings of the 14th Workshop on Computational Approaches to Subjectivity, Sentiment, & Social Media Analysis
For the WASSA 2024 Empathy and Personality Prediction Shared Task, we propose a novel turn-level empathy detection method that decomposes empathy into six psychological indicators: Emotional Language, Perspective-Taking, Sympathy and Compassion, Extroversion, Openness, and Agreeableness. A pipeline of text enrichment using a Large Language Model (LLM) followed by DeBERTA fine-tuning demonstrates a significant improvement in the Pearson Correlation Coefficient and F1 scores for empathy detection, highlighting the effectiveness of our approach. Our system officially ranked 7th at the CONV-turn track.
2023
pdf
bib
abs
I am PsyAM: Modeling Happiness with Cognitive Appraisal Dimensions
Xuan Liu
|
Kokil Jaidka
Findings of the Association for Computational Linguistics: ACL 2023
This paper proposes and evaluates PsyAM (
https://anonymous.4open.science/r/BERT-PsyAM-10B9), a framework that incorporates adaptor modules in a sequential multi-task learning setup to generate high-dimensional feature representations of hedonic well-being (momentary happiness) in terms of its psychological underpinnings. PsyAM models emotion in text through its cognitive antecedents through auxiliary models that achieve multi-task learning through novel feature fusion methods. We show that BERT-PsyAM has cross-task validity and cross-domain generalizability through experiments with emotion-related tasks – on new emotion tasks and new datasets, as well as against traditional methods and BERT baselines. We further probe the robustness of BERT-PsyAM through feature ablation studies, as well as discuss the qualitative inferences we can draw regarding the effectiveness of the framework for representing emotional states. We close with a discussion of a future agenda of psychology-inspired neural network architectures.
pdf
bib
abs
The PEACE-Reviews dataset: Modeling Cognitive Appraisals in Emotion Text Analysis
Gerard Yeo
|
Kokil Jaidka
Findings of the Association for Computational Linguistics: EMNLP 2023
Cognitive appraisal plays a pivotal role in deciphering emotions. Recent studies have delved into its significance, yet the interplay between various forms of cognitive appraisal and specific emotions, such as joy and anger, remains an area of exploration in consumption contexts. Our research introduces the PEACE-Reviews dataset, a unique compilation of annotated autobiographical accounts where individuals detail their emotional and appraisal experiences during interactions with personally significant products or services. Focusing on the inherent variability in consumer experiences, this dataset offers an in-depth analysis of participants’ psychological traits, their evaluative feedback on purchases, and the resultant emotions. Notably, the PEACE-Reviews dataset encompasses emotion, cognition, individual traits, and demographic data. We also introduce preliminary models that predict certain features based on the autobiographical narratives.
pdf
bib
abs
Predicting Sentence-Level Factuality of News and Bias of Media Outlets
Francielle Vargas
|
Kokil Jaidka
|
Thiago Pardo
|
Fabrício Benevenuto
Proceedings of the 14th International Conference on Recent Advances in Natural Language Processing
Automated news credibility and fact-checking at scale require accurate prediction of news factuality and media bias. This paper introduces a large sentence-level dataset, titled “FactNews”, composed of 6,191 sentences expertly annotated according to factuality and media bias definitions proposed by AllSides. We use FactNews to assess the overall reliability of news sources by formulating two text classification problems for predicting sentence-level factuality of news reporting and bias of media outlets. Our experiments demonstrate that biased sentences present a higher number of words compared to factual sentences, besides having a predominance of emotions. Hence, the fine-grained analysis of subjectivity and impartiality of news articles showed promising results for predicting the reliability of entire media outlets. Finally, due to the severity of fake news and political polarization in Brazil, and the lack of research for Portuguese, both dataset and baseline were proposed for Brazilian Portuguese.
2022
pdf
bib
abs
Offer a Different Perspective: Modeling the Belief Alignment of Arguments in Multi-party Debates
Suzanna Sia
|
Kokil Jaidka
|
Hansin Ahuja
|
Niyati Chhaya
|
Kevin Duh
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing
In contexts where debate and deliberation are the norm, the participants are regularly presented with new information that conflicts with their original beliefs. When required to update their beliefs (belief alignment), they may choose arguments that align with their worldview (confirmation bias). We test this and competing hypotheses in a constraint-based modeling approach to predict the winning arguments in multi-party interactions in the Reddit Change My View and Intelligence Squared debates datasets. We adopt a hierarchical generative Variational Autoencoder as our model and impose structural constraints that reflect competing hypotheses about the nature of argumentation. Our findings suggest that in most settings, predictive models that anticipate winning arguments to be further from the initial argument of the opinion holder are more likely to succeed.
pdf
bib
abs
Developing A Multilabel Corpus for the Quality Assessment of Online Political Talk
Kokil Jaidka
Proceedings of the Thirteenth Language Resources and Evaluation Conference
This paper motivates and presents the Twitter Deliberative Politics dataset, a corpus of political tweets labeled for its deliberative characteristics. The corpus was randomly sampled from replies to US congressmen and women. It is expected to be useful to a general community of computational linguists, political scientists, and social scientists interested in the study of online political expression, computer-mediated communication, and political deliberation. The data sampling and annotation methods are discussed and classical machine learning approaches are evaluated for their predictive performance on the different deliberative facets. The paper concludes with a discussion of future work aimed at developing dictionaries for the quality assessment of online political talk in English. The dataset and a demo dashboard are available at 
https://github.com/kj2013/twitter-deliberative-politics.
2021
pdf
bib
abs
WikiTalkEdit: A Dataset for modeling Editors’ behaviors on Wikipedia
Kokil Jaidka
|
Andrea Ceolin
|
Iknoor Singh
|
Niyati Chhaya
|
Lyle Ungar
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
This study introduces and analyzes WikiTalkEdit, a dataset of conversations and edit histories from Wikipedia, for research in online cooperation and conversation modeling. The dataset comprises dialog triplets from the Wikipedia Talk pages, and editing actions on the corresponding articles being discussed. We show how the data supports the classic understanding of style matching, where positive emotion and the use of first-person pronouns predict a positive emotional change in a Wikipedia contributor. However, they do not predict editorial behavior. On the other hand, feedback invoking evidentiality and criticism, and references to Wikipedia’s community norms, is more likely to persuade the contributor to perform edits but is less likely to lead to a positive emotion. We developed baseline classifiers trained on pre-trained RoBERTa features that can predict editorial change with an F1 score of .54, as compared to an F1 score of .66 for predicting emotional change. A diagnostic analysis of persisting errors is also provided. We conclude with possible applications and recommendations for future work. The dataset is publicly available for the research community at 
https://github.com/kj2013/WikiTalkEdit/.
2018
pdf
bib
abs
Identifying Locus of Control in Social Media Language
Masoud Rouhizadeh
|
Kokil Jaidka
|
Laura Smith
|
H. Andrew Schwartz
|
Anneke Buffone
|
Lyle Ungar
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing
Individuals express their locus of control, or “control”, in their language when they identify whether or not they are in control of their circumstances. Although control is a core concept underlying rhetorical style, it is not clear whether control is expressed by how or by what authors write. We explore the roles of syntax and semantics in expressing users’ sense of control –i.e. being “controlled by” or “in control of” their circumstances– in a corpus of annotated Facebook posts. We present rich insights into these linguistic aspects and find that while the language signaling control is easy to identify, it is more challenging to label it is internally or externally controlled, with lexical features outperforming syntactic features at the task. Our findings could have important implications for studying self-expression in social media.
pdf
bib
abs
Diachronic degradation of language models: Insights from social media
Kokil Jaidka
|
Niyati Chhaya
|
Lyle Ungar
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)
Natural languages change over time because they evolve to the needs of their users and the socio-technological environment. This study investigates the diachronic accuracy of pre-trained language models for downstream tasks in machine learning and user profiling. It asks the question: given that the social media platform and its users remain the same, how is language changing over time? How can these differences be used to track the changes in the affect around a particular topic? To our knowledge, this is the first study to show that it is possible to measure diachronic semantic drifts within social media and within the span of a few years.
2017
pdf
bib
abs
Domain Adaptation from User-level Facebook Models to County-level Twitter Predictions
Daniel Rieman
|
Kokil Jaidka
|
H. Andrew Schwartz
|
Lyle Ungar
Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)
Several studies have demonstrated how language models of user attributes, such as personality, can be built by using the Facebook language of social media users in conjunction with their responses to psychology questionnaires. It is challenging to apply these models to make general predictions about attributes of communities, such as personality distributions across US counties, because it requires 1. the potentially inavailability of the original training data because of privacy and ethical regulations, 2. adapting Facebook language models to Twitter language without retraining the model, and 3. adapting from users to county-level collections of tweets. We propose a two-step algorithm, Target Side Domain Adaptation (TSDA) for such domain adaptation when no labeled Twitter/county data is available. TSDA corrects for the different word distributions between Facebook and Twitter and for the varying word distributions across counties by adjusting target side word frequencies; no changes to the trained model are made. In the case of predicting the Big Five county-level personality traits, TSDA outperforms a state-of-the-art domain adaptation method, gives county-level predictions that have fewer extreme outliers, higher year-to-year stability, and higher correlation with county-level outcomes.
2016
pdf
bib
Proceedings of the Joint Workshop on Bibliometric-enhanced Information Retrieval and Natural Language Processing for Digital Libraries (BIRNDL)
Guillaume Cabanac
|
Muthu Kumar Chandrasekaran
|
Ingo Frommholz
|
Kokil Jaidka
|
Min-Yen Kan
|
Philipp Mayr
|
Dietmar Wolfram
Proceedings of the Joint Workshop on Bibliometric-enhanced Information Retrieval and Natural Language Processing for Digital Libraries (BIRNDL)
pdf
bib
Overview of the CL-SciSumm 2016 Shared Task
Kokil Jaidka
|
Muthu Kumar Chandrasekaran
|
Sajal Rustagi
|
Min-Yen Kan
Proceedings of the Joint Workshop on Bibliometric-enhanced Information Retrieval and Natural Language Processing for Digital Libraries (BIRNDL)
2013
pdf
bib
Deconstructing Human Literature Reviews – A Framework for Multi-Document Summarization
Kokil Jaidka
|
Christopher Khoo
|
Jin-Cheon Na
Proceedings of the 14th European Workshop on Natural Language Generation