Lucie Flek


2024

Do Multilingual Large Language Models Mitigate Stereotype Bias?
Shangrui Nie | Michael Fromm | Charles Welch | Rebekka Görge | Akbar Karimi | Joan Plepi | Nazia Mowmita | Nicolas Flores-Herr | Mehdi Ali | Lucie Flek
Proceedings of the 2nd Workshop on Cross-Cultural Considerations in NLP

While preliminary findings indicate that multilingual LLMs exhibit reduced bias compared to monolingual ones, a comprehensive understanding of the effect of multilingual training on bias mitigation is lacking. This study addresses this gap by systematically training six LLMs of identical size (2.6B parameters) and architecture: five monolingual models (English, German, French, Italian, and Spanish) and one multilingual model trained on an equal distribution of data across these languages, all using publicly available data. To ensure robust evaluation, standard bias benchmarks were automatically translated into the five target languages and verified for both translation quality and bias preservation by human annotators. Our results consistently demonstrate that multilingual training effectively mitigates bias. Moreover, we observe that multilingual models achieve not only lower bias but also superior prediction accuracy when compared to monolingual models with the same amount of training data, model architecture, and size.
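A common way to operationalize such bias benchmarks is a CrowS-Pairs-style probe: compare the model's log-likelihood of a stereotypical sentence against its anti-stereotypical counterpart. Below is a minimal sketch of that probe for a causal LM; the model name and the toy sentence pair are illustrative placeholders, not the paper's actual benchmark or models.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # placeholder model
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def sentence_log_likelihood(sentence: str) -> float:
    """Sum of token log-probabilities under the causal LM."""
    ids = tokenizer(sentence, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits
    log_probs = torch.log_softmax(logits[:, :-1], dim=-1)
    token_lp = log_probs.gather(2, ids[:, 1:].unsqueeze(-1)).squeeze(-1)
    return token_lp.sum().item()

# Toy stereotype/anti-stereotype pair; a real benchmark has many such pairs.
pairs = [("Women are bad at math.", "Men are bad at math.")]
biased = sum(sentence_log_likelihood(s) > sentence_log_likelihood(a)
             for s, a in pairs)
print(f"bias score: {biased / len(pairs):.2f}")  # 0.5 would be unbiased
```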

The Impact of Differential Privacy on Group Disparity Mitigation
Victor Hansen | Atula Neerkaje | Ramit Sawhney | Lucie Flek | Anders Søgaard
Findings of the Association for Computational Linguistics: NAACL 2024

The performance cost of differential privacy has, for some applications, been shown to be higher for minority groups; fairness, conversely, has been shown to disproportionately compromise the privacy of members of such groups. Most work in this area has been restricted to computer vision and risk assessment. In response, we evaluate the impact of differential privacy on fairness across four diverse tasks, focusing on how attempts to mitigate privacy violations and between-group performance differences interact: Does privacy inhibit attempts to ensure fairness? To this end, we train (ε, δ)-differentially private models with empirical risk minimization and group distributionally robust training objectives. Consistent with previous findings, we find that differential privacy increases between-group performance differences in the baseline setting; more interestingly, differential privacy reduces between-group performance differences in the robust setting. We explain this by interpreting differential privacy as regularization.
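The setup above combines two pieces: a group-DRO objective that reweights group losses, and (ε, δ)-DP training. The sketch below illustrates both in simplified form, using per-batch rather than the per-example gradient clipping that formal DP-SGD requires, and omitting privacy accounting; all names and constants are illustrative assumptions.

```python
import torch

def group_dro_loss(losses, group_ids, group_weights, eta=0.01):
    """Group DRO: exponentiated-gradient upweighting of worst groups."""
    for g in group_weights:
        mask = group_ids == g
        if mask.any():
            group_weights[g] *= torch.exp(eta * losses[mask].mean()).item()
    total = sum(group_weights.values())
    for g in group_weights:  # renormalize to a distribution over groups
        group_weights[g] /= total
    return sum(group_weights[g] * losses[group_ids == g].mean()
               for g in group_weights if (group_ids == g).any())

def dp_step(model, loss, optimizer, clip_norm=1.0, noise_mult=1.0):
    """Simplified DP-SGD step: clip the (batch) gradient, add Gaussian noise.
    Formal DP needs per-example clipping and an accountant (e.g., Opacus)."""
    optimizer.zero_grad()
    loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), clip_norm)
    for p in model.parameters():
        if p.grad is not None:
            p.grad += noise_mult * clip_norm * torch.randn_like(p.grad)
    optimizer.step()
```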

Perspective Taking through Generating Responses to Conflict Situations
Joan Plepi | Charles Welch | Lucie Flek
Findings of the Association for Computational Linguistics: ACL 2024

Although language model performance across diverse tasks continues to improve, these models still struggle to understand and explain the beliefs of other people. This skill requires perspective-taking, the process of conceptualizing the point of view of another person. Perspective taking becomes challenging when the text reflects more personal and potentially more controversial beliefs. We explore this task through natural language generation of responses to conflict situations. We evaluate novel modifications to recent architectures for conditioning generation on an individual’s comments and self-disclosure statements. Our work extends the Social-Chem-101 corpus, using 95k judgements written by 6k authors from English Reddit data, for each of whom we obtained 20-500 self-disclosure statements. Our evaluation methodology borrows ideas from both personalized generation and theory of mind literature. Our proposed perspective-taking models outperform recent work, especially the twin encoder model conditioned on self-disclosures with high similarity to the conflict situation.

Appraisal Framework for Clinical Empathy: A Novel Application to Breaking Bad News Conversations
Allison Claire Lahnala | Béla Neuendorf | Alexander Thomin | Charles Welch | Tina Stibane | Lucie Flek
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Empathy is essential in healthcare communication. We introduce an annotation approach that draws on well-established frameworks for clinical empathy and breaking bad news (BBN) conversations and considers the interactive dynamics of discourse relations. We construct Empathy in BBNs, a span-relation task dataset of simulated BBN conversations in German, using our annotation scheme, in collaboration with a large medical school to support research on educational tools for medical didactics. The annotation is based on 1) Pounds (2011)’s appraisal framework for clinical empathy, which is grounded in systemic functional linguistics, and 2) the SPIKES protocol for breaking bad news (Baile et al., 2000), commonly taught in medical didactics training. This approach presents novel opportunities to study clinical empathic behavior and enables the training of models to detect causal relations involving empathy, a highly desirable feature of systems that can provide feedback to medical professionals in training. We present illustrative examples and discuss applications of the annotation scheme and the insights that can be drawn from the framework.

DeFaktS: A German Dataset for Fine-Grained Disinformation Detection through Social Media Framing
Shaina Ashraf | Isabel Bezzaoui | Ionut Andone | Alexander Markowetz | Jonas Fegert | Lucie Flek
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

In today’s rapidly evolving digital age, disinformation poses a significant threat to public sentiment and socio-political dynamics. To address this, we introduce a new dataset “DeFaktS”, designed to understand and counter disinformation within German media. Distinctively curated across various news topics, DeFaktS offers an unparalleled insight into the diverse facets of disinformation. Our dataset, containing 105,855 posts with 20,008 meticulously labeled tweets, serves as a rich platform for in-depth exploration of disinformation’s diverse characteristics. A key attribute that sets DeFaktS apart is its fine-grained annotations based on polarized categories. Our annotation framework, grounded in the textual characteristics of news content, eliminates the need for external knowledge sources. Unlike most existing corpora that typically assign a singular global veracity value to news, our methodology seeks to annotate every structural component and semantic element of a news piece, ensuring a comprehensive and detailed understanding. In our experiments, we employed a mix of classical machine learning and advanced transformer-based models. The results underscored the potential of DeFaktS, with transformer models, especially the German variant of BERT, exhibiting pronounced effectiveness in both binary and fine-grained classifications.

LeadEmpathy: An Expert Annotated German Dataset of Empathy in Written Leadership Communication
Didem Sedefoglu | Allison Claire Lahnala | Jasmin Wagner | Lucie Flek | Sandra Ohly
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Empathetic leadership communication plays a pivotal role in modern workplaces as it is associated with a wide range of positive individual and organizational outcomes. This paper introduces LeadEmpathy, an innovative expert-annotated German dataset for modeling empathy in written leadership communication. It features a novel theory-based coding scheme to model cognitive and affective empathy in asynchronous communication. The final dataset comprises 770 annotated emails from 385 participants who were allowed to rewrite their emails after receiving recommendations for increasing empathy in an online experiment. Two independent annotators achieved substantial inter-annotator agreement of ≥ .79 for all categories, indicating that the annotation scheme can be applied to produce high-quality, multidimensional empathy ratings in current and future applications. Beyond outlining the dataset’s development procedures, we present a case study on automatic empathy detection, establishing baseline models for predicting empathy scores on a ten-point scale, which achieve a Pearson correlation of 0.816 and a mean squared error of 0.883. Our dataset is available at https://github.com/caisa-lab/LEAD-empathy-dataset.
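For context, the two reported baseline metrics can be computed as follows; this is a minimal sketch in which the toy arrays stand in for the dataset's gold and predicted empathy scores.

```python
from scipy.stats import pearsonr
from sklearn.metrics import mean_squared_error

gold = [3, 7, 5, 9, 2, 6]  # toy gold empathy scores (ten-point scale)
pred = [4, 6, 5, 8, 3, 6]  # toy model predictions
r, _ = pearsonr(gold, pred)
print(f"Pearson r = {r:.3f}, MSE = {mean_squared_error(gold, pred):.3f}")
```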

Proceedings of the 1st Human-Centered Large Language Modeling Workshop
Nikita Soni | Lucie Flek | Ashish Sharma | Diyi Yang | Sara Hooker | H. Andrew Schwartz
Proceedings of the 1st Human-Centered Large Language Modeling Workshop

Corpus Considerations for Annotator Modeling and Scaling
Olufunke O. Sarumi | Béla Neuendorf | Joan Plepi | Lucie Flek | Jörg Schlötterer | Charles Welch
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)

Recent trends in natural language processing research and annotation tasks affirm a paradigm shift from the traditional reliance on a single ground truth to a focus on individual perspectives, particularly in subjective tasks. In scenarios where annotation tasks are meant to encompass diversity, models that solely rely on the majority class labels may inadvertently disregard valuable minority perspectives. This oversight could result in the omission of crucial information and, in a broader context, risk disrupting the balance within larger ecosystems. As the landscape of annotator modeling unfolds with diverse representation techniques, it becomes imperative to investigate their effectiveness with the fine-grained features of the datasets in view. This study systematically explores various annotator modeling techniques and compares their performance across seven corpora. From our findings, we show that the commonly used user token model consistently outperforms more complex models. We introduce a composite embedding approach and show distinct differences in which model performs best as a function of the agreement with a given dataset. Our findings shed light on the relationship between corpus statistics and annotator modeling performance, which informs future work on corpus construction and perspectivist NLP.
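The user token model that performs most consistently in this comparison conditions a shared encoder on annotator identity by prepending one dedicated special token per annotator. A minimal sketch, assuming illustrative model and token names:

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
annotators = [f"[ANN_{i}]" for i in range(100)]  # one token per annotator
tokenizer.add_special_tokens({"additional_special_tokens": annotators})

model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)
model.resize_token_embeddings(len(tokenizer))  # room for the new tokens

def encode(text: str, annotator_id: int):
    """Prepend the annotator's token so the label prediction is per-annotator."""
    return tokenizer(f"[ANN_{annotator_id}] {text}", return_tensors="pt")
```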

Harnessing Personalization Methods to Identify and Predict Unreliable Information Spreader Behavior
Shaina Ashraf | Fabio Gruschka | Lucie Flek | Charles Welch
Proceedings of the 8th Workshop on Online Abuse and Harms (WOAH 2024)

Studies on detecting and understanding the spread of unreliable news on social media have identified key characteristic differences between reliable and unreliable posts. These differences in language use also vary in expression across individuals, making it important to consider personal factors in unreliable news detection. The application of personalization methods for this has been made possible by the recent publication of datasets with user histories, though this area is still largely unexplored. In this paper we present approaches to represent social media users in order to improve performance on three tasks: (1) classification of unreliable news posts, (2) classification of unreliable news spreaders, and (3) prediction of the spread of unreliable news. We compare the User2Vec method from previous work to two other approaches: a learnable user embedding layer trained with the downstream task, and a representation derived from an authorship attribution classifier. We demonstrate that the implemented strategies substantially improve classification performance over the state of the art and provide initial results on the task of unreliable news prediction.
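The learnable user embedding layer mentioned above can be prototyped as follows; the dimensions are illustrative, and a random vector stands in for the encoder's pooled text representation.

```python
import torch
import torch.nn as nn

class UserAwareClassifier(nn.Module):
    """Concatenate a learned per-user vector with the text representation."""
    def __init__(self, n_users, text_dim=768, user_dim=64, n_classes=2):
        super().__init__()
        self.user_emb = nn.Embedding(n_users, user_dim)
        self.head = nn.Linear(text_dim + user_dim, n_classes)

    def forward(self, text_vec, user_ids):
        u = self.user_emb(user_ids)  # (batch, user_dim), trained end-to-end
        return self.head(torch.cat([text_vec, u], dim=-1))

clf = UserAwareClassifier(n_users=10_000)
logits = clf(torch.randn(4, 768), torch.tensor([1, 7, 42, 9]))
```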

Archetypes and Entropy: Theory-Driven Extraction of Evidence for Suicide Risk
Vasudha Varadarajan | Allison Lahnala | Adithya V Ganesan | Gourab Dey | Siddharth Mangalik | Ana-Maria Bucur | Nikita Soni | Rajath Rao | Kevin Lanning | Isabella Vallejo | Lucie Flek | H. Andrew Schwartz | Charles Welch | Ryan Boyd
Proceedings of the 9th Workshop on Computational Linguistics and Clinical Psychology (CLPsych 2024)

Research on psychological risk factors for suicide has developed for decades. However, combining explainable theory with modern data-driven language model approaches is non-trivial. In this study, we propose and evaluate methods for identifying language patterns aligned with theories of suicide risk by combining theory-driven suicidal archetypes with language model-based and relative entropy-based approaches. Archetypes are based on prototypical statements that evince risk of suicidality, while relative entropy considers the ratio of how unusual both a risk-familiar and unfamiliar model find the statements. While both approaches independently performed similarly, we find that combining the two significantly improved the performance in the shared task evaluations, yielding our combined system submission with a BERTScore Recall of 0.906. Consistent with the literature, we find that titles are highly informative as suicide risk evidence, despite their brevity. We conclude that a combination of theory- and data-driven methods is needed in the mental health space and can outperform more modern prompt-based methods.
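The relative-entropy component scores a statement by how much more probable a risk-familiar language model finds it than a risk-unfamiliar one. A minimal sketch of that ratio; for brevity both placeholders load the same base model, whereas in practice the familiar model would be adapted to risk-related text.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
familiar = AutoModelForCausalLM.from_pretrained("gpt2").eval()    # would be risk-adapted
unfamiliar = AutoModelForCausalLM.from_pretrained("gpt2").eval()  # generic LM

def avg_log_prob(model, text: str) -> float:
    """Mean token log-likelihood of the text under the model."""
    ids = tok(text, return_tensors="pt").input_ids
    with torch.no_grad():
        out = model(ids, labels=ids)
    return -out.loss.item()

def relative_entropy_score(text: str) -> float:
    """Higher = the risk-familiar model finds the text relatively more usual."""
    return avg_log_prob(familiar, text) - avg_log_prob(unfamiliar, text)
```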

Pitfalls of Conversational LLMs on News Debiasing
Ipek Baris Schlicht | Defne Altiok | Maryanne Taouk | Lucie Flek
Proceedings of the First Workshop on Language-driven Deliberation Technology (DELITE) @ LREC-COLING 2024

This paper addresses debiasing in news editing and evaluates the effectiveness of conversational Large Language Models in this task. We designed an evaluation checklist tailored to news editors’ perspectives, obtained generated texts from three popular conversational models using a subset of a publicly available dataset in media bias, and evaluated the texts according to the designed checklist. Furthermore, we examined the models as evaluators for checking the quality of debiased model outputs. Our findings indicate that none of the LLMs are perfect in debiasing. Notably, some models, including ChatGPT, introduced unnecessary changes that may impact the author’s style and create misinformation. Lastly, we show that the models do not perform as proficiently as domain experts in evaluating the quality of debiased outputs.

A Perspectivist Corpus of Numbers in Social Judgements
Marlon May | Lucie Flek | Charles Welch
Proceedings of the 3rd Workshop on Perspectivist Approaches to NLP (NLPerspectives) @ LREC-COLING 2024

With growing interest in the use of large language models, it is becoming increasingly important to understand whose views they express. These models tend to generate output that conforms to majority opinion and are not representative of diverse views. As a step toward building models that can take differing views into consideration, we build a novel corpus of social judgements. We crowdsourced annotations of a subset of the Commonsense Norm Bank that contained numbers in the situation descriptions and asked annotators to replace the number with a range defined by a start and end value that, in their view, correspond to the given verdict. Our corpus contains unaggregated annotations and annotator demographics. We describe our annotation process for social judgements and will release our dataset to support future work on numerical reasoning and perspectivist approaches to natural language processing.

2023

Personalized Intended and Perceived Sarcasm Detection on Twitter
Joan Plepi | Magdalena Buski | Lucie Flek
Proceedings of the 3rd Workshop on Computational Linguistics for the Political and Social Sciences

OpinionConv: Conversational Product Search with Grounded Opinions
Vahid Sadiri Javadi | Martin Potthast | Lucie Flek
Proceedings of the 24th Annual Meeting of the Special Interest Group on Discourse and Dialogue

When searching for products, the opinions of others play an important role in making informed decisions. Subjective experiences about a product can be a valuable source of information. This is also true in sales conversations, where a customer and a sales assistant exchange facts and opinions about products. However, training an AI for such conversations is complicated by the fact that language models do not possess authentic opinions due to their lack of real-world experience. We address this problem by leveraging product reviews as a rich source of product opinions to ground conversational AI in true subjective narratives. With OpinionConv, we develop the first conversational AI for simulating sales conversations. To validate the generated conversations, we conduct several user studies showing that the generated opinions are perceived as realistic. Our assessors also confirm the importance of opinions as an informative basis for decision making.

Domain Transfer for Empathy, Distress, and Personality Prediction
Fabio Gruschka | Allison Lahnala | Charles Welch | Lucie Flek
Proceedings of the 13th Workshop on Computational Approaches to Subjectivity, Sentiment, & Social Media Analysis

This research contributes to the task of predicting empathy and personality traits within dialogue, an important aspect of natural language processing, as part of our experimental work for the WASSA 2023 Empathy and Emotion Shared Task. For predicting empathy, emotion polarity, and emotion intensity on turns within a dialogue, we employ adapters trained on social media interactions labeled with empathy ratings in a stacked composition with the target task adapters. Furthermore, we embed demographic information to predict Interpersonal Reactivity Index (IRI) subscales and Big Five Personality Traits utilizing BERT-based models. The results from our study provide valuable insights, contributing to advancements in understanding human behavior and interaction through text. Our team ranked 2nd on the personality and empathy prediction tasks, 4th on the interpersonal reactivity index, and 6th on the conversational task.

CAISA at SemEval-2023 Task 8: Counterfactual Data Augmentation for Mitigating Class Imbalance in Causal Claim Identification
Akbar Karimi | Lucie Flek
Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023)

The class imbalance problem can cause machine learning models to produce undesirable performance on the minority class as well as on the whole dataset. Using data augmentation techniques to increase the number of samples is one way to tackle this problem. We introduce a novel counterfactual data augmentation method based on verb replacement for the identification of medical claims. In addition, we investigate the impact of this method and compare it with 3 other data augmentation techniques, showing that the proposed method can result in a significant (relative) improvement on the minority class.
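The core operation, replacing verbs to flip a claim's meaning, can be sketched with spaCy part-of-speech tagging as below; the replacement lexicon is a toy stand-in for the paper's actual substitution strategy.

```python
# requires: python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load("en_core_web_sm")
COUNTER_VERBS = {"increase": "decrease", "improve": "worsen",
                 "cause": "prevent", "help": "hurt"}  # toy lexicon

def verb_replace(sentence: str) -> str:
    """Produce a counterfactual variant by swapping tagged verbs."""
    doc = nlp(sentence)
    out = []
    for token in doc:
        if token.pos_ == "VERB" and token.lemma_ in COUNTER_VERBS:
            out.append(COUNTER_VERBS[token.lemma_] + token.whitespace_)
        else:
            out.append(token.text_with_ws)
    return "".join(out)

print(verb_replace("This drug can cause severe headaches."))
```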

Challenges of GPT-3-Based Conversational Agents for Healthcare
Fabian Lechner | Allison Lahnala | Charles Welch | Lucie Flek
Proceedings of the 14th International Conference on Recent Advances in Natural Language Processing

The potential of medical domain dialogue agents lies in their ability to provide patients with faster information access while enabling medical specialists to concentrate on critical tasks. However, the integration of large-language models (LLMs) into these agents presents certain limitations that may result in serious consequences. This paper investigates the challenges and risks of using GPT-3-based models for medical question-answering (MedQA). We perform several evaluations contextualized in terms of standard medical principles. We provide a procedure for manually designing patient queries to stress-test high-risk limitations of LLMs in MedQA systems. Our analysis reveals that LLMs fail to respond adequately to these queries, generating erroneous medical information, unsafe recommendations, and content that may be considered offensive.

Style Locality for Controllable Generation with kNN Language Models
Gilles Nawezi | Lucie Flek | Charles Welch
Proceedings of the 1st Workshop on Taming Large Language Models: Controllability in the era of Interactive Assistants!

Recent language models have been improved by the addition of external memory. Nearest neighbor language models retrieve similar contexts to assist in word prediction. The addition of locality levels allows a model to learn how to weight neighbors based on their relative location to the current text in source documents, and has been shown to further improve model performance. Nearest neighbor models have been explored for controllable generation but have not examined the use of locality levels. We present a novel approach for this purpose and evaluate it using automatic and human evaluation on politeness, formality, supportiveness, and toxicity textual data. We find that our model is successfully able to control style and provides a better fluency-style trade-off than previous work.
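A kNN language model interpolates the base LM's next-token distribution with one induced by retrieved datastore neighbors; the locality idea additionally weights each neighbor by its locality level. A minimal NumPy sketch under those assumptions, with a toy datastore and fixed rather than learned locality weights:

```python
import numpy as np

V = 1000                                        # vocabulary size
datastore_keys = np.random.randn(5000, 64)      # stored context embeddings
datastore_vals = np.random.randint(0, V, 5000)  # observed next tokens
locality = np.random.randint(0, 3, 5000)        # 0=same doc, 1=same style, 2=other
loc_weight = np.array([1.0, 0.5, 0.25])         # learned in the actual model

def knn_next_token_dist(query, k=16, temp=1.0):
    """Distribution over the vocabulary from the k nearest stored contexts."""
    d = np.linalg.norm(datastore_keys - query, axis=1)
    idx = np.argsort(d)[:k]
    w = np.exp(-d[idx] / temp) * loc_weight[locality[idx]]
    dist = np.zeros(V)
    np.add.at(dist, datastore_vals[idx], w)
    return dist / dist.sum()

def interpolate(p_lm, query, lam=0.3):
    """Standard kNN-LM mixture of retrieval and base LM distributions."""
    return lam * knn_next_token_dist(query) + (1 - lam) * p_lm

p_next = interpolate(np.full(V, 1.0 / V), np.random.randn(64))
```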

2022

Understanding Interpersonal Conflict Types and their Impact on Perception Classification
Charles Welch | Joan Plepi | Béla Neuendorf | Lucie Flek
Proceedings of the Fifth Workshop on Natural Language Processing and Computational Social Science (NLP+CSS)

Studies on interpersonal conflict have a long history and contain many suggestions for conflict typology. We use this as the basis of a novel annotation scheme and release a new dataset of situations and conflict aspect annotations. We then build a classifier to predict whether someone will perceive the actions of one individual as right or wrong in a given situation. Our analyses include conflict aspects, but also generated clusters, which are human validated, and show differences in conflict content based on the relationship of participants to the author. Our findings have important implications for understanding conflict and social norms.

Unifying Data Perspectivism and Personalization: An Application to Social Norms
Joan Plepi | Béla Neuendorf | Lucie Flek | Charles Welch
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing

Instead of using a single ground truth for language processing tasks, several recent studies have examined how to represent and predict the labels of the set of annotators. However, often little or no information about annotators is known, or the set of annotators is small. In this work, we examine a corpus of social media posts about conflict from a set of 13k annotators and 210k judgements of social norms. We provide a novel experimental setup that applies personalization methods to the modeling of annotators and compare their effectiveness for predicting the perception of social norms. We further provide an analysis of performance across subsets of social situations that vary by the closeness of the relationship between parties in conflict, and assess where personalization helps the most.

The Impact of Differential Privacy on Group Disparity Mitigation
Victor Petrén Bach Hansen | Atula Tejaswi Neerkaje | Ramit Sawhney | Lucie Flek | Anders Søgaard
Proceedings of the Fourth Workshop on Privacy in Natural Language Processing

The performance cost of differential privacy has, for some applications, been shown to be higher for minority groups; fairness, conversely, has been shown to disproportionately compromise the privacy of members of such groups. Most work in this area has been restricted to computer vision and risk assessment. In this paper, we evaluate the impact of differential privacy on fairness across four tasks, focusing on how attempts to mitigate privacy violations and between-group performance differences interact: Does privacy inhibit attempts to ensure fairness? To this end, we train (ε, δ)-differentially private models with empirical risk minimization and group distributionally robust training objectives. Consistent with previous findings, we find that differential privacy increases between-group performance differences in the baseline setting; more interestingly, differential privacy reduces between-group performance differences in the robust setting. We explain this by reinterpreting differential privacy as regularization.

Nearest Neighbor Language Models for Stylistic Controllable Generation
Severino Trotta | Lucie Flek | Charles Welch
Proceedings of the 2nd Workshop on Natural Language Generation, Evaluation, and Metrics (GEM)

Recent language modeling performance has been greatly improved by the use of external memory. This memory encodes the context so that similar contexts can be recalled during decoding. This similarity depends on how the model learns to encode context, which can be altered to include other attributes, such as style. We construct and evaluate an architecture for this purpose, using corpora annotated for politeness, formality, and toxicity. Through extensive experiments and human evaluation we demonstrate the potential of our method to generate text while controlling style. We find that style-specific datastores improve generation performance, though results vary greatly across styles, and the effect of pretraining data and specific styles should be explored in future work.

Mitigating Toxic Degeneration with Empathetic Data: Exploring the Relationship Between Toxicity and Empathy
Allison Lahnala | Charles Welch | Béla Neuendorf | Lucie Flek
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Large pre-trained neural language models have supported the effectiveness of many NLP tasks, yet are still prone to generating toxic language, which hinders the safety of their use. Using empathetic data, we improve over recent work on controllable text generation that aims to reduce the toxicity of generated text. We find we are able to dramatically reduce the size of fine-tuning data to 7.5-30k samples while at the same time making significant improvements over state-of-the-art toxicity mitigation of up to 3.4% absolute reduction (26% relative) from the original work on 2.3m samples, by strategically sampling data based on empathy scores. We observe that the degree of improvement is subject to specific communication components of empathy. In particular, the more cognitive components of empathy significantly beat the original dataset in almost all experiments, while emotional empathy was tied to less improvement and even underperformed random samples of the original data. This insight is particularly consequential for NLP work concerning empathy, as until recently the research and resources built for it have exclusively considered empathy as an emotional concept.

CAISA at WASSA 2022: Adapter-Tuning for Empathy Prediction
Allison Lahnala | Charles Welch | Lucie Flek
Proceedings of the 12th Workshop on Computational Approaches to Subjectivity, Sentiment & Social Media Analysis

We build a system that leverages adapters, a lightweight and efficient method for adapting large language models, to perform the Empathy and Distress prediction tasks for WASSA 2022. In our experiments, we find that stacking our empathy and distress adapters on a pre-trained emotion classification adapter performs best compared to full fine-tuning approaches and emotion feature concatenation. We make our experimental code publicly available.
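Adapter stacking of this kind can be expressed with the adapter-transformers library (since renamed to adapters); the sketch below is only indicative, as the adapter names and regression head are assumptions and the exact API differs across library versions.

```python
from transformers import AutoModelWithHeads  # adapter-transformers fork
from transformers.adapters.composition import Stack

model = AutoModelWithHeads.from_pretrained("bert-base-uncased")
model.add_adapter("emotion")  # in practice: load a pre-trained emotion adapter
model.add_adapter("empathy")
model.add_classification_head("empathy", num_labels=1)  # regression head
model.active_adapters = Stack("emotion", "empathy")     # emotion feeds empathy
model.train_adapter("empathy")  # only the empathy adapter receives gradients
```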

CAISA@SMM4H’22: Robust Cross-Lingual Detection of Disease Mentions on Social Media with Adversarial Methods
Akbar Karimi | Lucie Flek
Proceedings of The Seventh Workshop on Social Media Mining for Health Applications, Workshop & Shared Task

We propose adversarial methods for increasing the robustness of disease mention detection on social media. Our method applies adversarial data augmentation on the input and the embedding spaces to the English BioBERT model. We evaluate our method in the SocialDisNER challenge at SMM4H’22 on an annotated dataset of disease mentions in Spanish tweets. We find that both methods outperform a heuristic vocabulary-based baseline by a large margin. Additionally, utilizing the English BioBERT model shows a strong performance and outperforms the data augmentation methods even when applied to the Spanish dataset, which has a large amount of data, while augmentation methods show a significant advantage in a low-data setting.
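Adversarial augmentation in the embedding space is commonly implemented FGM-style: perturb the input embeddings along the loss gradient and also train on the perturbed batch. A minimal sketch assuming a HuggingFace-style model that accepts inputs_embeds; the perturbation size is illustrative, and this is not necessarily the paper's exact formulation.

```python
import torch

def fgm_adversarial_loss(model, embeddings, labels, loss_fn, eps=1e-2):
    """Return the loss on embeddings perturbed along the loss gradient."""
    embeddings = embeddings.detach().requires_grad_(True)
    loss = loss_fn(model(inputs_embeds=embeddings).logits, labels)
    grad, = torch.autograd.grad(loss, embeddings)
    # Normalized one-step perturbation in the direction that increases the loss.
    perturbed = embeddings + eps * grad / (grad.norm() + 1e-12)
    return loss_fn(model(inputs_embeds=perturbed).logits, labels)
```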

FACTOID: A New Dataset for Identifying Misinformation Spreaders and Political Bias
Flora Sakketou | Joan Plepi | Riccardo Cervero | Henri Jacques Geiss | Paolo Rosso | Lucie Flek
Proceedings of the Thirteenth Language Resources and Evaluation Conference

Proactively identifying misinformation spreaders is an important step towards mitigating the impact of fake news on our society. In this paper, we introduce a new contemporary Reddit dataset for fake news spreader analysis, called FACTOID, monitoring political discussions on Reddit since the beginning of 2020. The dataset contains over 4K users with 3.4M Reddit posts, and includes, beyond the users’ binary labels, also their fine-grained credibility level (very low to very high) and their political bias strength (extreme right to extreme left). As far as we are aware, this is the first fake news spreader dataset that simultaneously captures both the long-term context of users’ historical posts and the interactions between them. To create the first benchmark on our data, we provide methods for identifying misinformation spreaders by utilizing the social connections between the users along with their psycho-linguistic features. We show that the users’ social interactions can, on their own, indicate misinformation spreading, while the psycho-linguistic features are mostly informative in non-neural classification settings. In a qualitative analysis we observe that detecting affective mental processes correlates negatively with right-biased users, and that the openness to experience factor is lower for those who spread fake news.

Investigating User Radicalization: A Novel Dataset for Identifying Fine-Grained Temporal Shifts in Opinion
Flora Sakketou | Allison Lahnala | Liane Vogel | Lucie Flek
Proceedings of the Thirteenth Language Resources and Evaluation Conference

There is an increasing need for the ability to model fine-grained opinion shifts of social media users, as concerns about the potential polarizing social effects increase. However, the lack of publicly available datasets that are suitable for the task presents a major challenge. In this paper, we introduce an innovative annotated dataset for modeling subtle opinion fluctuations and detecting fine-grained stances. The dataset includes a sufficient amount of stance polarity and intensity labels per user over time and within entire conversational threads, thus making subtle opinion fluctuations detectable both in the long term and in the short term. All posts are annotated by non-experts and a significant portion of the data is also annotated by experts. We provide a strategy for recruiting suitable non-experts. Our analysis of the inter-annotator agreements shows that the resulting annotations obtained from the majority vote of the non-experts are of comparable quality to the annotations of the experts. We provide analyses of the stance evolution at the short-term and long-term levels, a comparison of language usage between users with vacillating and resolute attitudes, and fine-grained stance detection baselines.

OK Boomer: Probing the socio-demographic Divide in Echo Chambers
Henri-Jacques Geiss | Flora Sakketou | Lucie Flek
Proceedings of the Tenth International Workshop on Natural Language Processing for Social Media

Social media platforms such as Twitter or Reddit have become an integral part of political opinion formation and discussions, accompanied by potential echo chamber formation. In this paper, we examine the relationships between the interaction patterns, the opinion polarity, and the socio-demographic characteristics in discussion communities on Reddit. On a dataset of over 2 million posts coming from over 20k users, we combine network community detection algorithms, reliable stance polarity annotations, and NLP-based socio-demographic estimations, to identify echo chambers and understand their properties at scale. We show that the separability of the interaction communities is more strongly correlated with the relative socio-demographic divide than with the stance polarity gap size. We further demonstrate that the socio-demographic classifiers have a strong topical bias and should be used with caution, merely for relative community difference comparisons within a topic, rather than for any absolute labeling.

Temporal Graph Analysis of Misinformation Spreaders in Social Media
Joan Plepi | Flora Sakketou | Henri-Jacques Geiss | Lucie Flek
Proceedings of TextGraphs-16: Graph-based Methods for Natural Language Processing

Proactively identifying misinformation spreaders is an important step towards mitigating the impact of fake news on our society. Although the news domain is subject to rapid changes over time, the temporal dynamics of the spreaders’ language and network have not been explored yet. In this paper, we analyze the users’ time-evolving semantic similarities and social interactions and show that such patterns can, on their own, indicate misinformation spreading. Building on these observations, we propose a dynamic graph-based framework that leverages the dynamic nature of the users’ network for detecting fake news spreaders. We validate our design choice through qualitative analysis and demonstrate the contributions of our model’s components through a series of exploratory and ablative experiments on two datasets.

DMix: Adaptive Distance-aware Interpolative Mixup
Ramit Sawhney | Megh Thakkar | Shrey Pandit | Ritesh Soun | Di Jin | Diyi Yang | Lucie Flek
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

Interpolation-based regularisation methods such as Mixup, which generate virtual training samples, have proven to be effective for various tasks and modalities. We extend Mixup and propose DMix, an adaptive distance-aware interpolative Mixup that selects samples based on their diversity in the embedding space. DMix leverages the hyperbolic space as a similarity measure among input samples for a richer encoded representation. DMix achieves state-of-the-art results on sentence classification over existing data augmentation methods on 8 benchmark datasets across English, Arabic, Turkish, and Hindi, while achieving benchmark F1 scores in three times fewer iterations. We probe the effectiveness of DMix in conjunction with various similarity measures and qualitatively analyze the different components. Being generalizable, DMix can be applied to various tasks, models, and modalities.
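The distance-aware selection step can be sketched as follows: compute hyperbolic (Poincaré) distances between samples and pick Mixup partners from a preferred distance band. The band limits and toy setup are illustrative assumptions, not the paper's learned policy.

```python
import numpy as np

def poincare_dist(x, y):
    """Distance in the Poincaré ball (curvature -1)."""
    num = 2 * np.sum((x - y) ** 2)
    den = (1 - np.sum(x ** 2)) * (1 - np.sum(y ** 2))
    return np.arccosh(1 + num / den)

def select_partner(i, points, d_lo=0.5, d_hi=2.0):
    """Return an index j whose hyperbolic distance to i lies in a target band."""
    dists = np.array([poincare_dist(points[i], p) for p in points])
    candidates = np.where((dists > d_lo) & (dists < d_hi))[0]
    return np.random.choice(candidates) if len(candidates) else i

points = np.random.uniform(-0.3, 0.3, size=(32, 16))  # toy in-ball embeddings
j = select_partner(0, points)
```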

A Critical Reflection and Forward Perspective on Empathy and Natural Language Processing
Allison Lahnala | Charles Welch | David Jurgens | Lucie Flek
Findings of the Association for Computational Linguistics: EMNLP 2022

We review the state of research on empathy in natural language processing and identify the following issues: (1) empathy definitions are absent or abstract, which (2) leads to low construct validity and reproducibility. Moreover, (3) emotional empathy is overemphasized, skewing our focus to a narrow subset of simplified tasks. We believe these issues hinder research progress and argue that current directions will benefit from a clear conceptualization that includes operationalizing cognitive empathy components. Our main objectives are to provide insight and guidance on empathy conceptualization for NLP research objectives and to encourage researchers to pursue the overlooked opportunities in this area, which are highly relevant, e.g., for the clinical and educational sectors.

2021

Perceived and Intended Sarcasm Detection with Graph Attention Networks
Joan Plepi | Lucie Flek
Proceedings of the Seventh Workshop on Noisy User-generated Text (W-NUT 2021)

Existing sarcasm detection systems focus on exploiting linguistic markers, context, or user-level priors. However, social studies suggest that the relationship between the author and the audience can be equally relevant for the sarcasm usage and interpretation. In this work, we propose a framework jointly leveraging (1) a user context from their historical tweets together with (2) the social information from a user’s conversational neighborhood in an interaction graph, to contextualize the interpretation of the post. We use graph attention networks (GAT) over users and tweets in a conversation thread, combined with dense user history representations. Apart from achieving state-of-the-art results on the recently published dataset of 19k Twitter users with 30K labeled tweets, adding 10M unlabeled tweets as context, our results indicate that the model contributes to interpreting the sarcastic intentions of an author more than to predicting the sarcasm perception by others.
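The graph component can be prototyped with PyTorch Geometric's GATConv; below is a minimal two-layer sketch in which the node features and edges are toy placeholders for user and tweet nodes in a conversation thread.

```python
import torch
from torch_geometric.nn import GATConv

class SarcasmGAT(torch.nn.Module):
    """Two-layer GAT over a user/tweet interaction graph."""
    def __init__(self, in_dim=768, hid=128, n_classes=2, heads=4):
        super().__init__()
        self.g1 = GATConv(in_dim, hid, heads=heads)
        self.g2 = GATConv(hid * heads, n_classes, heads=1)

    def forward(self, x, edge_index):
        h = torch.relu(self.g1(x, edge_index))
        return self.g2(h, edge_index)

x = torch.randn(6, 768)  # toy node embeddings (users + tweets)
edge_index = torch.tensor([[0, 1, 2, 3],
                           [4, 4, 5, 5]])  # toy interaction edges
logits = SarcasmGAT()(x, edge_index)
```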

HypMix: Hyperbolic Interpolative Data Augmentation
Ramit Sawhney | Megh Thakkar | Shivam Agarwal | Di Jin | Diyi Yang | Lucie Flek
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Interpolation-based regularisation methods for data augmentation have proven to be effective for various tasks and modalities. These methods involve performing mathematical operations over the raw input samples or their latent state representations, vectors that often possess complex hierarchical geometries. However, these operations are performed in the Euclidean space, simplifying these representations, which may lead to distorted and noisy interpolations. We propose HypMix, a novel model-, data-, and modality-agnostic interpolative data augmentation technique operating in the hyperbolic space, which captures the complex geometry of input and hidden state hierarchies better than its contemporaries. We evaluate HypMix on benchmark and low-resource datasets across speech, text, and vision modalities, showing that HypMix consistently outperforms state-of-the-art data augmentation techniques. In addition, we demonstrate the use of HypMix in semi-supervised settings. We further probe into the adversarial robustness and qualitative inferences we draw from HypMix, which elucidate the efficacy of Riemannian hyperbolic manifolds for interpolation-based data augmentation.
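One simple way to realize interpolation in the Poincaré ball is to mix in the tangent space at the origin: apply the logarithmic map, interpolate linearly, and map back with the exponential map. A minimal sketch at fixed curvature -1; HypMix itself formulates the operation more generally via Möbius/gyrovector operations.

```python
import numpy as np

def logmap0(x, eps=1e-9):
    """Log map at the origin of the Poincaré ball (curvature -1)."""
    n = np.linalg.norm(x) + eps
    return np.arctanh(min(n, 1 - eps)) * x / n

def expmap0(v, eps=1e-9):
    """Exp map at the origin of the Poincaré ball (curvature -1)."""
    n = np.linalg.norm(v) + eps
    return np.tanh(n) * v / n

def hyperbolic_mix(x1, x2, lam):
    """Mix two in-ball points via the tangent space at the origin."""
    return expmap0(lam * logmap0(x1) + (1 - lam) * logmap0(x2))

lam = np.random.beta(0.2, 0.2)  # Mixup's Beta(alpha, alpha) coefficient
mixed = hyperbolic_mix(np.array([0.1, 0.2]), np.array([-0.3, 0.05]), lam)
```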

Suicide Ideation Detection via Social and Temporal User Representations using Hyperbolic Learning
Ramit Sawhney | Harshit Joshi | Rajiv Ratn Shah | Lucie Flek
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Recent psychological studies indicate that individuals exhibiting suicidal ideation increasingly turn to social media rather than mental health practitioners. Personally contextualizing the buildup of such ideation is critical for accurate identification of users at risk. In this work, we propose a framework jointly leveraging a user’s emotional history and social information from a user’s neighborhood in a network to contextualize the interpretation of the latest tweet of a user on Twitter. Reflecting upon the scale-free nature of social network relationships, we propose the use of Hyperbolic Graph Convolution Networks, in combination with the Hawkes process to learn the historical emotional spectrum of a user in a time-sensitive manner. Our system significantly outperforms state-of-the-art methods on this task, showing the benefits of both socially and personally contextualized representations.
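The Hawkes-process intuition, that older posts exert an exponentially decaying influence on the current emotional state, can be sketched as a time-weighted aggregation of historical emotion embeddings; the decay rate below is an illustrative stand-in for the learned excitation parameters.

```python
import numpy as np

def hawkes_aggregate(embeddings, timestamps, t_now, decay=0.1):
    """Weight each past post by exp(-decay * elapsed_time) and average."""
    dt = t_now - np.asarray(timestamps, dtype=float)
    weights = np.exp(-decay * dt)
    weights /= weights.sum()
    return weights @ np.asarray(embeddings)

history = np.random.randn(5, 8)  # 5 past posts, 8-dim emotion vectors
agg = hawkes_aggregate(history, [1, 3, 7, 9, 10], t_now=12)
```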

Perceived and Intended Sarcasm Detection with Graph Attention Networks
Joan Plepi | Lucie Flek
Findings of the Association for Computational Linguistics: EMNLP 2021

Existing sarcasm detection systems focus on exploiting linguistic markers, context, or user-level priors. However, social studies suggest that the relationship between the author and the audience can be equally relevant for the sarcasm usage and interpretation. In this work, we propose a framework jointly leveraging (1) a user context from their historical tweets together with (2) the social information from a user’s neighborhood in an interaction graph, to contextualize the interpretation of the post. We distinguish between perceived and self-reported sarcasm identification. We use graph attention networks (GAT) over users and tweets in a conversation thread, combined with various dense user history representations. Apart from achieving state-of-the-art results on the recently published dataset of 19k Twitter users with 30K labeled tweets, adding 10M unlabeled tweets as context, our experiments indicate that the graph network contributes to interpreting the sarcastic intentions of the author more than to predicting the sarcasm perception by others.

PHASE: Learning Emotional Phase-aware Representations for Suicide Ideation Detection on Social Media
Ramit Sawhney | Harshit Joshi | Lucie Flek | Rajiv Ratn Shah
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume

Recent psychological studies indicate that individuals exhibiting suicidal ideation increasingly turn to social media rather than mental health practitioners. Contextualizing the build-up of such ideation is critical for the identification of users at risk. In this work, we focus on identifying suicidal intent in tweets by augmenting linguistic models with emotional phases modeled from users’ historical context. We propose PHASE, a time- and phase-aware framework that adaptively learns features from a user’s historical emotional spectrum on Twitter for preliminary screening of suicidal risk. Building on clinical studies, PHASE learns phase-like progressions in users’ historical Plutchik-wheel-based emotions to contextualize suicidal intent. While outperforming state-of-the-art methods, we show the utility of temporal and phase-based emotional contextual cues for suicide ideation detection. We further discuss practical and ethical considerations.

2020

Returning the N to NLP: Towards Contextually Personalized Classification Models
Lucie Flek
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Most NLP models today treat language as universal, even though socio- and psycholinguistic research shows that the communicated message is influenced by the characteristics of the speaker as well as the target audience. This paper surveys the landscape of personalization in natural language processing and related fields, and offers a path forward to mitigate the decades of deviation of NLP tools from sociolinguistic findings, making it possible to flexibly process the “natural” language of each user rather than enforcing a uniform NLP treatment. It outlines a possible direction to incorporate these aspects into neural NLP models by means of socially contextual personalization, and proposes to shift the focus of our evaluation strategies accordingly.