Tiberiu Sosea


2022

EnsyNet: A Dataset for Encouragement and Sympathy Detection
Tiberiu Sosea | Cornelia Caragea
Proceedings of the Thirteenth Language Resources and Evaluation Conference

More and more people turn to Online Health Communities to seek social support during their illnesses. By interacting with peers with similar medical conditions, users feel emotionally and socially supported, which in turn leads to better adherence to therapy. Current studies in Online Health Communities focus only on the presence or absence of emotional support, while the available datasets are scarce or limited in size. To enable progress on emotional support detection, we introduce EnsyNet, a dataset of 6,500 sentences annotated with two types of support: encouragement and sympathy. We train BERT-based classifiers on this dataset and apply our best BERT model in two large-scale experiments. The results of these experiments show that receiving encouragement or sympathy improves users’ emotional state, while the lack of emotional support negatively impacts patients’ emotional state.
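
As a rough illustration of the modeling setup, the sketch below fine-tunes BERT as a binary sentence classifier (e.g., encouragement vs. no encouragement). The example sentences, labels, and hyperparameters are hypothetical placeholders, not EnsyNet's actual data or the paper's exact configuration.

```python
# Minimal sketch: fine-tuning BERT for binary support detection.
# Sentences and labels below are invented placeholders.
import torch
from torch.utils.data import DataLoader, TensorDataset
from transformers import BertTokenizerFast, BertForSequenceClassification

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

train_sents = ["You can beat this, stay strong!", "The weather is nice today."]
train_labels = [1, 0]  # 1 = encouragement, 0 = none (hypothetical labels)

enc = tokenizer(train_sents, padding=True, truncation=True, return_tensors="pt")
dataset = TensorDataset(enc["input_ids"], enc["attention_mask"], torch.tensor(train_labels))
loader = DataLoader(dataset, batch_size=16, shuffle=True)

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
model.train()
for input_ids, attention_mask, labels in loader:
    out = model(input_ids=input_ids, attention_mask=attention_mask, labels=labels)
    out.loss.backward()  # cross-entropy loss computed by the model head
    optimizer.step()
    optimizer.zero_grad()
```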

Emotion analysis and detection during COVID-19
Tiberiu Sosea | Chau Pham | Alexander Tekle | Cornelia Caragea | Junyi Jessy Li
Proceedings of the Thirteenth Language Resources and Evaluation Conference

Understanding the emotions that people express during large-scale crises helps inform policy makers and first responders about the emotional states of the population, and helps provide emotional support to those who need it. We present CovidEmo, a dataset of ~3,000 English tweets labeled with emotions and temporally distributed across 18 months. Our analyses reveal the emotional toll caused by COVID-19 and changes in the social narrative and associated emotions over time. Motivated by the time-sensitive nature of crises and the cost of large-scale annotation efforts, we examine how well large pre-trained language models generalize across domains and time in the task of perceived emotion prediction in the context of COVID-19. Our analyses suggest that cross-domain information transfer occurs, yet significant gaps remain. We propose semi-supervised learning as a way to bridge this gap, obtaining significantly better performance using unlabeled data from the target domain.
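
A minimal sketch of the pseudo-labeling step at the heart of such semi-supervised learning is shown below: a classifier trained on the source domain labels unlabeled target-domain tweets, and only confident predictions are kept for further training. The 7-way label space, confidence threshold, and example tweets are illustrative assumptions, not the paper's exact procedure.

```python
# Sketch of pseudo-labeling unlabeled target-domain tweets.
import torch
from transformers import BertTokenizerFast, BertForSequenceClassification

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
# num_labels=7 is a hypothetical emotion label space, not CovidEmo's exact schema
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=7)
model.eval()

unlabeled_tweets = ["Just got my test results back...", "Stocked up on supplies today."]
enc = tokenizer(unlabeled_tweets, padding=True, truncation=True, return_tensors="pt")

with torch.no_grad():
    probs = torch.softmax(model(**enc).logits, dim=-1)

conf, pseudo = probs.max(dim=-1)
keep = conf > 0.9  # hypothetical confidence threshold
pseudo_labeled = [(t, p.item()) for t, p, k in zip(unlabeled_tweets, pseudo, keep) if k]
```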

Leveraging Training Dynamics and Self-Training for Text Classification
Tiberiu Sosea | Cornelia Caragea
Findings of the Association for Computational Linguistics: EMNLP 2022

The effectiveness of pre-trained language models in downstream tasks is highly dependent on the amount of labeled data available for training. Semi-supervised learning (SSL) is a promising technique that has recently seen wide attention due to its effectiveness in improving deep learning models when training data is scarce. Common approaches employ a teacher-student self-training framework, where a teacher network generates pseudo-labels for unlabeled data, which are then used to iteratively train a student network. In this paper, we propose a new self-training approach for text classification that leverages the training dynamics of unlabeled data. We evaluate our approach on a wide range of text classification tasks, including emotion detection, sentiment analysis, question classification, and grammaticality, which span a variety of domains, e.g., Reddit, Twitter, and online forums. Notably, our method is successful on all benchmarks, obtaining an average increase in F1 score of 3.5% over strong baselines in low-resource settings.
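
To make the idea of training dynamics concrete, the sketch below records the model's probability for each unlabeled example's pseudo-label at every epoch and averages across epochs; examples the model is consistently confident about are kept for the next self-training round. This is an illustrative reconstruction under those assumptions, not the authors' exact selection algorithm.

```python
# Sketch of selecting pseudo-labeled examples by their training dynamics.
import numpy as np

def mean_pseudo_label_confidence(prob_history):
    """prob_history: array of shape (num_epochs, num_examples) holding
    each epoch's probability assigned to the example's pseudo-label."""
    return prob_history.mean(axis=0)

# e.g., probabilities logged over 3 epochs for 4 unlabeled examples
history = np.array([[0.55, 0.91, 0.40, 0.85],
                    [0.60, 0.93, 0.35, 0.88],
                    [0.70, 0.98, 0.30, 0.80]])
confidence = mean_pseudo_label_confidence(history)
selected = np.where(confidence > 0.8)[0]  # hypothetical cutoff
print(selected)  # indices of examples trusted for student training
```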

Why Do You Feel This Way? Summarizing Triggers of Emotions in Social Media Posts
Hongli Zhan | Tiberiu Sosea | Cornelia Caragea | Junyi Jessy Li
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing

Crises such as the COVID-19 pandemic continuously threaten our world and emotionally affect billions of people worldwide in distinct ways. Understanding the triggers leading to people’s emotions is of crucial importance. Social media posts can be a good source for such analysis, yet these texts tend to be charged with multiple emotions, with triggers scattered across multiple sentences. This paper takes a novel angle, namely, emotion detection and trigger summarization, aiming to both detect perceived emotions in text and summarize the events and their appraisals that trigger each emotion. To support this goal, we introduce CovidET (Emotions and their Triggers during Covid-19), a dataset of ~1,900 English Reddit posts related to COVID-19, which contains manual annotations of perceived emotions and abstractive summaries of their triggers described in the post. We develop strong baselines to jointly detect emotions and summarize emotion triggers. Our analyses show that CovidET presents new challenges in emotion-specific summarization, as well as in multi-emotion detection in long social media posts.
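
The sketch below illustrates the trigger-summarization half of the task: a seq2seq model is conditioned on the post plus an emotion tag and generates an abstractive trigger summary. The "<anger>" prompt format is a hypothetical design choice for illustration, not CovidET's actual baseline, and the untuned model here would need fine-tuning on the dataset to produce sensible output.

```python
# Sketch: emotion-conditioned abstractive trigger summarization with BART.
from transformers import BartTokenizer, BartForConditionalGeneration

tokenizer = BartTokenizer.from_pretrained("facebook/bart-base")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-base")

post = "I lost my job during lockdown and my landlord keeps calling..."
# Prepend a hypothetical emotion tag so the decoder summarizes that
# emotion's trigger; fine-tuning on (post, emotion, summary) triples is assumed.
inputs = tokenizer("<anger> " + post, return_tensors="pt", truncation=True)
summary_ids = model.generate(**inputs, max_length=40, num_beams=4)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```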

Multimodal Semi-supervised Learning for Disaster Tweet Classification
Iustin Sirbu | Tiberiu Sosea | Cornelia Caragea | Doina Caragea | Traian Rebedea
Proceedings of the 29th International Conference on Computational Linguistics

During natural disasters, people often use social media platforms, such as Twitter, to post information about casualties and damage produced by disasters. This information can help relief authorities gain situational awareness in near real time and enable them to quickly distribute resources where they are most needed. However, annotating data for this purpose can be burdensome, subjective, and expensive. In this paper, we investigate how to leverage the copious amounts of unlabeled data generated on social media by disaster eyewitnesses and affected individuals during disaster events. To this end, we propose a semi-supervised learning approach to improve the performance of neural models on several multimodal disaster tweet classification tasks. Our approach shows significant improvements, obtaining gains of up to 7.7% in F1 in low-data regimes and 1.9% when using the entire training data. We make our code and data publicly available at https://github.com/iustinsirbu13/multimodal-ssl-for-disaster-tweet-classification.
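
As a rough sketch of a multimodal tweet classifier, the model below concatenates a BERT sentence embedding with a ResNet image embedding and classifies the fused representation. This simple late-fusion architecture is an illustrative assumption; see the repository linked above for the authors' actual models.

```python
# Sketch of a late-fusion text+image classifier for disaster tweets.
import torch
import torch.nn as nn
from torchvision.models import resnet18
from transformers import BertModel

class LateFusionClassifier(nn.Module):
    def __init__(self, num_classes):
        super().__init__()
        self.text_encoder = BertModel.from_pretrained("bert-base-uncased")
        cnn = resnet18(weights=None)
        cnn.fc = nn.Identity()          # expose 512-d image features
        self.image_encoder = cnn
        self.head = nn.Linear(768 + 512, num_classes)  # fuse by concatenation

    def forward(self, input_ids, attention_mask, images):
        text = self.text_encoder(input_ids=input_ids,
                                 attention_mask=attention_mask).pooler_output
        img = self.image_encoder(images)
        return self.head(torch.cat([text, img], dim=-1))
```

Concatenation is only one fusion choice; attention-based or early fusion are common alternatives.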

2021

P-Stance: A Large Dataset for Stance Detection in Political Domain
Yingjie Li | Tiberiu Sosea | Aditya Sawant | Ajith Jayaraman Nair | Diana Inkpen | Cornelia Caragea
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

eMLM: A New Pre-training Objective for Emotion Related Tasks
Tiberiu Sosea | Cornelia Caragea
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)

BERT has been shown to be extremely effective on a wide variety of natural language processing tasks, including sentiment analysis and emotion detection. However, the proposed pre-training objectives of BERT do not induce any sentiment- or emotion-specific biases into the model. In this paper, we present Emotion Masked Language Modelling, a variation of Masked Language Modelling aimed at improving the BERT language representation model for emotion detection and sentiment analysis tasks. Using the same pre-training corpora as the original model, Wikipedia and BookCorpus, our BERT variant improves downstream performance on 4 tasks from emotion detection and sentiment analysis by an average of 1.2% F1. Moreover, our approach shows increased performance in our task-specific robustness tests.
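
A minimal sketch of the core idea is given below: when constructing MLM training examples, emotion-bearing words are masked with higher probability than other tokens. The tiny lexicon and masking probabilities are illustrative assumptions; the paper's actual lexicon and rates are not reproduced here.

```python
# Sketch of emotion-biased masking for MLM example construction.
import random

EMOTION_WORDS = {"happy", "sad", "afraid", "angry", "hope", "grief"}  # toy lexicon

def emlm_mask(tokens, p_emotion=0.5, p_other=0.15, mask_token="[MASK]"):
    """Return masked tokens and the positions chosen as MLM targets,
    masking emotion words at a higher rate than other tokens."""
    masked, targets = [], []
    for i, tok in enumerate(tokens):
        p = p_emotion if tok.lower() in EMOTION_WORDS else p_other
        if random.random() < p:
            masked.append(mask_token)
            targets.append(i)
        else:
            masked.append(tok)
    return masked, targets

tokens = "she was happy despite the grief".split()
print(emlm_mask(tokens))
```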

2020

CancerEmo: A Dataset for Fine-Grained Emotion Detection
Tiberiu Sosea | Cornelia Caragea
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Emotions are an important element of human nature, often affecting the overall well-being of a person. Therefore, it is no surprise that the health domain is a valuable area of interest for emotion detection, as it can provide medical staff or caregivers with essential information about patients. However, progress on this task has been hampered by the absence of large labeled datasets. To this end, we introduce CancerEmo, an emotion dataset created from an online health community and annotated with eight fine-grained emotions. We perform a comprehensive analysis of these emotions and develop deep learning models on the newly created dataset. Our best BERT model achieves an average F1 of 71%, which we improve further using domain-specific pre-training.
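
The sketch below shows one way such a BERT model could be set up, assuming Plutchik's eight basic emotions as the label set and a multi-label framing with one sigmoid output per emotion. The paper's actual formulation may differ (e.g., per-emotion binary classifiers), so treat this as an illustrative assumption.

```python
# Sketch of multi-label fine-grained emotion detection with BERT.
import torch
from transformers import BertTokenizerFast, BertForSequenceClassification

EMOTIONS = ["anger", "anticipation", "disgust", "fear",
            "joy", "sadness", "surprise", "trust"]  # assumed Plutchik set

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased",
    num_labels=len(EMOTIONS),
    problem_type="multi_label_classification",  # BCE loss, one logit per emotion
)

enc = tokenizer(["I am terrified of the next scan but grateful for my family."],
                return_tensors="pt", truncation=True)
labels = torch.tensor([[0., 0., 0., 1., 0., 0., 0., 1.]])  # fear + trust (made-up example)
loss = model(**enc, labels=labels).loss
loss.backward()
```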