2025
Findings of the Shared Task Multilingual Bias and Propaganda Annotation in Political Discourse
Shunmuga Priya Muthusamy Chinnan | Bharathi Raja Chakravarthi | Meghann Drury-Grogan | Senthil Kumar B | Saranya Rajiakodi | Angel Deborah S
Proceedings of the 5th Conference on Language, Data and Knowledge: Fifth Workshop on Language Technology for Equality, Diversity, Inclusion
The Multilingual Bias and Propaganda Annotation task focuses on annotating biased and propagandist content in political discourse in English and Tamil. This paper presents the findings of this shared task, which comprises two subtasks, one in English and one in Tamil, both annotation tasks in which a text comment is to be labeled. With a particular emphasis on polarizing policy debates such as the US Gender Policy and India’s Three Language Policy, the shared task invited participants to build annotation systems capable of labeling textual bias and propaganda. The dataset was curated by collecting comments from YouTube videos and consists of 13,010 English sentences on the US Gender Policy and the Russia-Ukraine War, and 5,880 Tamil sentences on the Three Language Policy. Participants were instructed to annotate at the sentence level, following the guidelines, with fine-grained, domain-specific bias labels and 4 propaganda labels. Participants were encouraged to leverage existing tools or develop novel approaches to perform fine-grained annotations that capture the complex socio-political nuances present in the data.
TechSSN3 at SemEval-2025 Task 11: Multi-Label Emotion Detection Using Ensemble Transformer Models and Lexical Rules
Vishal S | Rajalakshmi Sivanaiah | Angel Deborah S
Proceedings of the 19th International Workshop on Semantic Evaluation (SemEval-2025)
Transformer models, specifically BERT-Large Uncased, DeBERTa, and RoBERTa, are first employed to classify the dataset, with their hyperparameters being fine-tuned to identify the most effective configuration. These models leverage deep contextual embeddings to capture nuanced semantic and syntactic information, making them powerful for sentiment analysis. However, transformer-based models alone may not fully capture the structural aspects of sentiment-bearing sentences.

To address this, part-of-speech (POS) tagging is incorporated using a Hidden Markov Model (HMM) to analyze sentence structure and identify the key words responsible for conveying sentiment. By isolating adjectives, adverbs, and verbs, the lexical sentiment of individual words is determined using a polarity-based scoring method. This lexical score, derived from sentiment lexicons like SentiWordNet, provides an additional layer of interpretability, particularly in cases where transformer models struggle with implicit sentiment cues or negation handling.

A key innovation in this approach is the adaptive weighting mechanism used to combine the outputs of the transformer models and lexical scoring. Instead of assigning uniform importance to each method, a unique weight is assigned to each model for every emotion category, ensuring that the best-performing approach contributes more significantly to the final sentiment prediction. For instance, DeBERTa, which excels in contextual understanding, is given more weight for subtle emotions like sadness, whereas lexical scoring is emphasized for emotions heavily influenced by explicit adjectives, such as joy or anger. The weight allocation is determined empirically through performance evaluation on a validation set, ensuring an optimal balance between deep learning-based contextual understanding and rule-based sentiment assessment.

Additionally, traditional machine learning models such as Support Vector Machines (SVMs), Decision Trees, and Random Forests are tested for comparative analysis. However, these models demonstrate inferior performance, struggling with capturing deep contextual semantics and handling nuanced expressions of sentiment, reinforcing the superiority of the hybrid transformer + lexical approach.

This method not only enhances interpretability but also improves accuracy, particularly in cases where sentiment is influenced by structural elements, negations, or compound expressions. The combined framework ensures a more robust and adaptable sentiment analysis model, effectively balancing data-driven learning and linguistic insights.
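The per-emotion blending described in this abstract can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the emotion names, scores, and weights below are hypothetical stand-ins for the empirically tuned values the paper describes.

```python
def combine_scores(transformer_probs, lexical_scores, weights):
    """Per-emotion weighted blend of transformer probabilities and lexical scores.

    weights[e] is the (validation-tuned) share given to the transformer
    output for emotion e; the remainder goes to the lexical score.
    """
    return {e: weights[e] * transformer_probs[e]
               + (1.0 - weights[e]) * lexical_scores[e]
            for e in transformer_probs}

# Hypothetical model outputs and weights for three emotion categories
probs   = {"joy": 0.9, "sadness": 0.2, "anger": 0.6}   # e.g. from a transformer
lexical = {"joy": 0.7, "sadness": 0.1, "anger": 0.8}   # polarity-based lexicon scores
weights = {"joy": 0.4, "sadness": 0.8, "anger": 0.5}   # lexicon favored for joy

combined  = combine_scores(probs, lexical, weights)
predicted = {e for e, s in combined.items() if s >= 0.5}  # multi-label decision
```

A grid search over candidate weights on a validation set, maximizing per-emotion F1, would fill in the `weights` dictionary in practice.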
DataBees at SemEval-2025 Task 11: Challenges and Limitations in Multi-Label Emotion Detection
Sowmya Anand | Tanisha Sriram | Rajalakshmi Sivanaiah | Angel Deborah S | Mirnalinee Thankanadar
Proceedings of the 19th International Workshop on Semantic Evaluation (SemEval-2025)
Text-based emotion detection is crucial in NLP, with applications in sentiment analysis, social media monitoring, and human-computer interaction. This paper presents our approach to the Multi-label Emotion Detection challenge, classifying texts into joy, sadness, anger, fear, and surprise. We experimented with traditional machine learning and transformer-based models, but results were suboptimal: F1 scores of 0.3723 (English), 0.5174 (German), and 0.6957 (Spanish). We analyze the impact of preprocessing, model selection, and dataset characteristics, highlighting key challenges in multi-label emotion classification and potential improvements.
RSSN at SemEval-2025 Task 11: Optimizing Multi-Label Emotion Detection with Transformer-Based Models and Threshold Tuning
Ravindran V | Rajalakshmi Sivanaiah | Angel Deborah S
Proceedings of the 19th International Workshop on Semantic Evaluation (SemEval-2025)
Our study explores multi-label emotion classification using fine-tuned BERT models, achieving superior performance over traditional methods such as logistic regression. The intricate nature of overlapping emotional expressions in text necessitates a robust classification framework. Fine-tuning BERT with weighted binary cross-entropy loss enhances predictive accuracy, particularly for underrepresented emotions like anger and joy. Moreover, threshold optimization plays a pivotal role in refining decision boundaries, boosting recall, and increasing the macro F1-score. Comparative analysis against RoBERTa and XGBoost further underscores the effectiveness of contextual embeddings in capturing subtle emotional nuances. Despite these improvements, challenges such as class imbalance and inter-class confusion persist, highlighting the need for future advancements in ensemble learning, contrastive pretraining, and domain-adaptive fine-tuning.
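The threshold optimization this abstract mentions can be sketched as a simple validation-set grid search. This is an illustrative sketch, not the authors' implementation; the probabilities and labels below are invented toy data.

```python
def f1(y_true, y_pred):
    """Binary F1 from parallel lists of 0/1 labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def tune_threshold(y_true, probs, grid=(0.3, 0.4, 0.5, 0.6, 0.7)):
    """Pick the decision threshold that maximizes F1 on validation data."""
    return max(grid, key=lambda t: f1(y_true, [1 if p >= t else 0 for p in probs]))

# Hypothetical validation probabilities for one underrepresented emotion label
y_true = [1, 1, 0, 0]
probs  = [0.45, 0.80, 0.35, 0.20]
best = tune_threshold(y_true, probs)
```

Lowering the threshold below the default 0.5 for a rare label trades a little precision for recall, which is exactly how tuning raises the macro F1-score on imbalanced emotions.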
TECHSSN at SemEval-2025 Task 10: A Comparative Analysis of Transformer Models for Dominant Narrative-Based News Summarization
Pooja Premnath | Venkatasai Ojus Yenumulapalli | Parthiban Mohankumar | Rajalakshmi Sivanaiah | Angel Deborah S
Proceedings of the 19th International Workshop on Semantic Evaluation (SemEval-2025)
This paper presents an approach to Task 10 of SemEval-2025, which focuses on summarizing English news articles using a given dominant narrative. The dataset comprises news articles on the Russia-Ukraine war and climate change, introducing challenges related to bias, information compression, and contextual coherence. Transformer-based models, specifically BART variants, are utilized to generate concise and coherent summaries. Our team, TechSSN, achieved 4th place on the official test leaderboard with a BERTScore of 0.74203, employing the DistilBART-CNN-12-6 model.
2024
SSN_ARMM at SemEval-2024 Task 10: Emotion Detection in Multilingual Code-Mixed Conversations using LinearSVC and TF-IDF
Rohith Arumugam S | Angel Deborah S | Rajalakshmi S | Milton R S | Mirnalinee T T
Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024)
Our paper explores a task involving the analysis of emotions and triggers within dialogues. We annotate each utterance with an emotion and identify triggers, focusing on binary labeling. We emphasize clear guidelines for replicability and conduct thorough analyses, including multiple system runs and experiments to highlight effective techniques. By simplifying the complexities and detailing clear methodologies, our study contributes to advancing emotion analysis and trigger identification within dialogue systems.
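As an illustration of the LinearSVC-and-TF-IDF setup named in this paper's title, a minimal scikit-learn sketch is shown below. The toy utterances and labels are invented for demonstration and do not come from the shared-task data.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Hypothetical toy utterances standing in for the dialogue data
texts = [
    "I am so happy today",
    "this is terrible news",
    "what a joyful moment",
    "I feel awful and sad",
]
labels = ["joy", "sadness", "joy", "sadness"]

# TF-IDF features fed to a linear SVM, mirroring the described setup
clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())
clf.fit(texts, labels)

prediction = clf.predict(["what a happy joyful day"])[0]
```

For code-mixed text, a character n-gram analyzer (`analyzer="char_wb"`) is a common variation, since word-level vocabularies fragment across scripts and spellings.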
TECHSSN at SemEval-2024 Task 10: LSTM-based Approach for Emotion Detection in Multilingual Code-Mixed Conversations
Ravindran V | Shreejith Babu G | Aashika Jetti | Rajalakshmi Sivanaiah | Angel Deborah S | Mirnalinee T T | Milton R S
Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024)
Emotion Recognition in Conversation (ERC) in the context of code-mixed Hindi-English interactions is a subtask addressed in SemEval-2024 as Task 10. We made our maiden attempt to solve the problem using natural language processing, machine learning and deep learning techniques that perform well in properly assigning emotions to individual utterances from a predefined collection. The use of a well-proven classifier such as the Long Short-Term Memory network improves the model’s efficacy over BERT- and GloVe-based models. However, difficulties develop in the subtle arena of emotion-flip reasoning in multi-party discussions, emphasizing the importance of specialized methodologies. Our findings shed light on the intricacies of emotion dynamics in code-mixed languages, pointing to potential areas for further research and refinement in multilingual understanding.
TECHSSN at SemEval-2024 Task 1: Multilingual Analysis for Semantic Textual Relatedness using Boosted Transformer Models
Shreejith Babu G | Ravindran V | Aashika Jetti | Rajalakshmi Sivanaiah | Angel Deborah S
Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024)
This paper presents our approach to SemEval-2024 Task 1: Semantic Textual Relatedness (STR). Out of the 14 languages provided, we focused specifically on English and Telugu. Our proposal employs advanced natural language processing techniques and leverages the Sentence Transformers library for sentence embeddings. For English, a Gradient Boosting Regressor trained on DistilBERT embeddings achieves competitive results, while for Telugu, a multilingual model coupled with hyperparameter tuning yields enhanced performance. The paper discusses the significance of semantic relatedness in various languages, highlighting the challenges and nuances encountered. Our findings contribute to the understanding of semantic textual relatedness across diverse linguistic landscapes, providing valuable insights for future research in multilingual natural language processing.
2023
Athena@DravidianLangTech: Abusive Comment Detection in Code-Mixed Languages using Machine Learning Techniques
Hema M | Anza Prem | Rajalakshmi Sivanaiah | Angel Deborah S
Proceedings of the Third Workshop on Speech and Language Technologies for Dravidian Languages
The amount of digital material disseminated through various social media platforms has increased significantly in recent years. Online networks have gained popularity and established themselves as go-to resources for news, information, and entertainment. Nevertheless, despite the many advantages of using online networks, mounting evidence indicates that an increasing number of malicious actors are taking advantage of these networks to spread poison and hurt other people. This work aims to detect abusive content in YouTube comments written in Tamil, Tamil-English (code-mixed), and Telugu-English (code-mixed). This work was undertaken as part of the “DravidianLangTech@RANLP 2023” shared task. The macro F1 values for the Tamil, Tamil-English, and Telugu-English datasets were 0.28, 0.37, and 0.6137, securing 5th, 7th, and 8th rank respectively.
Avalanche at DravidianLangTech: Abusive Comment Detection in Code Mixed Data Using Machine Learning Techniques with Under Sampling
Rajalakshmi Sivanaiah | Rajasekar S | Srilakshmisai K | Angel Deborah S | Mirnalinee ThankaNadar
Proceedings of the Third Workshop on Speech and Language Technologies for Dravidian Languages
In recent years, the growth of online platforms and social media has given rise to a concerning increase in abusive content. This poses significant challenges for maintaining a safe and inclusive digital environment. To resolve this issue, this paper experiments with an approach for detecting abusive comments. We use a combination of pipelining and vectorization techniques, along with algorithms such as the stochastic gradient descent (SGD) classifier and the support vector machine (SVM) classifier. We conducted experiments on a Tamil-English code-mixed dataset to evaluate the performance of this approach. Using the stochastic gradient descent classifier, we achieved a weighted F1 score of 0.76 and a macro score of 0.45 on the development dataset. Using the support vector machine classifier, we obtained a weighted F1 score of 0.78 and a macro score of 0.42 on the development dataset. On the test dataset, the SGD approach secured 5th rank with a 0.44 macro F1 score, while the SVM took 8th rank with a 0.35 macro F1 score in the shared task. The top-ranked team secured a 0.55 macro F1 score.
TechSSN1 at LT-EDI-2023: Depression Detection and Classification using BERT Model for Social Media Texts
Venkatasai Ojus Yenumulapalli | Vijai Aravindh R | Rajalakshmi Sivanaiah | Angel Deborah S
Proceedings of the Third Workshop on Language Technology for Equality, Diversity and Inclusion
Depression is a severe mental health disorder characterized by persistent feelings of sadness and anxiety and a decline in cognitive functioning, resulting in drastic changes in a person’s psychological and physical well-being. However, depression is completely curable when treated at a suitable time, with treatment resulting in the rejuvenation of the individual. The objective of this paper is to devise a technique for detecting signs of depression in English social media comments and classifying them by intensity into severe, moderate, and not depressed categories. The paper illustrates three approaches developed while working on the problem. Of these, the BERT model proved to be the most suitable, with an F1 macro score of 0.407, which gave us the 11th rank overall.
SSNTech2@LT-EDI-2023: Homophobia/Transphobia Detection in Social Media Comments Using Linear Classification Techniques
Vaidhegi D | Priya M | Rajalakshmi Sivanaiah | Angel Deborah S | Mirnalinee ThankaNadar
Proceedings of the Third Workshop on Language Technology for Equality, Diversity and Inclusion
The abusive content on social media networks is causing destructive effects on the mental well-being of online users. Homophobia refers to fear, negative attitudes and feelings towards homosexuality. Transphobia refers to negative attitudes, hatred and prejudice towards transsexual people. Even though some parts of society have started to accept homosexuality and transsexuality, a large section of the population still opposes them. Hate speech targeting LGBTQ+ individuals, known as homophobic/transphobic speech, has become a growing concern. This has led to a toxic and unwelcoming environment for LGBTQ+ people on online platforms, posing a significant societal issue and hindering the progress of equality, diversity, and inclusion. The identification of homophobic and transphobic comments on social media platforms plays a crucial role in creating a safer environment for all social media users. To accomplish this, we built a machine learning model using SGD and SVM classifiers. Our approach yielded promising results, with a weighted F1-score of 0.95 on the English dataset, and we secured 4th rank in this task.
TechSSN4@LT-EDI-2023: Depression Sign Detection in Social Media Postings using DistilBERT Model
Krupa Elizabeth Thannickal | Sanmati P | Rajalakshmi Sivanaiah | Angel Deborah S
Proceedings of the Third Workshop on Language Technology for Equality, Diversity and Inclusion
As the world population increases, more people are living to the age at which depression or Major Depressive Disorder (MDD) commonly occurs, and consequently the number of those who suffer from such disorders is rising. There is a pressing need for faster and more reliable diagnosis methods. This paper proposes a method to analyse text from subjects’ social media posts to determine the severity class of depression. We used the DistilBERT transformer to process these texts and classify the individuals across three severity labels: ‘not depression’, ‘moderate’ and ‘severe’. The results showed a macro F1-score of 0.437 when the model was trained for 5 epochs, with comparable performance across the labels. The team acquired 6th rank, while the top team scored a macro F1-score of 0.470. We hope that this system will support further research into the early identification of depression in individuals to promote effective medical research and related treatments.
The Mavericks@LT-EDI-2023: Detection of Signs of Depression from Social Media Texts using Naive Bayes Approach
Sathvika V S | Vaishnavi Vaishnavi S | Angel Deborah S | Rajalakshmi Sivanaiah | Mirnalinee ThankaNadar
Proceedings of the Third Workshop on Language Technology for Equality, Diversity and Inclusion
Social media platforms have revolutionized the landscape of communication, providing individuals with an outlet to express their thoughts, emotions, and experiences openly. This paper focuses on the development of a model to determine whether individuals exhibit signs of depression based on their social media texts. With the aim of optimizing performance and accuracy, a Naive Bayes approach was chosen for the detection task. The Naive Bayes algorithm, a probabilistic classifier, was applied to extract features and classify the texts. The model leveraged linguistic patterns, sentiment analysis, and other relevant features to capture indicators of depression within the texts. Preprocessing techniques, including tokenization, stemming, and stop-word removal, were employed to enhance the quality of the input data. The performance of the Naive Bayes model was evaluated using standard metrics such as accuracy, precision, recall, and F1-score; it achieved a macro-averaged F1 score of 0.263.
TechSSN at SemEval-2023 Task 12: Monolingual Sentiment Classification in Hausa Tweets
Nishaanth Ramanathan | Rajalakshmi Sivanaiah | Angel Deborah S | Mirnalinee Thanka Nadar Thanagathai
Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023)
This paper elaborates on our work in designing a system for SemEval-2023 Task 12: AfriSenti-SemEval, which involves sentiment analysis for low-resource African languages using a Twitter dataset. We utilised a pre-trained model to perform sentiment classification on Hausa-language tweets: a multilingual version of the RoBERTa model, pretrained on 100 languages. To tokenize the text, we used the AfriBERTa model, which is specifically pretrained on African languages.
2022
TechSSN at SemEval-2022 Task 5: Multimedia Automatic Misogyny Identification using Deep Learning Models
Rajalakshmi Sivanaiah | Angel Deborah S | Sakaya Milton R | Mirnalinee T T
Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022)
Research is progressing rapidly in the field of offensive, hateful, abusive and sarcastic content. Tackling hate speech against women is urgent and necessary to ensure respect for women. This paper describes the system used for identifying misogynous content in images and text. The system developed by team TECHSSN uses transformer models to detect misogynous content in text and a Convolutional Neural Network model for image data. Various models such as BERT, ALBERT, XLNET and CNN were explored, and the combination of ALBERT and CNN as an ensemble model provides better results than the rest. This system was developed for Task 5 of SemEval 2022.
TechSSN at SemEval-2022 Task 6: Intended Sarcasm Detection using Transformer Models
Rajalakshmi Sivanaiah | Angel Deborah S | Sakaya Milton R | Mirnalinee T T | Ramdhanush Venkatakrishnan
Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022)
Irony detection in social media is an emerging research area that plays a major role in sentiment analysis and offensive language identification. Sarcasm is one form of irony, used to make pointed comments that run contrary to reality. This paper describes a method to detect intended sarcasm in text (SemEval-2022 Task 6). The TECHSSN team used Bidirectional Encoder Representations from Transformers (BERT) models and their variants to classify text as sarcastic or non-sarcastic in English and Arabic. The data is preprocessed and fed to the model for training. The transformer models learn the weights during the training phase from the given dataset and predict the output class labels for the unseen test data.
2021
TECHSSN at SemEval-2021 Task 7: Humor and Offense detection and classification using ColBERT embeddings
Rajalakshmi Sivanaiah | Angel Deborah S | S Milton Rajendram | Mirnalinee T T | Abrit Pal Singh | Aviansh Gupta | Ayush Nanda
Proceedings of the 15th International Workshop on Semantic Evaluation (SemEval-2021)
This paper describes the system used for detecting humor in text. The system developed by team TECHSSN uses binary classification techniques to classify the text. The data undergoes preprocessing and is given to ColBERT (Contextualized Late Interaction over BERT), a modification of Bidirectional Encoder Representations from Transformers (BERT). The model is re-trained and the weights are learned for the dataset. This system was developed for Task 7 of SemEval 2021.
2020
TECHSSN at SemEval-2020 Task 12: Offensive Language Detection Using BERT Embeddings
Rajalakshmi Sivanaiah | Angel Deborah S | S Milton Rajendram | Mirnalinee T T
Proceedings of the Fourteenth Workshop on Semantic Evaluation
This paper describes the work of identifying the presence of offensive language in social media posts and categorizing a post as targeted at a particular person or not. The work developed by team TECHSSN for solving Multilingual Offensive Language Identification in Social Media (Task 12) in SemEval-2020 involves the use of deep learning models with BERT embeddings. The dataset is preprocessed and given to a Bidirectional Encoder Representations from Transformers (BERT) model with pretrained weight vectors. The model is retrained and the weights are learned for the offensive language dataset. We developed a system with the English-language dataset. The results are better when compared to the model we developed for SemEval-2019 Task 6.
2019
SSN-SPARKS at SemEval-2019 Task 9: Mining Suggestions from Online Reviews using Deep Learning Techniques on Augmented Data
Rajalakshmi S | Angel Deborah S | S Milton Rajendram | Mirnalinee T T
Proceedings of the 13th International Workshop on Semantic Evaluation
This paper describes the work on mining suggestions from online reviews and forums. Opinion mining detects whether comments are positive, negative or neutral, while suggestion mining explores the review content for possible tips or advice. The system developed by the SSN-SPARKS team in SemEval-2019 for Task 9 (suggestion mining) uses a rule-based approach for feature selection, the SMOTE technique for data augmentation and a deep learning technique (Convolutional Neural Network) for classification. We compared the results with a Random Forest (RF) classifier and a MultiLayer Perceptron (MLP) model. Results show that the CNN model performs better than the other models for both subtasks.
2018
SSN MLRG1 at SemEval-2018 Task 1: Emotion and Sentiment Intensity Detection Using Rule Based Feature Selection
Angel Deborah S | Rajalakshmi S | S Milton Rajendram | Mirnalinee T T
Proceedings of the 12th International Workshop on Semantic Evaluation
The system developed by the SSN MLRG1 team for SemEval-2018 Task 1 on affect in tweets uses rule-based feature selection and one-hot encoding to generate the input feature vector. A Multilayer Perceptron was used to build the models for the emotion intensity ordinal classification, sentiment analysis ordinal classification and emotion classification subtasks. A Support Vector Machine was used to build the models for the emotion intensity regression and sentiment intensity regression subtasks.
SSN MLRG1 at SemEval-2018 Task 3: Irony Detection in English Tweets Using MultiLayer Perceptron
Rajalakshmi S | Angel Deborah S | S Milton Rajendram | Mirnalinee T T
Proceedings of the 12th International Workshop on Semantic Evaluation
Sentiment analysis plays an important role in E-commerce. Identifying ironic and sarcastic content in text plays a vital role in inferring the actual intention of the user, and is necessary to increase the accuracy of sentiment analysis. This paper describes the work on identifying the irony level in Twitter texts. The system developed by the SSN MLRG1 team in SemEval-2018 for Task 3 (irony detection) uses a rule-based approach for feature selection and the MultiLayer Perceptron (MLP) technique to build the model for the multiclass irony classification subtask, which classifies the given text into one of four class labels.
2017
SSN_MLRG1 at SemEval-2017 Task 4: Sentiment Analysis in Twitter Using Multi-Kernel Gaussian Process Classifier
Angel Deborah S | S Milton Rajendram | T T Mirnalinee
Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017)
The SSN MLRG1 team for SemEval-2017 Task 4 applied a Gaussian Process, with bag-of-words feature vectors and fixed-rule multi-kernel learning, for sentiment analysis of tweets. Since tweets on the same topic, made at different times, may exhibit different emotions, their properties such as smoothness and periodicity also vary with time. Our experiments show that, compared to a single kernel, multiple kernels are effective in learning the simultaneous presence of multiple properties.
SSN_MLRG1 at SemEval-2017 Task 5: Fine-Grained Sentiment Analysis Using Multiple Kernel Gaussian Process Regression Model
Angel Deborah S | S Milton Rajendram | T T Mirnalinee
Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017)
The system developed by the SSN_MLRG1 team for Semeval-2017 task 5 on fine-grained sentiment analysis uses Multiple Kernel Gaussian Process for identifying the optimistic and pessimistic sentiments associated with companies and stocks. Since the comments made at different times about the same companies and stocks may display different emotions, their properties such as smoothness and periodicity may vary. Our experiments show that while single kernel Gaussian Process can learn certain properties well, Multiple Kernel Gaussian Process are effective in learning the presence of different properties simultaneously.