2025
pdf
bib
abs
SSNTrio@DravidianLangTech 2025: Identification of AI Generated Content in Dravidian Languages using Transformers
J Bhuvana
|
Mirnalinee T T
|
Rohan R
|
Diya Seshan
|
Avaneesh Koushik
Proceedings of the Fifth Workshop on Speech, Vision, and Language Technologies for Dravidian Languages
The increasing prevalence of AI-generated content has raised concerns about the authenticity and reliability of online reviews, particularly in resource-limited languages like Tamil and Malayalam. This paper presents an approach to the Shared Task on Detecting AI-generated Product Reviews in Dravidian Languages at NAACL 2025, which focuses on distinguishing AI-generated reviews from human-written ones in Tamil and Malayalam. Several transformer-based models, including IndicBERT, RoBERTa, mBERT, and XLM-R, were evaluated, with language-specific BERT models for Tamil and Malayalam demonstrating the best performance. The chosen methodologies were evaluated using the Macro Average F1 score. In the rank list released by the organizers, team SSNTrio achieved 3rd and 29th place on the Malayalam and Tamil datasets with Macro Average F1 scores of 0.914 and 0.598, respectively.
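Most of the shared tasks in this listing rank systems by the Macro Average F1 score. As a reference, a minimal pure-Python sketch of that metric; the labels in the example are illustrative:

```python
def macro_f1(y_true, y_pred):
    """Macro-averaged F1: per-class F1 scores averaged with equal class weight,
    so rare classes count as much as frequent ones."""
    labels = set(y_true) | set(y_pred)
    f1s = []
    for lab in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == lab and p == lab)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != lab and p == lab)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == lab and p != lab)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(f1s) / len(f1s)

y_true = ["ai", "ai", "human", "human"]
y_pred = ["ai", "human", "human", "human"]
print(round(macro_f1(y_true, y_pred), 3))  # 0.733
```

Unlike the weighted average, the macro average does not discount poor performance on minority classes, which is why it is favoured for the imbalanced datasets typical of these tasks.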
pdf
bib
abs
SSNTrio@DravidianLangTech 2025: Sentiment Analysis in Dravidian Languages using Multilingual BERT
J Bhuvana
|
Mirnalinee T T
|
Diya Seshan
|
Rohan R
|
Avaneesh Koushik
Proceedings of the Fifth Workshop on Speech, Vision, and Language Technologies for Dravidian Languages
This paper presents an approach to sentiment analysis for code-mixed Tamil-English and Tulu-English datasets as part of the DravidianLangTech@NAACL 2025 shared task. Sentiment analysis, the process of determining the emotional tone or subjective opinion in text, has become a critical tool in analyzing public sentiment on social media platforms. The approach discussed here uses multilingual BERT (mBERT) fine-tuned on the provided datasets to classify sentiment polarity into predefined categories: for Tulu, the categories were positive, negative, not_tulu, mixed, and neutral; for Tamil, the categories were positive, negative, unknown, mixed_feelings, and neutral. The mBERT model demonstrates its effectiveness in handling sentiment analysis for code-mixed and resource-constrained languages by achieving an F1-score of 0.44 for Tamil, securing the 6th position in the rank list, and 0.56 for Tulu, ranking 5th in the respective task.
pdf
bib
abs
SSNTrio@DravidianLangTech2025: LLM Based Techniques for Detection of Abusive Text Targeting Women
Mirnalinee T T
|
J Bhuvana
|
Avaneesh Koushik
|
Diya Seshan
|
Rohan R
Proceedings of the Fifth Workshop on Speech, Vision, and Language Technologies for Dravidian Languages
This study focuses on developing a solution for detecting abusive texts on social media against women in Tamil and Malayalam, two low-resource Dravidian languages in South India. As the usage of social media for communication and idea sharing has increased significantly, these platforms are being used to target and victimize women. Hence, an automated solution is necessary to screen the huge volume of content generated. This work is part of the Shared Task on Abusive Tamil and Malayalam Text targeting Women on Social Media: DravidianLangTech@NAACL 2025. The approach used to tackle this problem involves utilizing LLM-based techniques for classifying abusive text. The Macro Average F1-Score for the Tamil BERT model was 0.76, securing the 11th position, while the Malayalam BERT model obtained a score of 0.30 and secured the 33rd rank. The proposed solution can be extended to other regional languages using similar techniques.
pdf
bib
abs
SSNTrio @ DravidianLangTech 2025: Hybrid Approach for Hate Speech Detection in Dravidian Languages with Text and Audio Modalities
J Bhuvana
|
Mirnalinee T T
|
Rohan R
|
Diya Seshan
|
Avaneesh Koushik
Proceedings of the Fifth Workshop on Speech, Vision, and Language Technologies for Dravidian Languages
This paper presents the approach and findings from the Multimodal Social Media Data Analysis in Dravidian Languages (MSMDA-DL) shared task at DravidianLangTech@NAACL 2025. The task focuses on detecting multimodal hate speech in Tamil, Malayalam, and Telugu, requiring models to analyze both text and speech components from social media content. The proposed methodology uses language-specific BERT models for the provided text transcripts, followed by multimodal feature extraction techniques, and classification using a Random Forest classifier to enhance performance across the three languages. The models achieved a macro-F1 score of 0.7332 (Rank 1) in Tamil, 0.7511 (Rank 1) in Malayalam, and 0.3758 (Rank 2) in Telugu, demonstrating the effectiveness of the approach in multilingual settings. The models performed well despite the challenges posed by limited resources, highlighting the potential of language-specific BERT models and multimodal techniques in hate speech detection for Dravidian languages.
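The multimodal step above combines text and speech signals before classification. A minimal late-fusion sketch; the 768-d embedding and 40-d MFCC dimensions are illustrative assumptions, and the Random Forest stage is indicated only in a comment:

```python
def mean_pool(frames):
    """Collapse a variable-length sequence of audio frames (e.g. MFCC vectors)
    into one fixed-size vector by per-dimension averaging."""
    dim = len(frames[0])
    return [sum(f[i] for f in frames) / len(frames) for i in range(dim)]

def fuse(text_emb, audio_frames):
    """Late fusion: concatenate the text embedding with pooled audio features.
    The fused vector would then be fed to a classifier such as a Random Forest."""
    return list(text_emb) + mean_pool(audio_frames)

# Hypothetical dimensions: a 768-d BERT sentence embedding, 40-d MFCC frames.
text_emb = [0.0] * 768
audio_frames = [[0.0] * 40, [1.0] * 40, [2.0] * 40]
fused = fuse(text_emb, audio_frames)
print(len(fused))  # 808
```

Concatenation keeps the two modalities' features independent until the final classifier, which lets a tree ensemble weight text and audio evidence per split.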
2022
pdf
bib
abs
Varsini_and_Kirthanna@DravidianLangTech-ACL2022-Emotional Analysis in Tamil
Varsini S
|
Kirthanna Rajan
|
Angel S
|
Rajalakshmi Sivanaiah
|
Sakaya Milton Rajendram
|
Mirnalinee T T
Proceedings of the Second Workshop on Speech and Language Technologies for Dravidian Languages
In this paper, we present our system for the task of emotion analysis in Tamil. Over 3.96 million people use social media platforms to send messages composed of text, images, videos, audio, or combinations of these to express their thoughts and feelings. Text communication on social media platforms is quite overwhelming due to its enormous quantity and simplicity. The data must be processed to understand the general feeling felt by the author. We present a lexicon-based approach for the extraction of emotion in Tamil texts, using dictionaries of words labelled with their respective emotions. We assign an emotional label to each text and then capture the main emotion expressed in it. Finally, our F1-score on the official test set is 0.0300 and our method ranks 5th.
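The lexicon-based labelling step can be sketched as follows; the three-entry lexicon is a toy stand-in for the paper's dictionaries:

```python
from collections import Counter

# Toy lexicon mapping Tamil words to emotions (illustrative, not the paper's data).
LEXICON = {
    "மகிழ்ச்சி": "joy",      # happiness
    "கோபம்": "anger",        # anger
    "சோகம்": "sadness",      # sadness
}

def label_emotion(text, lexicon=LEXICON, default="neutral"):
    """Count lexicon hits per emotion and return the majority emotion;
    texts with no dictionary hits fall back to a default label."""
    counts = Counter(lexicon[w] for w in text.split() if w in lexicon)
    if not counts:
        return default
    return counts.most_common(1)[0][0]
```

The main weakness of this approach, consistent with the low test-set score reported above, is coverage: any emotion word absent from the dictionaries contributes nothing to the decision.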
pdf
bib
abs
SSN_ARMM@LT-EDI-ACL2022: Hope Speech Detection for Equality, Diversity, and Inclusion Using ALBERT model
Praveenkumar Vijayakumar
|
Prathyush S
|
Aravind P
|
Angel S
|
Rajalakshmi Sivanaiah
|
Sakaya Milton Rajendram
|
Mirnalinee T T
Proceedings of the Second Workshop on Language Technology for Equality, Diversity and Inclusion
In recent years, social media has become one of the major forums for expressing human views and emotions. With the help of smartphones and high-speed internet, anyone can express their views on social media. However, this can also lead to the spread of hatred and violence in society. Therefore, it is necessary to build a method to find and support helpful social media content. In this paper, we studied a Natural Language Processing approach for detecting hope speech in a given sentence. The task was to classify the sentences into ‘Hope speech’ and ‘Non-hope speech’. The dataset was provided by the LT-EDI organizers with text from YouTube comments. Based on the task description, we developed a system using the pre-trained language model BERT to complete this task. Our model achieved 1st rank in the Kannada language with a weighted average F1 score of 0.750, 2nd rank in the Malayalam language with a weighted average F1 score of 0.740, 3rd rank in the Tamil language with a weighted average F1 score of 0.390 and 6th rank in the English language with a weighted average F1 score of 0.880.
pdf
bib
abs
SSN_MLRG3@LT-EDI-ACL2022-Depression Detection System from Social Media Text using Transformer Models
Sarika Esackimuthu
|
Shruthi Hariprasad
|
Rajalakshmi Sivanaiah
|
Angel S
|
Sakaya Milton Rajendram
|
Mirnalinee T T
Proceedings of the Second Workshop on Language Technology for Equality, Diversity and Inclusion
Depression is a common mental illness that involves sadness and lack of interest in all day-to-day activities. The task is to classify social media text showing signs of depression into three labels, namely “not depressed”, “moderately depressed”, and “severely depressed”. We have built a system using deep learning transformer models; transformer libraries provide thousands of pretrained models for tasks on different modalities such as text, vision, and audio. The multi-class classification model used in our system is based on the ALBERT model. In the ACL 2022 shared task, our team SSN_MLRG3 obtained a Macro F1 score of 0.473.
pdf
bib
abs
TechSSN at SemEval-2022 Task 5: Multimedia Automatic Misogyny Identification using Deep Learning Models
Rajalakshmi Sivanaiah
|
Angel S
|
Sakaya Milton Rajendram
|
Mirnalinee T T
Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022)
Research is progressing rapidly in the field of offensive, hateful, abusive and sarcastic content detection. Tackling hate speech against women is urgent and necessary as a matter of respect for women. This paper describes the system used for identifying misogynous content using images and text. The system developed by the team TECHSSN uses transformer models to detect misogynous content in text and a Convolutional Neural Network model for image data. Various models like BERT, ALBERT, XLNET and CNN were explored, and the combination of ALBERT and CNN as an ensemble model provides better results than the rest. This system was developed for Task 5 of SemEval-2022.
pdf
bib
abs
TechSSN at SemEval-2022 Task 6: Intended Sarcasm Detection using Transformer Models
Ramdhanush V
|
Rajalakshmi Sivanaiah
|
Angel S
|
Sakaya Milton Rajendram
|
Mirnalinee T T
Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022)
Irony detection in social media is an emerging research area that plays a major role in sentiment analysis and offensive language identification. Sarcasm is one form of irony, used to make remarks that run contrary to reality. This paper describes a method to detect intended sarcasm in text (SemEval-2022 Task 6). The TECHSSN team used Bidirectional Encoder Representations from Transformers (BERT) models and their variants to classify text as sarcastic or non-sarcastic in the English and Arabic languages. The data is preprocessed and fed to the model for training. The transformer models learn their weights from the given dataset during the training phase and predict the output class labels for the unseen test data.
pdf
bib
abs
SSN_MLRG1 at SemEval-2022 Task 10: Structured Sentiment Analysis using 2-layer BiLSTM
Karun Anantharaman
|
Divyasri K
|
Jayannthan Pt
|
Angel S
|
Rajalakshmi Sivanaiah
|
Sakaya Milton Rajendram
|
Mirnalinee T T
Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022)
Task 10 in SemEval 2022 is a composite task that entails the analysis of opinion tuples and the recognition and demarcation of their elements. In this paper, we elaborate on how such a methodology is implemented for Structured Sentiment Analysis and report the results obtained. To achieve this objective, we adopted a two-layer BiLSTM approach. To improve accuracy, we depart from the norm by conditioning the label assigned to each token on the labels of its adjacent tokens, using specialized decoding to ensure that the output is the most accurate label sequence for the whole input. This strategy is superior in parsing accuracy and requires less time. It yielded an SF1 of 0.33 in the highest-performing configuration.
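Demarcating opinion elements from per-token predictions is commonly done by decoding BIO tags into spans. A minimal decoder sketch; the tag scheme and element names are assumptions for illustration, not taken from the paper:

```python
def bio_to_spans(tags):
    """Decode a BIO tag sequence into (start, end, type) spans, end exclusive.
    'B-X' opens a span of type X, 'I-X' continues it, anything else closes it."""
    spans, start, kind = [], None, None
    for i, tag in enumerate(tags):
        if tag.startswith("B-"):
            if start is not None:          # close any span still open
                spans.append((start, i, kind))
            start, kind = i, tag[2:]
        elif tag.startswith("I-") and start is not None and tag[2:] == kind:
            continue                        # span continues
        else:
            if start is not None:
                spans.append((start, i, kind))
            start, kind = None, None
    if start is not None:                   # span running to end of sequence
        spans.append((start, len(tags), kind))
    return spans

print(bio_to_spans(["B-HOLDER", "I-HOLDER", "O", "B-TARGET"]))
```

Label sequences decoded this way are only consistent if neighbouring tags agree, which is why conditioning each token's label on its neighbours, as described above, helps.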
2019
pdf
bib
abs
TECHSSN at SemEval-2019 Task 6: Identifying and Categorizing Offensive Language in Tweets using Deep Neural Networks
Angel Suseelan
|
Rajalakshmi S
|
Logesh B
|
Harshini S
|
Geetika B
|
Dyaneswaran S
|
S Milton Rajendram
|
Mirnalinee T T
Proceedings of the 13th International Workshop on Semantic Evaluation
Task 6 of SemEval 2019 involves identifying and categorizing offensive language in social media. The systems developed by the TECHSSN team use multi-level classification techniques. We have developed two systems. In the first system, the first level of classification is done by a multi-branch 2D CNN classifier with Google’s pre-trained Word2Vec embedding, and the second level by a string matching technique supported by a dictionary of offensive and bad words. The second system uses a multi-branch 1D CNN classifier with a GloVe pre-trained embedding layer for the first level of classification and string matching for the second level. Inputs with a first-level probability of less than 0.70 are passed on to the second level, where the misclassified examples are classified correctly.
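The two-level routing (first-level classifier, dictionary fallback below probability 0.70) can be sketched as follows; the stub classifier and lexicon entries are placeholders for the trained CNN and the authors' dictionary:

```python
# Illustrative entries only, not the authors' offensive-word dictionary.
OFFENSIVE_LEXICON = {"idiot", "stupid"}

def second_level(tokens):
    """String-matching fallback: flag text containing dictionary words."""
    return "OFF" if any(t.lower() in OFFENSIVE_LEXICON for t in tokens) else "NOT"

def classify(text, first_level, threshold=0.70):
    """Two-level pipeline: keep the first-level label when its probability
    meets the threshold, otherwise route the input to string matching."""
    label, prob = first_level(text)
    if prob >= threshold:
        return label
    return second_level(text.split())

# Stub standing in for the multi-branch CNN's (label, probability) output.
def stub_cnn(text):
    return ("NOT", 0.55)

print(classify("you idiot", stub_cnn))  # low confidence -> lexicon flags OFF
```

The design choice here is a confidence gate: the cheap rule-based stage only runs on inputs the neural model is unsure about.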
pdf
bib
abs
SSN-SPARKS at SemEval-2019 Task 9: Mining Suggestions from Online Reviews using Deep Learning Techniques on Augmented Data
Rajalakshmi S
|
Angel Suseelan
|
S Milton Rajendram
|
Mirnalinee T T
Proceedings of the 13th International Workshop on Semantic Evaluation
This paper describes our work on mining suggestions from online reviews and forums. Opinion mining detects whether comments are positive, negative or neutral, while suggestion mining explores the review content for possible tips or advice. The system developed by the SSN-SPARKS team in SemEval-2019 for Task 9 (suggestion mining) uses a rule-based approach for feature selection, the SMOTE technique for data augmentation and a deep learning technique (Convolutional Neural Network) for classification. We compared the results with a Random Forest (RF) classifier and a MultiLayer Perceptron (MLP) model. Results show that the CNN model performs better than the other models on both subtasks.
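SMOTE augments the minority class by synthesizing points on line segments between a minority sample and one of its nearest minority-class neighbours. A minimal sketch of that idea, not the authors' exact configuration:

```python
import random

def smote(minority, n_new, k=2, seed=0):
    """SMOTE-style oversampling on numeric feature vectors: each synthetic
    point lies between a random minority sample and one of its k nearest
    minority neighbours (Euclidean distance), at a random interpolation."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        neighbours = sorted(
            (p for p in minority if p is not x),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(x, p)),
        )[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # position along the segment x -> nb
        synthetic.append([a + gap * (b - a) for a, b in zip(x, nb)])
    return synthetic
```

Because each synthetic point is a convex combination of two real minority samples, the augmented data stays inside the minority class's feature region rather than duplicating examples verbatim.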
2018
pdf
bib
abs
SSN MLRG1 at SemEval-2018 Task 1: Emotion and Sentiment Intensity Detection Using Rule Based Feature Selection
Angel Deborah S
|
Rajalakshmi S
|
S Milton Rajendram
|
Mirnalinee T T
Proceedings of the 12th International Workshop on Semantic Evaluation
The system developed by the SSN MLRG1 team for SemEval-2018 Task 1 on affect in tweets uses rule-based feature selection and one-hot encoding to generate the input feature vector. A Multilayer Perceptron was used to build the models for the emotion intensity ordinal classification, sentiment analysis ordinal classification and emotion classification subtasks. A Support Vector Machine was used to build the models for the emotion intensity regression and sentiment intensity regression subtasks.
pdf
bib
abs
SSN MLRG1 at SemEval-2018 Task 3: Irony Detection in English Tweets Using MultiLayer Perceptron
Rajalakshmi S
|
Angel Deborah S
|
S Milton Rajendram
|
Mirnalinee T T
Proceedings of the 12th International Workshop on Semantic Evaluation
Sentiment analysis plays an important role in E-commerce. Identifying ironic and sarcastic content in text plays a vital role in inferring the actual intention of the user, and is necessary to increase the accuracy of sentiment analysis. This paper describes our work on identifying the irony level in Twitter texts. The system developed by the SSN MLRG1 team in SemEval-2018 for Task 3 (irony detection) uses a rule-based approach for feature selection and the MultiLayer Perceptron (MLP) technique to build the model for the multiclass irony classification subtask, which classifies the given text into one of four class labels.
2017
pdf
bib
abs
SSN_MLRG1 at SemEval-2017 Task 4: Sentiment Analysis in Twitter Using Multi-Kernel Gaussian Process Classifier
Angel Deborah S
|
S Milton Rajendram
|
T T Mirnalinee
Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017)
The SSN MLRG1 team for SemEval-2017 Task 4 has applied a Gaussian Process, with bag-of-words feature vectors and fixed-rule multi-kernel learning, for sentiment analysis of tweets. Since tweets on the same topic, made at different times, may exhibit different emotions, their properties such as smoothness and periodicity also vary with time. Our experiments show that, compared to a single kernel, multiple kernels are effective in learning the simultaneous presence of multiple properties.
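A fixed-rule multi-kernel combination can be sketched as a weighted sum of a smoothness (RBF) kernel and a periodicity (exp-sine-squared) kernel; the weights and hyperparameters below are illustrative:

```python
import math

def rbf(x1, x2, lengthscale=1.0):
    """Smoothness: squared-exponential kernel."""
    return math.exp(-((x1 - x2) ** 2) / (2 * lengthscale ** 2))

def periodic(x1, x2, period=1.0, lengthscale=1.0):
    """Periodicity: exp-sine-squared kernel."""
    return math.exp(-2 * math.sin(math.pi * abs(x1 - x2) / period) ** 2
                    / lengthscale ** 2)

def multi_kernel(x1, x2, w_smooth=0.5, w_per=0.5):
    """Fixed-rule combination: a non-negative weighted sum of valid kernels
    is itself a valid kernel, so the GP can model both properties at once."""
    return w_smooth * rbf(x1, x2) + w_per * periodic(x1, x2)

# Gram matrix over a few illustrative inputs; it is symmetric with unit diagonal.
xs = [0.0, 0.5, 1.0]
K = [[multi_kernel(a, b) for b in xs] for a in xs]
```

Summing kernels is the simplest closed-under-addition rule for combining covariance structures, which matches the "fixed rule" (as opposed to learned-weight) multi-kernel setting described in the abstract.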
pdf
bib
abs
SSN_MLRG1 at SemEval-2017 Task 5: Fine-Grained Sentiment Analysis Using Multiple Kernel Gaussian Process Regression Model
Angel Deborah S
|
S Milton Rajendram
|
T T Mirnalinee
Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017)
The system developed by the SSN_MLRG1 team for SemEval-2017 Task 5 on fine-grained sentiment analysis uses a Multiple Kernel Gaussian Process for identifying the optimistic and pessimistic sentiments associated with companies and stocks. Since comments made at different times about the same companies and stocks may display different emotions, their properties such as smoothness and periodicity may vary. Our experiments show that while a single-kernel Gaussian Process can learn certain properties well, Multiple Kernel Gaussian Processes are effective in learning the presence of different properties simultaneously.