Thenmozhi Durairaj


2024

pdf
Overview of Second Shared Task on Sentiment Analysis in Code-mixed Tamil and Tulu
Lavanya Sambath Kumar | Asha Hegde | Bharathi Raja Chakravarthi | Hosahalli Shashirekha | Rajeswari Natarajan | Sajeetha Thavareesan | Ratnasingam Sakuntharaj | Thenmozhi Durairaj | Prasanna Kumar Kumaresan | Charmathi Rajkumar
Proceedings of the Fourth Workshop on Speech, Vision, and Language Technologies for Dravidian Languages

Sentiment Analysis (SA) in Dravidian codemixed text is a hot research area right now. In this regard, the “Second Shared Task on SA in Code-mixed Tamil and Tulu” at Dravidian- LangTech (EACL-2024) is organized. Two tasks namely SA in Tamil-English and Tulu- English code-mixed data, make up this shared assignment. In total, 64 teams registered for the shared task, out of which 19 and 17 systems were received for Tamil and Tulu, respectively. The performance of the systems submitted by the participants was evaluated based on the macro F1-score. The best method obtained macro F1-scores of 0.260 and 0.584 for code-mixed Tamil and Tulu texts, respectively.

pdf bib
Proceedings of the Fourth Workshop on Language Technology for Equality, Diversity, Inclusion
Bharathi Raja Chakravarthi | Bharathi B | Paul Buitelaar | Thenmozhi Durairaj | György Kovács | Miguel Ángel García Cumbreras
Proceedings of the Fourth Workshop on Language Technology for Equality, Diversity, Inclusion

2022

pdf
PANDAS@TamilNLP-ACL2022: Emotion Analysis in Tamil Text using Language Agnostic Embeddings
Divyasri K | Gayathri G L | Krithika Swaminathan | Thenmozhi Durairaj | Bharathi B | Senthil Kumar B
Proceedings of the Second Workshop on Speech and Language Technologies for Dravidian Languages

As the world around us continues to become increasingly digital, it has been acknowledged that there is a growing need for emotion analysis of social media content. The task of identifying the emotion in a given text has many practical applications ranging from screening public health to business and management. In this paper, we propose a language-agnostic model that focuses on emotion analysis in Tamil text. Our experiments yielded an F1-score of 0.010.

pdf
PANDAS@Abusive Comment Detection in Tamil Code-Mixed Data Using Custom Embeddings with LaBSE
Gayathri G L | Krithika Swaminathan | Divyasri K | Thenmozhi Durairaj | Bharathi B
Proceedings of the Second Workshop on Speech and Language Technologies for Dravidian Languages

Abusive language has lately been prevalent in comments on various social media platforms. The increasing hostility observed on the internet calls for the creation of a system that can identify and flag such acerbic content, to prevent conflict and mental distress. This task becomes more challenging when low-resource languages like Tamil, as well as the often-observed Tamil-English code-mixed text, are involved. The approach used in this paper for the classification model includes different methods of feature extraction and the use of traditional classifiers. We propose a novel method of combining language-agnostic sentence embeddings with the TF-IDF vector representation that uses a curated corpus of words as vocabulary, to create a custom embedding, which is then passed to an SVM classifier. Our experimentation yielded an accuracy of 52% and an F1-score of 0.54.

pdf
Findings of the Shared Task on Emotion Analysis in Tamil
Anbukkarasi Sampath | Thenmozhi Durairaj | Bharathi Raja Chakravarthi | Ruba Priyadharshini | Subalalitha Cn | Kogilavani Shanmugavadivel | Sajeetha Thavareesan | Sathiyaraj Thangasamy | Parameswari Krishnamurthy | Adeep Hande | Sean Benhur | Kishore Ponnusamy | Santhiya Pandiyan
Proceedings of the Second Workshop on Speech and Language Technologies for Dravidian Languages

This paper presents the overview of the shared task on emotional analysis in Tamil. The result of the shared task is presented at the workshop. This paper presents the dataset used in the shared task, task description, and the methodology used by the participants and the evaluation results of the submission. This task is organized as two Tasks. Task A is carried with 11 emotions annotated data for social media comments in Tamil and Task B is organized with 31 fine-grained emotion annotated data for social media comments in Tamil. For conducting experiments, training and development datasets were provided to the participants and results are evaluated for the unseen data. Totally we have received around 24 submissions from 13 teams. For evaluating the models, Precision, Recall, micro average metrics are used.

pdf
Overview of Abusive Comment Detection in Tamil-ACL 2022
Ruba Priyadharshini | Bharathi Raja Chakravarthi | Subalalitha Cn | Thenmozhi Durairaj | Malliga Subramanian | Kogilavani Shanmugavadivel | Siddhanth U Hegde | Prasanna Kumaresan
Proceedings of the Second Workshop on Speech and Language Technologies for Dravidian Languages

The social media is one of the significantdigital platforms that create a huge im-pact in peoples of all levels. The commentsposted on social media is powerful enoughto even change the political and businessscenarios in very few hours. They alsotend to attack a particular individual ora group of individuals. This shared taskaims at detecting the abusive comments in-volving, Homophobia, Misandry, Counter-speech, Misogyny, Xenophobia, Transpho-bic. The hope speech is also identified. Adataset collected from social media taggedwith the above said categories in Tamiland Tamil-English code-mixed languagesare given to the participants. The par-ticipants used different machine learningand deep learning algorithms. This paperpresents the overview of this task compris-ing the dataset details and results of theparticipants.

pdf
SSN_NLP_MLRG at SemEval-2022 Task 4: Ensemble Learning strategies to detect Patronizing and Condescending Language
Kalaivani Adaikkan | Thenmozhi Durairaj
Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022)

In this paper, we describe our efforts at SemEval 2022 Shared Task 4 on Patronizing and Condescending Language (PCL) Detection. This is the first shared task to detect PCL which is to identify and categorize PCL language towards vulnerable communities. The shared task consists of two subtasks: Patronizing and Condescending language detection (Subtask A) which is the binary task classification and identifying the PCL categories that express the condescension (Subtask B) which is the multi-label text classification. For PCL language detection, We proposed the ensemble strategies of a system combination of BERT, Roberta, Distilbert, Roberta large, Albert achieved the official results for Subtask A with a macro f1 score of 0.5172 on the test set which is improved by baseline score. For PCL Category identification, We proposed a multi-label classification model to ensemble the various Bert-based models and the official results for Subtask B with a macro f1 score of 0.2117 on the test set which is improved by baseline score.

pdf
GetSmartMSEC at SemEval-2022 Task 6: Sarcasm Detection using Contextual Word Embedding with Gaussian model for Irony Type Identification
Diksha Krishnan | Jerin Mahibha C | Thenmozhi Durairaj
Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022)

Sarcasm refers to the use of words that have different literal and intended meanings. It represents the usage of words that are opposite of what is literally said, especially in order to insult, mock, criticise or irritate someone. These types of statements may be funny or amusing to others but may hurt or annoy the person towards whom it is intended. Identification of sarcastic phrases from social media posts finds its application in different domains like sentiment analysis, opinion mining, author profiling, and harassment detection. We have proposed a model for the shared task iSarcasmEval - Intended Sarcasm Detection in English and Arabic (CITATION) by SemEval-2022 considering the language English based on ELmo embeddings for Subtasks A and C and TF-IDF vectors and Gaussian Naive bayes classifier for Subtask B. The proposed model resulted in a F1 score 0.2012 for sarcastic texts in Subtask A, macro-F1 score of 0.0387 and 0.2794 for Subtasks B and C respectively.

pdf
scubeMSEC@LT-EDI-ACL2022: Detection of Depression using Transformer Models
Sivamanikandan S | Santhosh V | Sanjaykumar N | Jerin Mahibha C | Thenmozhi Durairaj
Proceedings of the Second Workshop on Language Technology for Equality, Diversity and Inclusion

Social media platforms play a major role in our day-to-day life and are considered as a virtual friend by many users, who use the social media to share their feelings all day. Many a time, the content which is shared by users on social media replicate their internal life. Nowadays people love to share their daily life incidents like happy or unhappy moments and their feelings in social media and it makes them feel complete and it has become a habit for many users. Social media provides a new chance to identify the feelings of a person through their posts. The aim of the shared task is to develop a model in which the system is capable of analyzing the grammatical markers related to onset and permanent symptoms of depression. We as a team participated in the shared task Detecting Signs of Depression from Social Media Text at LT-EDI 2022- ACL 2022 and we have proposed a model which predicts depression from English social media posts using the data set shared for the task. The prediction is done based on the labels Moderate, Severe and Not Depressed. We have implemented this using different transformer models like DistilBERT, RoBERTa and ALBERT by which we were able to achieve a Macro F1 score of 0.337, 0.457 and 0.387 respectively. Our code is publicly available in the github

pdf
SSNCSE_NLP@LT-EDI-ACL2022:Hope Speech Detection for Equality, Diversity and Inclusion using sentence transformers
Bharathi B | Dhanya Srinivasan | Josephine Varsha | Thenmozhi Durairaj | Senthil Kumar B
Proceedings of the Second Workshop on Language Technology for Equality, Diversity and Inclusion

In recent times, applications have been developed to regulate and control the spread of negativity and toxicity on online platforms. The world is filled with serious problems like political & religious conflicts, wars, pandemics, and offensive hate speech is the last thing we desire. Our task was to classify a text into ‘Hope Speech’ and ‘Non-Hope Speech’. We searched for datasets acquired from YouTube comments that offer support, reassurance, inspiration, and insight, and the ones that don’t. The datasets were provided to us by the LTEDI organizers in English, Tamil, Spanish, Kannada, and Malayalam. To successfully identify and classify them, we employed several machine learning transformer models such as m-BERT, MLNet, BERT, XLMRoberta, and XLM_MLM. The observed results indicate that the BERT and m-BERT have obtained the best results among all the other techniques, gaining a weighted F1- score of 0.92, 0.71, 0.76, 0.87, and 0.83 for English, Tamil, Spanish, Kannada, and Malayalam respectively. This paper depicts our work for the Shared Task on Hope Speech Detection for Equality, Diversity, and Inclusion at LTEDI 2021.

pdf
SSNCSE_NLP@LT-EDI-ACL2022: Speech Recognition for Vulnerable Individuals in Tamil using pre-trained XLSR models
Dhanya Srinivasan | Bharathi B | Thenmozhi Durairaj | Senthil Kumar B
Proceedings of the Second Workshop on Language Technology for Equality, Diversity and Inclusion

Automatic speech recognition is a tool used to transform human speech into a written form. It is used in a variety of avenues, such as in voice commands, customer, service and more. It has emerged as an essential tool in the digitisation of daily life. It has been known to be of vital importance in making the lives of elderly and disabled people much easier. In this paper we describe an automatic speech recognition model, determined by using three pre-trained models, fine-tuned from the Facebook XLSR Wav2Vec2 model, which was trained using the Common Voice Dataset. The best model for speech recognition in Tamil is determined by finding the word error rate of the data. This work explains the submission made by SSNCSE_NLP in the shared task organized by LT-EDI at ACL 2022. A word error rate of 39.4512 is achieved.

pdf
Findings of the Shared Task on Detecting Signs of Depression from Social Media
Kayalvizhi S | Thenmozhi Durairaj | Bharathi Raja Chakravarthi | Jerin Mahibha C
Proceedings of the Second Workshop on Language Technology for Equality, Diversity and Inclusion

Social media is considered as a platform whereusers express themselves. The rise of social me-dia as one of humanity’s most important publiccommunication platforms presents a potentialprospect for early identification and manage-ment of mental illness. Depression is one suchillness that can lead to a variety of emotionaland physical problems. It is necessary to mea-sure the level of depression from the socialmedia text to treat them and to avoid the nega-tive consequences. Detecting levels of depres-sion is a challenging task since it involves themindset of the people which can change period-ically. The aim of the DepSign-LT-EDI@ACL-2022 shared task is to classify the social me-dia text into three levels of depression namely“Not Depressed”, “Moderately Depressed”, and“Severely Depressed”. This overview presentsa description on the task, the data set, method-ologies used and an analysis on the results ofthe submissions. The models that were submit-ted as a part of the shared task had used a va-riety of technologies from traditional machinelearning algorithms to deep learning models. It could be observed from the result that thetransformer based models have outperformedthe other models. Among the 31 teams whohad submitted their results for the shared task,the best macro F1-score of 0.583 was obtainedusing transformer based model.

pdf
Overview of The Shared Task on Homophobia and Transphobia Detection in Social Media Comments
Bharathi Raja Chakravarthi | Ruba Priyadharshini | Thenmozhi Durairaj | John McCrae | Paul Buitelaar | Prasanna Kumaresan | Rahul Ponnusamy
Proceedings of the Second Workshop on Language Technology for Equality, Diversity and Inclusion

Homophobia and Transphobia Detection is the task of identifying homophobia, transphobia, and non-anti-LGBT+ content from the given corpus. Homophobia and transphobia are both toxic languages directed at LGBTQ+ individuals that are described as hate speech. This paper summarizes our findings on the “Homophobia and Transphobia Detection in social media comments” shared task held at LT-EDI 2022 - ACL 2022 1. This shared taskfocused on three sub-tasks for Tamil, English, and Tamil-English (code-mixed) languages. It received 10 systems for Tamil, 13 systems for English, and 11 systems for Tamil-English. The best systems for Tamil, English, and Tamil-English scored 0.570, 0.870, and 0.610, respectively, on average macro F1-score.