Ratnavel Rajalakshmi
2026
GYAAN-SAHIT: A Persona-Driven Multi-Agent Framework for Caste-Based Hate Speech Detection
Sakshi Gupta | Shunmuga Priya Muthusamy Chinnan | Saranya Rajiakodi | Ratnavel Rajalakshmi | Bharathi Raja Chakravarthi
Proceedings of the Sixth Workshop on Language Technology for Equality, Diversity, Inclusion
Social media has amplified public discourse in India while perpetuating caste-based hierarchies. Despite legal protections, caste-based hate speech continues to propagate across digital platforms through culturally embedded expressions that conventional classifiers often struggle to interpret. We propose GYAAN-SAHIT, a knowledge-driven multi-agent framework that addresses this problem through structured debate-based classification. Each agent adopts a distinct ideological and socio-cultural persona, engaging in multi-turn argumentation to reason over context, subtext, and intent. A critic agent then evaluates the coherence of the debate before producing the final classification. The framework further integrates Hindi hate lexicons to ground its reasoning in linguistic and cultural specificity. Experiments show that GYAAN-SAHIT achieves improvement in performance while generating culturally grounded explanations, demonstrating the effectiveness of persona-based multi-agent reasoning for hate speech detection in low-resource and socially complex environments.
DLRG@LT-EDI 2026: Automating Counter-Narratives for Homophobic and Transphobic Comments
Ramesh Kannan R | Ratnavel Rajalakshmi
Proceedings of the Sixth Workshop on Language Technology for Equality, Diversity, Inclusion
Online hate speech is spreading rapidly, creating a significant challenge, particularly in low-resource languages such as Tamil. The lack of mature automated content moderation systems makes it difficult to control harmful content effectively. In this study, we propose a computational framework for generating Counter Narratives (CNs) using classical NLP techniques. We leverage TF-IDF features with n-grams to classify comments as Homophobic or Transphobic. Span detection is performed with TF-IDF n-gram features and machine learning models. Counter narratives are then retrieved by computing cosine similarity, ensuring semantic alignment and contextual relevance. Evaluation on the expanded human-curated dataset demonstrates that our approach produces contextually appropriate and semantically coherent counter narratives. Notably, the proposed system, submitted to Task 2, achieved an overall average score of 80.40% for Tamil and 77.29% for English, securing first and fourth rank, respectively. GitHub: https://github.com/kannanrrk/Span-Counter-Feature-Based
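The retrieval step described in this abstract (TF-IDF n-gram features plus cosine similarity) can be illustrated with a minimal standard-library sketch. This is not the authors' implementation: the function names `ngrams`, `tfidf_vectors`, `cosine`, and `retrieve`, the smoothed IDF formula, and the whitespace tokenizer are all assumptions made for illustration.

```python
import math
from collections import Counter


def ngrams(text, n_max=2):
    """Return word n-grams (unigrams up to n_max-grams) of a lowercased text."""
    tokens = text.lower().split()
    grams = []
    for n in range(1, n_max + 1):
        grams += [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return grams


def tfidf_vectors(docs, n_max=2):
    """Build sparse TF-IDF vectors (dicts) and the IDF table for a list of documents."""
    counts = [Counter(ngrams(d, n_max)) for d in docs]
    df = Counter(term for c in counts for term in c)
    n_docs = len(docs)
    # Smoothed IDF (an assumed variant): terms seen everywhere keep a non-zero weight.
    idf = {t: math.log((1 + n_docs) / (1 + d)) + 1 for t, d in df.items()}
    return [{t: tf * idf[t] for t, tf in c.items()} for c in counts], idf


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def retrieve(query, candidates, n_max=2):
    """Return the candidate counter narrative most similar to the query, with its score."""
    vecs, idf = tfidf_vectors(candidates, n_max)
    q_counts = Counter(ngrams(query, n_max))
    # Query terms never seen in the candidate pool are dropped.
    q_vec = {t: tf * idf[t] for t, tf in q_counts.items() if t in idf}
    scores = [cosine(q_vec, v) for v in vecs]
    best = max(range(len(candidates)), key=scores.__getitem__)
    return candidates[best], scores[best]
```

Calling `retrieve(comment, pool)` with a hypothetical pool of curated counter narratives returns the closest entry and its similarity; a real system would likely add language-specific tokenization for Tamil.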
HNK@DravidianLangTech 2026: Investigating Grapheme-Level Normalization for Abusive Tamil Text Classification
Hanish Vigneshwar R | Nahul Alaguraj | Karthikeyan Manimaran | Ratnavel Rajalakshmi
Proceedings of the Sixth Workshop on Speech, Vision, and Language Technologies for Dravidian Languages
The growing prevalence of social media has been accompanied by an increase in abusive content targeting women, particularly in regional languages such as Tamil. The automatic identification of abusive content is critical for the creation of safer online spaces. In this paper, we focus on detecting abusive text targeting women, framed as binary text classification. We evaluate the proposed system using the IndicBERT, MuRIL, and Tamil-BERT models. Additionally, we propose grapheme-aware normalization, which aims to maintain the structural integrity of Tamil characters at the Unicode level. The experimental results reveal that Tamil-BERT with grapheme-aware normalization achieves the best performance among the evaluated models. The proposed system achieved third position in the shared task.
DLRG@DravidianLangTech 2026: Explainable Transformer-Based Detection of Abusive Tamil Text Targeting Women on Social Media
Mirudhula Sankar | Ratnavel Rajalakshmi
Proceedings of the Sixth Workshop on Speech, Vision, and Language Technologies for Dravidian Languages
Many social media platforms have users who have normalized the abuse of women online, creating a need for systems that automatically detect such activity. For low-resource regional languages like Tamil, which feature informal writing styles, spelling variations, dialectal differences, and culturally specific expressions, correctly detecting abusive comments is a challenge. In this work, we present a transformer-based approach for binary classification of Tamil comments into abusive and non-abusive categories using the DravidianLangTech dataset. The proposed system fine-tunes MuRIL (a multilingual transformer pretrained for Indian languages), enabling effective contextual representation with minimal preprocessing. To improve the transparency of the system, a post-hoc Explainable AI component is incorporated. A perturbation-based method using log-odds differences identifies words that significantly influence the predictions. Experimental findings indicate that the model reaches a validation accuracy exceeding 81% while also exhibiting a strong macro-F1 score. This research shows that combining contextual multilingual representations with simple interpretability methods offers a viable and effective approach for detecting abusive text in Tamil. The implementation of our system is publicly available at https://github.com/mirud5173/Abusive-Tamil-Comment-Detection-using-Transformer-Models
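The perturbation-based explanation mentioned above can be sketched as a leave-one-word-out procedure: remove each word, re-score the comment, and record the change in log-odds. The sketch below is an assumption about how such a method might look, not the paper's code; `toy_predict_proba` is a hypothetical stand-in for the fine-tuned classifier, and its weights are invented.

```python
import math


def log_odds(p, eps=1e-9):
    """Convert a probability to log-odds, clipping to avoid division by zero."""
    p = min(max(p, eps), 1 - eps)
    return math.log(p / (1 - p))


def word_importance(tokens, predict_proba):
    """Rank tokens by the drop in log-odds of the positive class when each is removed."""
    base = log_odds(predict_proba(tokens))
    scores = []
    for i, tok in enumerate(tokens):
        perturbed = tokens[:i] + tokens[i + 1:]
        scores.append((tok, base - log_odds(predict_proba(perturbed))))
    return sorted(scores, key=lambda s: s[1], reverse=True)


def toy_predict_proba(tokens):
    """Hypothetical classifier: sigmoid of a weighted sum (weights are made up)."""
    weights = {"BAD": 3.0, "awful": 1.5}
    z = -1.0 + sum(weights.get(t, 0.0) for t in tokens)
    return 1 / (1 + math.exp(-z))
```

With a real model, `predict_proba` would wrap tokenization and a forward pass; words with the largest positive scores are the ones pushing the prediction toward the abusive class.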
DLRG@DravidianLangTech 2026: Dual-Purpose Whisper Adaptation for Tamil Dialect Identification and Dialectal Speech Recognition
Gulisetty Abhinav | Tanisha Nanda | Ramesh Kannan R | Ratnavel Rajalakshmi
Proceedings of the Sixth Workshop on Speech, Vision, and Language Technologies for Dravidian Languages
This paper describes our system developed for the shared task on Dialect Based Speech Recognition and Classification in Tamil at DravidianLangTech@ACL 2026. We participated in both Subtask 1 (Dialect Identification) and Subtask 2 (Dialectal ASR). Our approach leverages a single Tamil-adapted Whisper Medium model as a unified foundation for both tasks. For dialect classification, we have used the Whisper encoder as a feature extractor by discarding the decoder, applying mean pooling over the temporal dimension, and fine-tuning the full encoder with a lightweight classification head, achieving 73.4% accuracy on the test set. For dialectal ASR, we apply Low-Rank Adaptation (LoRA) to the full encoder-decoder architecture with SpecAugment-based data augmentation, achieving a Word Error Rate (WER) of 0.55 on the test set. Our experiments reveal that unfreezing the pre-trained encoder is critical for dialect discrimination, boosting accuracy from 52.78% (frozen) to 73.4% (unfrozen). The code is publicly available at https://github.com/DLRG-VIT/DravidianLangTech2026
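The dialect classifier described above pools encoder outputs over time and feeds the result to a lightweight head. A minimal sketch of that pooling step follows, using plain Python lists in place of tensors; the padding mask, the single linear layer, and the names `mean_pool` and `linear_head` are assumptions, since the abstract does not specify these details.

```python
def mean_pool(frames, mask=None):
    """Average frame vectors over the temporal dimension, skipping padded frames."""
    if mask is None:
        mask = [1] * len(frames)
    kept = [f for f, m in zip(frames, mask) if m]
    if not kept:
        raise ValueError("no unmasked frames to pool")
    dim = len(kept[0])
    return [sum(f[d] for f in kept) / len(kept) for d in range(dim)]


def linear_head(pooled, weights, bias):
    """Assumed classification head: one linear layer producing per-class logits."""
    return [sum(w * x for w, x in zip(row, pooled)) + b
            for row, b in zip(weights, bias)]
```

In a real setup the frames would be the Whisper encoder's hidden states and the head would be trained jointly with the unfrozen encoder; the predicted dialect is the index of the largest logit.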
Shared Task on Prompt Style Recovery for Large Language Models in Telugu
Premjith B | Jyothish Lal G | Bharathi Raja Chakravarthi | Saranya Rajiakodi | Thenmozhi Durairaj | Ratnavel Rajalakshmi | Rahul Ponnusamy | Chinthala Bhuvanesh
Proceedings of the Sixth Workshop on Speech, Vision, and Language Technologies for Dravidian Languages
This paper presents an overview of the Shared Task on Prompt Recovery for Large Language Models (LLMs) in Telugu, organized as part of DravidianLangTech @ ACL 2026. The task focuses on identifying the underlying communicative style of Telugu text excerpts, framed as a nine-class single-label classification problem covering Formal, Informal, Optimistic, Pessimistic, Humorous, Serious, Inspiring, Authoritative, and Persuasive tones. The dataset was constructed by collecting Telugu YouTube comments and generating style-modified variants using an LLM, resulting in 3,000 training instances, 300 validation samples, and 301 test samples. A total of 52 teams registered for the shared task, with 13 teams submitting valid system predictions. Systems explored diverse approaches, including transformer-based fine-tuning (IndicBERT, MuRIL, XLM-R), ensemble and stacking methods, pairwise modeling strategies, curriculum learning, and few-shot large language model prompting. Evaluation was conducted using Macro F1-score as the primary metric. The top-performing system achieved a Macro F1-score of 0.2987. Overall results indicate that Telugu prompt-style recovery remains a challenging problem, particularly due to stylistic overlap and high lexical similarity across classes.
From Comments to Harm: A Findings Report on Abusive Tamil Text Targeting Women on Social Media Shared Task
Bhuvaneswari Sivagnanam | Kathiravan Pannerselvam | Jananayagan | Charmathi Rajkumar | Ramesh Kannan R | Ratnavel Rajalakshmi | Shunmuga Priya Muthusamy Chinnan | Saranya Rajiakodi | Bharathi Raja Chakravarthi
Proceedings of the Sixth Workshop on Speech, Vision, and Language Technologies for Dravidian Languages
This paper presents an overview of the second shared task on Abusive Tamil Text Targeting Women on Social Media as a binary classification problem (abusive vs. non-abusive). We release a dataset of Tamil YouTube comments and evaluate submissions using macro-F1 to encourage balanced performance in a noisy, low-resource setting. A total of 89 teams registered for this task and 24 teams submitted results. The approaches used by the teams include transformer fine-tuning, heterogeneous ensembles, classical baselines, and large language models using prompting and LoRA. Results show that the best-performing system scored 0.8297 macro-F1, with many submissions around 0.79-0.81. Across submissions, transformer fine-tuning with domain-aligned encoders is consistently strong, while additional gains are frequently associated with Tamil-aware normalization and macro-F1-oriented calibration such as class-weighted learning and validation-based threshold tuning. Overall, the findings highlight the importance of language-aware preprocessing and careful decision calibration for reliable moderation of women-targeted abusive Tamil social media text. Disclaimer: This paper (including figures and examples) may contain offensive or harmful language, including abusive content targeting women. All such text is presented solely for research and educational purposes and does not reflect the authors' views. Reader discretion is advised.
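Validation-based threshold tuning for macro-F1, one of the calibration strategies named in this overview, can be sketched as a simple grid search over decision thresholds. The grid, the tie-breaking rule (first best threshold wins), and the function names are assumptions for illustration, not a description of any particular submission.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 for binary labels 0 and 1."""
    f1s = []
    for cls in (0, 1):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p != cls)
        denom = 2 * tp + fp + fn
        f1s.append(2 * tp / denom if denom else 0.0)
    return sum(f1s) / len(f1s)


def tune_threshold(y_true, probs, grid=None):
    """Pick the threshold on validation probabilities that maximizes macro-F1."""
    grid = grid or [i / 100 for i in range(5, 100, 5)]
    best_t, best_f1 = 0.5, -1.0
    for t in grid:
        preds = [1 if p >= t else 0 for p in probs]
        score = macro_f1(y_true, preds)
        if score > best_f1:  # strict comparison keeps the first best threshold
            best_t, best_f1 = t, score
    return best_t, best_f1
```

The threshold chosen on the validation split is then applied unchanged to test predictions; tuning it on test data would inflate the reported score.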
Proceedings of the Sixth Workshop on Speech, Vision, and Language Technologies for Dravidian Languages
Bharathi Raja Chakravarthi | Ruba Priyadharshini | Anand Kumar Madasamy | Sajeetha Thavareesan | Saranya Rajiakodi | Subalalitha Navaneethakrishnan | Dhivya Chinnappa | Balasubramanian Palani | Malliga Subramanian | Kogilavani Shanmugavadivel | Ratnavel Rajalakshmi
Proceedings of the Sixth Workshop on Speech, Vision, and Language Technologies for Dravidian Languages
2025
Overview of the Fifth Shared Task on Speech Recognition for Vulnerable Individuals in Tamil
Bharathi B | Bharathi Raja Chakravarthi | Sripriya N | Rajeswari Natarajan | Ratnavel Rajalakshmi | Suhasini S
Proceedings of the 5th Conference on Language, Data and Knowledge: Fifth Workshop on Language Technology for Equality, Diversity, Inclusion
This paper presents an overview of the shared task on speech recognition for vulnerable individuals in Tamil (LT-EDI@LDK2025). The task provides a Tamil dataset collected from elderly individuals who identify as male, female, or transgender. The audio samples were recorded in public places such as markets, vegetable shops, and hospitals. The dataset was released in two phases, training and testing. Participants were required to process the audio signals using various models and techniques and submit transcriptions of the provided test samples. The participants' results were assessed using WER (Word Error Rate). Participants used transformer-based approaches for automatic speech recognition. This overview paper discusses the findings and the various pre-trained transformer-based models that the participants employed.
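Word Error Rate, the metric used in this shared task, is conventionally the word-level edit distance between hypothesis and reference divided by the number of reference words. The sketch below assumes whitespace tokenization and no text normalization, which may differ from the organizers' scoring script.

```python
def wer(reference, hypothesis):
    """Word Error Rate: (substitutions + deletions + insertions) / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between the processed reference prefix and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr[j] = min(prev[j] + 1,         # deletion
                          curr[j - 1] + 1,     # insertion
                          prev[j - 1] + cost)  # match or substitution
        prev = curr
    return prev[-1] / len(ref)
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.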
DLRG@DravidianLangTech 2025: Multimodal Hate Speech Detection in Dravidian Languages
Ratnavel Rajalakshmi | Ramesh Kannan | Meetesh Saini | Bitan Mallik
Proceedings of the Fifth Workshop on Speech, Vision, and Language Technologies for Dravidian Languages
Social media is a powerful communication tool and rich in diverse content requiring innovative approaches to understand nuances of the languages. Addressing challenges like hate speech necessitates multimodal analysis that integrates textual, and other cues to capture its context and intent effectively. This paper proposes a multimodal hate speech detection system in Tamil, which uses textual and audio features for classification. Our proposed system uses a fine-tuned Indic-BERT model for text based hate speech detection and Wav2Vec2 model for audio based hate speech detection of audio data. The fine-tuned Indic-BERT model with Whisper achieved an F1 score of 0.25 on Multimodal approach. Our proposed approach ranked at 10th position in the shared task on Multimodal Hate Speech Detection in Dravidian languages at the NAACL 2025 Workshop DravidianLangTech.
Overview of the Shared Task on Multimodal Hate Speech Detection in Dravidian languages: DravidianLangTech@NAACL 2025
Jyothish Lal G | Premjith B | Bharathi Raja Chakravarthi | Saranya Rajiakodi | Bharathi B | Rajeswari Natarajan | Ratnavel Rajalakshmi
Proceedings of the Fifth Workshop on Speech, Vision, and Language Technologies for Dravidian Languages
The detection of hate speech in social media platforms is very crucial these days. This is due to its adverse impact on mental health, social harmony, and online safety. This paper presents the overview of the shared task on Multimodal Hate Speech Detection in Dravidian Languages organized as part of DravidianLangTech@NAACL 2025. The task emphasizes detecting hate speech in social media content that combines speech and text. Here, we focus on three low-resource Dravidian languages: Malayalam, Tamil, and Telugu. Participants were required to classify hate speech in three sub-tasks, each corresponding to one of these languages. The dataset was curated by collecting speech and corresponding text from YouTube videos. Various machine learning and deep learning-based models, including transformer-based architectures and multimodal frameworks, were employed by the participants. The submissions were evaluated using the macro F1 score. Experimental results underline the potential of multimodal approaches in advancing hate speech detection for low-resource languages. Team SSNTrio achieved the highest F1 score in Malayalam and Tamil of 0.7511 and 0.7332, respectively. Team lowes scored the best F1 score of 0.3817 in the Telugu sub-task.
Hydrangea@DravidianLanTech2025: Abusive language Identification from Tamil and Malayalam Text using Transformer Models
Shanmitha Thirumoorthy | Thenmozhi Durairaj | Ratnavel Rajalakshmi
Proceedings of the Fifth Workshop on Speech, Vision, and Language Technologies for Dravidian Languages
Abusive language toward women on the Internet has long been perceived as a danger to free speech and safe online spaces. In this paper, we evaluate three transformer-based models (BERT, XLM-RoBERTa, and DistilBERT) for identifying gender-abusive comments in Tamil and Malayalam YouTube content. We fine-tune and compare these models using a dataset provided by the DravidianLangTech 2025 shared task on identifying abusive content from social media. Among these models, XLM-RoBERTa performed best, reaching F1 scores of 0.7708 for Tamil and 0.6876 for Malayalam. BERT followed with scores of 0.7658 (Tamil) and 0.6671 (Malayalam). DistilBERT's performance varied across the two languages. The large difference in performance between the models, especially for Malayalam, indicates that working with low-resource languages is difficult. The choice of model is critical when applying abusive language detection. These findings provide useful information for building effective content moderation systems in linguistically diverse contexts and, more broadly, for promoting safe online spaces for women in South Indian language communities.
DLRG at BHASHA: Task 1 (IndicGEC): A Hybrid Neurosymbolic Approach for Tamil and Malayalam Grammatical Error Correction
Akshay Ramesh | Ratnavel Rajalakshmi
Proceedings of the 1st Workshop on Benchmarks, Harmonization, Annotation, and Standardization for Human-Centric AI in Indian Languages (BHASHA 2025)
Grammatical Error Correction (GEC) for low-resource Indic languages remains challenging due to limited annotated data and morphological complexity. We present a hybrid neurosymbolic GEC system that combines neural sequence-to-sequence models with explicit language-specific rule-based pattern matching. Our approach leverages parameter-efficient LoRA adaptation on aggressively augmented data to fine-tune pre-trained mT5 models, followed by learned correction rules through intelligent ensemble strategies. The proposed hybrid architecture achieved 85.34% GLEU for Tamil (Rank 8) and 95.06% GLEU for Malayalam (Rank 2) on the provided IndicGEC test sets, outperforming individual neural and rule-based approaches. The system incorporates conservative safety mechanisms to prevent catastrophic deletions and over-corrections, thus ensuring robustness and real-world applicability. Our work demonstrates that extremely low-resource GEC can be effectively addressed by combining neural generalization with symbolic precision.
2024
Findings of the First Shared Task on Offensive Span Identification from Code-Mixed Kannada-English Comments
Manikandan Ravikiran | Ratnavel Rajalakshmi | Bharathi Raja Chakravarthi | Anand Kumar Madasamy | Sajeetha Thavareesan
Proceedings of the Fourth Workshop on Speech, Vision, and Language Technologies for Dravidian Languages
Effectively managing offensive content is crucial on social media platforms to encourage positive online interactions. However, addressing offensive contents in code-mixed Dravidian languages faces challenges, as current moderation methods focus on flagging entire comments rather than pinpointing specific offensive segments. This limitation stems from a lack of annotated data and accessible systems designed to identify offensive language sections. To address this, our shared task presents a dataset comprising Kannada-English code-mixed social comments, encompassing offensive comments. This paper outlines the dataset, the utilized algorithms, and the results obtained by systems participating in this shared task.
DLRG-DravidianLangTech@EACL2024 : Combating Hate Speech in Telugu Code-mixed Text on Social Media
Ratnavel Rajalakshmi | Saptharishee M | Hareesh S | Gabriel R | Varsini Sr
Proceedings of the Fourth Workshop on Speech, Vision, and Language Technologies for Dravidian Languages
Detecting hate speech in code-mixed language is vital for a secure online space, curbing harmful content, promoting inclusive communication, and safeguarding users from discrimination. Despite the linguistic complexities of code-mixed languages, this study explores diverse pre-processing methods. It finds that the Transliteration method excels in handling linguistic variations. The research comprehensively investigates machine learning and deep learning approaches, namely Logistic Regression and Bi-directional Gated Recurrent Unit (Bi-GRU) models. These models achieved F1 scores of 0.68 and 0.70, respectively, contributing to ongoing efforts to combat hate speech in code-mixed languages and offering valuable insights for future research in this critical domain.
2022
Understanding the role of Emojis for emotion detection in Tamil
Ratnavel Rajalakshmi | Faerie Mattins R | Srivarshan Selvaraj | Antonette Shibani | Anand Kumar M | Bharathi Raja Chakravarthi
Proceedings of the First Workshop on Multimodal Machine Learning in Low-resource Languages
Multimodal Code-Mixed Tamil Troll Meme Classification using Feature Fusion
Ramesh Kannan | Ratnavel Rajalakshmi
Proceedings of the First Workshop on Multimodal Machine Learning in Low-resource Languages
Memes became an important way of expressing relevant idea through social media platforms and forums. At the same time, these memes are trolled by a person who tries to get identified from the other internet users like social media users, chat rooms and blogs. The memes contain both textual and visual information. Based on the content of memes, they are trolled in online community. There is no restriction for language usage in online media. The present work focuses on whether memes are trolled or not trolled. The proposed multi modal approach achieved considerably better weighted average F1 score of 0.5437 compared to Unimodal approaches. The other performance metrics like precision, recall, accuracy and macro average have also been studied to observe the proposed system.
DLRG@LT-EDI-ACL2022: Detecting signs of Depression from Social Media using XGBoost Method
Herbert Sharen | Ratnavel Rajalakshmi
Proceedings of the Second Workshop on Language Technology for Equality, Diversity and Inclusion
Depression is linked to the development of dementia. Cognitive functions such as thinking and remembering generally deteriorate in dementia patients. Social media usage has increased in recent years, and technological advancements help the community express their views publicly. Analysing signs of depression from text has become an important area of research, as it helps to identify this kind of mental disorder from people's social media posts. As part of the shared task on detecting signs of depression from social media text, a dataset was provided by the organizers (Sampath et al.). We applied different machine learning techniques such as Support Vector Machine, Random Forest and XGBoost classifier to classify the signs of depression. Experimental results revealed that the XGBoost model outperformed the other models, with the highest classification accuracy of 0.61% and a macro F1 score of 0.54.
Findings of the Shared Task on Offensive Span Identification from Code-Mixed Tamil-English Comments
Manikandan Ravikiran | Bharathi Raja Chakravarthi | Anand Kumar Madasamy | Sangeetha Sivanesan | Ratnavel Rajalakshmi | Sajeetha Thavareesan | Rahul Ponnusamy | Shankar Mahadevan
Proceedings of the Second Workshop on Speech and Language Technologies for Dravidian Languages
Offensive content moderation is vital in social media platforms to support healthy online discussions. However, their prevalence in code-mixed Dravidian languages is limited to classifying whole comments without identifying part of it contributing to offensiveness. Such limitation is primarily due to the lack of annotated data for offensive spans. Accordingly, in this shared task, we provide Tamil-English code-mixed social comments with offensive spans. This paper outlines the dataset so released, methods, and results of the submitted systems.
DLRG@TamilNLP-ACL2022: Offensive Span Identification in Tamil using BiLSTM-CRF approach
Ratnavel Rajalakshmi | Mohit More | Bhamatipati Shrikriti | Gitansh Saharan | Hanchate Samyuktha | Sayantan Nandy
Proceedings of the Second Workshop on Speech and Language Technologies for Dravidian Languages
Identifying offensive speech is an exciting and essential area of research, with ample traction in recent times. This paper presents our system submission to the subtask 1, focusing on using supervised approaches for extracting Offensive spans from code-mixed Tamil-English comments. To identify offensive spans, we developed the Bidirectional Long Short-Term Memory (BiLSTM) model with Glove Embedding. To this end, the developed system achieved an overall F1 of 0.1728. Additionally, for comments with less than 30 characters, the developed system shows an F1 of 0.3890, competitive with other submissions.
DLRG@DravidianLangTech-ACL2022: Abusive Comment Detection in Tamil using Multilingual Transformer Models
Ratnavel Rajalakshmi | Ankita Duraphe | Antonette Shibani
Proceedings of the Second Workshop on Speech and Language Technologies for Dravidian Languages
Online social networks let people connect and interact with each other. They do, however, also provide a platform for online abusers to propagate abusive content. The vast majority of abusive remarks are written in a multilingual style, which allows them to easily slip past internet inspection. This paper presents a system developed for the Shared Task on Abusive Comment Detection (Misogyny, Misandry, Homophobia, Transphobic, Xenophobia, CounterSpeech, Hope Speech) in Tamil at DravidianLangTech@ACL 2022 to detect the abusive category of each comment. We approach the task with three methodologies (Machine Learning, Deep Learning and Transformer-based modeling) for two sets of data: the Tamil and Tamil+English language datasets. The dataset used in our system can be accessed from the competition on CodaLab. For Machine Learning, eight algorithms were implemented, among which Random Forest gave the best result on the Tamil+English dataset, with a weighted average F1-score of 0.78. For Deep Learning, Bi-Directional LSTM gave the best result with pre-trained word embeddings. In Transformer-based modeling, we used IndicBERT and mBERT with fine-tuning, among which mBERT gave the best result for the Tamil dataset with a weighted average F1-score of 0.7.
2021
DLRG@DravidianLangTech-EACL2021: Transformer based approach for Offensive Language Identification on Code-Mixed Tamil
Ratnavel Rajalakshmi | Yashwant Reddy | Lokesh Kumar
Proceedings of the First Workshop on Speech and Language Technologies for Dravidian Languages
Internet advancements have had a huge impact on people's communication patterns and lifestyles. People express their opinions on products, politics, movies, etc. on social media. Even though English is predominantly used, many people now prefer to tweet in their native language, sometimes combining it with English. Sentiment analysis on such code-mixed tweets is challenging due to the large vocabulary, grammar, and colloquial usage of many words. In this paper, a transformer-based language model is applied to analyse the sentiment of Tanglish tweets, which combine Tamil and English. This work was submitted to the shared task at DravidianLangTech-EACL2021. The experimental results show that an F1 score of 64% was achieved in detecting hate speech in code-mixed Tamil-English tweets using a bidirectional transformer model.
Co-authors
- Bharathi Raja Chakravarthi 9
- Ramesh Kannan 5
- Saranya Rajiakodi 5
- Anand Kumar M 4
- Sajeetha Thavareesan 3
- Bharathi B 2
- Premjith B 2
- Shunmuga Priya Muthusamy Chinnan 2
- Thenmozhi Durairaj 2
- Jyothish Lal G 2
- Rajeswari Natarajan 2
- Rahul Ponnusamy 2
- Manikandan Ravikiran 2
- Antonette Shibani 2
- Gulisetty Abhinav 1
- Nahul Alaguraj 1
- Chinthala Bhuvanesh 1
- Dhivya Chinnappa 1
- Ankita Duraphe 1
- Sakshi Gupta 1
- Jananayagan 1
- Lokesh Kumar 1
- Saptharishee M 1
- Shankar Mahadevan 1
- Bitan Mallik 1
- Karthikeyan Manimaran 1
- Faerie Mattins R 1
- Mohit More 1
- Sripriya N 1
- Tanisha Nanda 1
- Sayantan Nandy 1
- Subalalitha Navaneethakrishnan 1
- Balasubramanian Palani 1
- Kathiravan Pannerselvam 1
- Ruba Priyadharshini 1
- Gabriel R 1
- Hanish Vigneshwar R 1
- Charmathi Rajkumar 1
- Akshay Ramesh 1
- Yashwant Reddy 1
- Hareesh S 1
- Suhasini S 1
- Gitansh Saharan 1
- Meetesh Saini 1
- Hanchate Samyuktha 1
- Mirudhula Sankar 1
- Srivarshan Selvaraj 1
- Kogilavani Shanmugavadivel 1
- Herbert Sharen 1
- Bhamatipati Shrikriti 1
- Bhuvaneswari Sivagnanam 1
- Sangeetha Sivanesan 1
- Varsini Sr 1
- Malliga Subramanian 1
- Shanmitha Thirumoorthy 1