Shunmuga Priya Muthusamy Chinnan

2026

GYAAN-SAHIT: A Persona-Driven Multi-Agent Framework for Caste-Based Hate Speech Detection
Sakshi Gupta | Shunmuga Priya Muthusamy Chinnan | Saranya Rajiakodi | Ratnavel Rajalakshmi | Bharathi Raja Chakravarthi
Proceedings of the Sixth Workshop on Language Technology for Equality, Diversity, Inclusion

Social media has amplified public discourse in India while perpetuating caste-based hierarchies. Despite legal protections, caste-based hate speech continues to propagate across digital platforms through culturally embedded expressions that conventional classifiers often struggle to interpret. We propose GYAAN-SAHIT, a knowledge-driven multi-agent framework that addresses this problem through structured debate-based classification. Each agent adopts a distinct ideological and socio-cultural persona, engaging in multi-turn argumentation to reason over context, subtext, and intent. A critic agent then evaluates the coherence of the debate before producing the final classification. The framework further integrates Hindi hate lexicons to ground its reasoning in linguistic and cultural specificity. Experiments show that GYAAN-SAHIT improves classification performance while generating culturally grounded explanations, demonstrating the effectiveness of persona-based multi-agent reasoning for hate speech detection in low-resource and socially complex environments.

We investigate the role of large language models (LLMs) in promoting gender-inclusive language by evaluating their ability to rewrite biased text and generate counterfactual narratives across multiple languages. We introduce a shared task with two subtasks: gender-inclusive rewriting and counterfactual generation. The task covers five languages (English, German, Spanish, Tamil, and Kannada), reflecting diverse grammatical gender systems and sociocultural contexts. We release curated word-level and sentence-level datasets to support controlled inclusive generation. A total of 50 teams registered for the shared task, and around 8 teams submitted results. Submissions are evaluated using a hybrid framework combining rubric-based automatic scoring with expert human judgment. Finally, we provide an overview of participating systems and discuss key findings and challenges observed across languages.

Findings in Tamil Dialect Speech Recognition and Classification
Bharathi B | Bharathi Raja Chakravarthi | Shunmuga Priya Muthusamy Chinnan | Saranya S | Suhasini S
Proceedings of the Sixth Workshop on Speech, Vision, and Language Technologies for Dravidian Languages

As part of DravidianLangTech-2026, we provide an overview of the Shared Task on Dialect-based Speech Recognition and Classification in Tamil. The main goal of this joint effort is to create reliable systems for Tamil dialect identification from audio signals and for dialect-aware Automatic Speech Recognition (ASR). The task comprises two subtasks: Dialect-based Tamil Speech Recognition and Tamil Dialect Classification from Speech. The training dataset consists of 5,134 audio recordings in four Tamil dialects (Southern, Northern, Western, and Central), spanning 9 hours and 22 minutes. The test set contains 579 audio samples, totaling almost two hours. A total of 17 teams took part in the shared task. For speech recognition and dialect classification, the top-performing system obtained a Word Error Rate (WER) of 0.51 and a macro F1-score of 0.79, respectively. The findings emphasize the difficulties in understanding Tamil speech due to dialectal diversity and set solid foundations for further study on low-resource dialect-aware ASR systems.
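For readers unfamiliar with the Word Error Rate reported in the abstract above, the following is a minimal sketch of how WER is commonly computed: the word-level edit distance between a reference transcript and a system hypothesis, divided by the number of reference words. The function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration; the shared task's actual scoring script and text normalization are not described here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative only: tokenization is plain whitespace splitting, which
    is an assumption and may differ from the shared task's scorer.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)
```

Under this definition, a WER of 0.51 means that roughly one edit (substitution, deletion, or insertion) is needed for every two reference words.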

Overview of the Shared Task on Multilevel Political Meme Classification in Tamil and Malayalam
Saranya Rajiakodi | Shunmuga Priya Muthusamy Chinnan | Premjith B | Subalalitha CN | Rahul Ponnusamy | Anshid K A | Bhuvaneswari Sivagnanam | Bharathi Raja Chakravarthi
Proceedings of the Sixth Workshop on Speech, Vision, and Language Technologies for Dravidian Languages

This paper presents an overview of the Multi-Level Political Meme Classification shared task conducted at DravidianLangTech–ACL 2026. The task introduces a hierarchical two-level classification framework for Tamil and Malayalam political memes: Level 1 focuses on stance detection (Support/Praise vs. Troll/Oppose), while Level 2 identifies the political target (individual or party), conditioned on the predicted stance. The dataset was curated from social media platforms and manually annotated with strong inter-annotator agreement. A total of 64 teams registered and 19 teams submitted their results using diverse multimodal approaches combining transformer-based text encoders, vision models, OCR pipelines, and hierarchical architectures. Results show that stance detection achieves high macro-F1 scores across both languages, whereas target identification remains more challenging, particularly in Malayalam. The findings highlight the importance of multimodal fusion, hierarchical reasoning, and robustness to OCR noise and class imbalance in political meme analysis.
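The two-level setup described in the abstract above, where the target prediction is conditioned on the predicted stance, can be sketched as a simple cascade. Everything in this block is hypothetical: the label strings, the `classify_hierarchically` helper, and the keyword-based toy models stand in for the multimodal systems that participants actually built, which are not specified here.

```python
from typing import Callable, Dict, Tuple

# A meme is represented here by its extracted text only; real systems
# would also use the image. This simplification is an assumption.
Meme = str
Classifier = Callable[[Meme], str]


def classify_hierarchically(
    meme: Meme,
    stance_model: Classifier,
    target_models: Dict[str, Classifier],
) -> Tuple[str, str]:
    """Predict stance first, then pick the target model for that stance."""
    stance = stance_model(meme)
    target = target_models[stance](meme)
    return stance, target


# Toy stand-ins for trained models (hypothetical keyword rules).
def toy_stance_model(meme: Meme) -> str:
    return "support" if "great" in meme.lower() else "oppose"


def toy_target_model(meme: Meme) -> str:
    return "party" if "party" in meme.lower() else "individual"


toy_targets = {"support": toy_target_model, "oppose": toy_target_model}
result = classify_hierarchically(
    "Great work by the party", toy_stance_model, toy_targets
)
```

One consequence of this design is that a Level 1 error propagates to Level 2, since the wrong stance selects the wrong target model.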

From Comments to Harm: A Findings Report on Abusive Tamil Text Targeting Women on Social Media Shared Task
Bhuvaneswari Sivagnanam | Kathiravan Pannerselvam | Jananayagan | Charmathi Rajkumar | Ramesh Kannan R | Ratnavel Rajalakshmi | Shunmuga Priya Muthusamy Chinnan | Saranya Rajiakodi | Bharathi Raja Chakravarthi
Proceedings of the Sixth Workshop on Speech, Vision, and Language Technologies for Dravidian Languages

This paper presents an overview of the second shared task on Abusive Tamil Text Targeting Women on Social Media as a binary classification problem (abusive vs. non-abusive). We release a dataset of Tamil YouTube comments and evaluate submissions using macro-F1 to encourage balanced performance in a noisy, low-resource setting. A total of 89 teams registered for this task, and 24 teams submitted results. The approaches used by the teams include transformer fine-tuning, heterogeneous ensembles, classical baselines, and large language models using prompting and LoRA. Results show that the best-performing system scored 0.8297 macro-F1 and many submissions fall in the 0.79-0.81 range. Across submissions, transformer fine-tuning with domain-aligned encoders is consistently strong, while additional gains are frequently associated with Tamil-aware normalization and macro-F1-oriented calibration such as class-weighted learning and validation-based threshold tuning. Overall, the findings highlight the importance of language-aware preprocessing and careful decision calibration for reliable moderation of women-targeted abusive Tamil social media text. Disclaimer: This paper (including figures and examples) may contain offensive or harmful language, including abusive content targeting women. All such text is presented solely for research and educational purposes and it does not reflect the author’s views. Reader discretion is advised.
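The validation-based threshold tuning mentioned in the abstract above can be illustrated with a small sketch: instead of cutting predicted probabilities at 0.5, one scans candidate thresholds on a held-out validation set and keeps the one that maximizes macro-F1. The helper names `macro_f1` and `tune_threshold`, the candidate grid, and the toy data are assumptions for illustration and are not taken from any participating system.

```python
from typing import Iterable, List, Optional, Sequence, Tuple


def macro_f1(y_true: Sequence[int], y_pred: Sequence[int],
             labels: Iterable[int] = (0, 1)) -> float:
    """Unweighted mean of per-class F1 scores."""
    scores: List[float] = []
    for label in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        denom = 2 * tp + fp + fn
        # A class that is never present and never predicted scores 0 here.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def tune_threshold(y_true: Sequence[int], probs: Sequence[float],
                   candidates: Optional[Sequence[float]] = None
                   ) -> Tuple[float, float]:
    """Return the (threshold, macro-F1) pair that scores best on validation data."""
    if candidates is None:
        # Hypothetical grid: 0.05, 0.10, ..., 0.95.
        candidates = [i / 100 for i in range(5, 100, 5)]
    best_threshold, best_score = 0.5, -1.0
    for threshold in candidates:
        preds = [1 if p >= threshold else 0 for p in probs]
        score = macro_f1(y_true, preds)
        if score > best_score:  # ties keep the lowest threshold
            best_threshold, best_score = threshold, score
    return best_threshold, best_score


# Toy validation set: gold labels and predicted probability of "abusive".
val_labels = [0, 0, 0, 1, 1]
val_probs = [0.10, 0.20, 0.35, 0.40, 0.90]
best_threshold, best_score = tune_threshold(val_labels, val_probs)
```

On this toy data the default cut-off of 0.5 would miss the positive example scored 0.40, whereas the tuned threshold recovers it. The selected threshold should then be frozen and applied unchanged to the test set.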

2025

Findings of the Shared Task Caste and Migration Hate Speech Detection
Saranya Rajiakodi | Bharathi Raja Chakravarthi | Rahul Ponnusamy | Shunmuga Priya Muthusamy Chinnan | Prasanna Kumar Kumaresan | Sathiyaraj Thangasamy | Bhuvaneswari Sivagnanam | Balasubramanian Palani | Kogilavani Shanmugavadivel | Abirami Murugappan | Charmathi Rajkumar
Proceedings of the 5th Conference on Language, Data and Knowledge: Fifth Workshop on Language Technology for Equality, Diversity, Inclusion

Hate speech targeting caste and migration communities is a growing concern on online platforms, particularly in linguistically diverse regions. By focusing on Tamil text content, this task provides a unique opportunity to tackle caste- or migration-related hate speech detection in Tamil, a low-resource language, contributing to a safer digital space. We present the results and main findings of the shared task on caste and migration hate speech detection. The task is a binary classification problem: determining whether a text is caste/migration-related hate speech or not. The task attracted 17 participating teams, experimenting with a wide range of methodologies from traditional machine learning to advanced multilingual transformers. The top-performing system achieved a macro F1-score of 0.88105 using an ensemble of fine-tuned transformer models including XLM-R and MuRIL. Our analysis highlights the effectiveness of multilingual transformers in low-resource settings, ensemble learning, and culturally informed techniques based on socio-political context.

Findings of the Shared Task Multilingual Bias and Propaganda Annotation in Political Discourse
Shunmuga Priya Muthusamy Chinnan | Bharathi Raja Chakravarthi | Meghann Drury-Grogan | Senthil Kumar B | Saranya Rajiakodi | Angel Deborah S
Proceedings of the 5th Conference on Language, Data and Knowledge: Fifth Workshop on Language Technology for Equality, Diversity, Inclusion

The Multilingual Bias and Propaganda Annotation task focuses on annotating biased and propagandist content in political discourse across English and Tamil. This paper presents the findings of the shared task on bias and propaganda annotation. The task involves two subtasks, one in English and another in Tamil, both of which are annotation tasks in which a text comment is to be labeled. With a particular emphasis on polarizing policy debates such as the US Gender Policy and India’s Three Language Policy, this shared task invites participants to build annotation systems capable of labeling textual bias and propaganda. The dataset was curated by collecting comments from YouTube videos. Our curated dataset consists of 13,010 English sentences on US Gender Policy and the Russia-Ukraine War, and 5,880 Tamil sentences on the Three Language Policy. Participants were instructed to annotate at sentence level, following the guidelines, with fine-grained, domain-specific bias labels and 4 propaganda labels. Participants were encouraged to leverage existing tools or develop novel approaches to perform fine-grained annotations that capture the complex socio-political nuances present in the data.

Findings of the Shared Task on Misogyny Meme Detection: DravidianLangTech@NAACL 2025
Bharathi Raja Chakravarthi | Rahul Ponnusamy | Saranya Rajiakodi | Shunmuga Priya Muthusamy Chinnan | Paul Buitelaar | Bhuvaneswari Sivagnanam | Anshid Kizhakkeparambil
Proceedings of the Fifth Workshop on Speech, Vision, and Language Technologies for Dravidian Languages

The rapid expansion of social media has facilitated communication but also enabled the spread of misogynistic memes, reinforcing gender stereotypes and toxic online environments. Detecting such content is challenging due to the multimodal nature of memes, where meaning emerges from the interplay of text and images. The Misogyny Meme Detection shared task at DravidianLangTech@NAACL 2025 focused on Tamil and Malayalam, encouraging the development of multimodal approaches. With 114 teams registered and 23 submitting predictions, participants leveraged various pretrained language models and vision models through fusion techniques. The best models achieved high macro F1 scores (0.83682 for Tamil, 0.87631 for Malayalam), highlighting the effectiveness of multimodal learning. Despite these advances, challenges such as bias in the data set, class imbalance, and cultural variations persist. Future research should refine multimodal detection methods to improve accuracy and adaptability, fostering safer and more inclusive online spaces.

This overview paper presents the findings of the Shared Task on Abusive Tamil and Malayalam Text Targeting Women on Social Media, organized as part of DravidianLangTech@NAACL 2025. The task aimed to encourage the development of robust systems to detect abusive content targeting women in Tamil and Malayalam, two low-resource Dravidian languages. Participants were provided with annotated datasets containing abusive and non-abusive text curated from YouTube comments. We present an overview of the approaches and analyse the results of the shared task submissions. We believe the findings presented in this paper will be useful to researchers working in Dravidian language technology.