Ganesh Sundhar S
2026
Wise@DravidianLangTech 2026: Dialect-Aware Tamil Speech Classification and Recognition via Cross-Pipeline Embedding Transfer
Ganesh Sundhar S | Hari Krishnan N | Gnanasabesan G | Suriya KP | Jyothish Lal G
Proceedings of the Sixth Workshop on Speech, Vision, and Language Technologies for Dravidian Languages
This paper presents the **Wise** system for the shared task on dialect-based speech processing in Tamil, addressing two subtasks: **(1) four-way dialect region classification** (Northern, Southern, Western, Central), and **(2) dialectal Tamil ASR**. All audio is preprocessed using loudness normalization followed by neural denoising to ensure consistent audio quality for downstream models. For classification, we experiment with different model variants combining multilingual and Tamil-pretrained **Wav2Vec2** backbones with five temporal pooling strategies under frozen and partial fine-tuning settings. Our best configuration, i.e., learned attentive pooling with partial fine-tuning and a differentially trained MLP head, achieves a macro F1 of **0.79**, securing **1st place** with a margin of **0.26** points. For ASR, we propose two novel **dialect-conditioned Whisper** architectures—residual injection and cross-attention—that inject dialect embeddings from the trained classifier into the ASR pipeline. In addition, we evaluate a vanilla Whisper-Tamil fine-tuned baseline. The best model achieved a **WER of 0.90**, securing **8th place** in the shared task.
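The abstract does not spell out how the learned attentive pooling is formulated. As a minimal sketch, assuming the common form in which a single scoring vector `w` (hypothetical here, learned in a real model) assigns a relevance score to each frame embedding, the pooling step could look like this:

```python
import math

def attentive_pool(frames, w):
    """Pool T frame vectors (each of dimension D) into one D-dim vector.

    frames: list of T frame embeddings, each a list of D floats.
    w: scoring vector of D floats (an assumption; fixed here, learned in practice).
    """
    # One scalar relevance score per frame: dot(frame, w).
    scores = [sum(f_i * w_i for f_i, w_i in zip(f, w)) for f in frames]
    # Numerically stable softmax over the time axis.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # The weighted sum of frames gives the utterance-level embedding.
    dim = len(frames[0])
    pooled = [sum(a * f[d] for a, f in zip(weights, frames)) for d in range(dim)]
    return pooled, weights

# Toy input: two 2-dimensional frames and a zero scoring vector,
# which reduces attentive pooling to plain mean pooling.
pooled, weights = attentive_pool([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])
```

With a zero scoring vector every frame gets equal weight, so the result matches mean pooling; a trained `w` would instead emphasize the frames most useful for the dialect decision.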
SYNAPSE@DravidianLangTech 2026: Multi-Level Political Meme Classification for Tamil and Malayalam
Suriya KP | Durai Singh K | Gnanasabesan G | Ganesh Sundhar S | Hari Krishnan N | Jyothish Lal G
Proceedings of the Sixth Workshop on Speech, Vision, and Language Technologies for Dravidian Languages
Political memes in Tamil and Malayalam present unique multimodal challenges for automated understanding, combining visual context with code-mixed, culturally grounded text. We present SYNAPSE, our system for the DravidianLangTech@ACL 2026 shared task on multi-level political meme classification. The task requires hierarchical classification of memes along two levels: Level 1 identifies the political stance (Support/Praise vs. Troll/Oppose), and Level 2 identifies the target (individual person vs. party). Our approach fine-tunes the Qwen3-VL-2B-Instruct vision-language model using parameter-efficient LoRA adapters on task-specific multimodal data, with structured output prompting for hierarchical label prediction. We report results for both Tamil and Malayalam subtracks. For Malayalam, our system achieves a Level 1 F1 of 0.9200 and Level 2 F1 of 0.4256 (Avg-F1: 0.6728, Rank 5). For Tamil, our system achieves a Level 1 F1 of 0.7840 and Level 2 F1 of 0.4885 (Avg-F1: 0.6362, Rank 14).
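The reported Avg-F1 values are consistent with an unweighted mean of the two level scores. The abstract does not define Avg-F1, so that reading is an assumption, and `avg_f1` is a hypothetical helper name. A quick arithmetic check:

```python
def avg_f1(level1_f1, level2_f1):
    """Unweighted mean of the two level-wise F1 scores (assumed definition)."""
    return (level1_f1 + level2_f1) / 2

# Scores as reported in the abstract.
malayalam = avg_f1(0.9200, 0.4256)
tamil = avg_f1(0.7840, 0.4885)

print(f"Malayalam Avg-F1: {malayalam:.5f}")
print(f"Tamil Avg-F1:     {tamil:.5f}")
```

The Malayalam mean is 0.6728, matching the reported figure. The Tamil mean is 0.63625, which agrees with the reported 0.6362 up to rounding of the four-decimal inputs.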
2025
Wise@LT-EDI-2025: Combining Classical and Neural Representations with Multi-scale Ensemble Learning for Code-mixed Hate Speech Detection
Ganesh Sundhar S | Durai Singh K | Gnanasabesan G | Hari Krishnan N | Mc Dhanush
Proceedings of the 5th Conference on Language, Data and Knowledge: Fifth Workshop on Language Technology for Equality, Diversity, Inclusion
Detecting hate speech targeting caste and migration communities in code-mixed Tamil-English social media content is challenging due to limited resources and socio-cultural complexities. This paper proposes a multi-scale hybrid architecture combining classical and neural representations with hierarchical ensemble learning. We employ advanced preprocessing including transliteration and character repetition removal, then extract features using classical TF-IDF vectors at multiple scales (512, 1024, 2048) processed through linear layers, alongside contextual embeddings from five transformer models: Google BERT, XLM-RoBERTa (Base and Large), SeanBenhur BERT, and IndicBERT. These concatenated representations encode both statistical and contextual information and are input to multiple ML classification heads (Random Forest, SVM, etc.). A three-level hierarchical ensemble strategy combines predictions across classifiers, transformer-TF-IDF combinations, and dimensional scales for enhanced robustness. Our method achieved an F1-score of 0.818, ranking 3rd in the LT-EDI-2025 shared task, showing the efficacy of blending classical and neural methods with multi-level ensemble learning for hate speech detection in low-resource languages.
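The abstract names the three ensemble levels but not the combination rule. The sketch below assumes simple majority voting at each level; the nested dictionary layout, the function names, and the toy predictions are all hypothetical and only illustrate the order in which the levels are combined.

```python
from collections import Counter

def majority_vote(labels):
    """Return the most common label in the list."""
    return Counter(labels).most_common(1)[0][0]

def hierarchical_ensemble(preds):
    """Combine predictions for one example.

    preds[scale][backbone][classifier] -> predicted label.
    """
    scale_votes = []
    for by_backbone in preds.values():
        backbone_votes = []
        for by_classifier in by_backbone.values():
            # Level 1: vote across classifier heads for one feature combination.
            backbone_votes.append(majority_vote(list(by_classifier.values())))
        # Level 2: vote across transformer + TF-IDF combinations at this scale.
        scale_votes.append(majority_vote(backbone_votes))
    # Level 3: vote across TF-IDF dimensional scales.
    return majority_vote(scale_votes)

# Toy predictions (1 = hate speech, 0 = not), invented for illustration.
toy_preds = {
    512:  {"bert": {"rf": 1, "svm": 1, "lr": 0}, "xlmr": {"rf": 0, "svm": 1, "lr": 1}},
    1024: {"bert": {"rf": 0, "svm": 0, "lr": 1}, "xlmr": {"rf": 1, "svm": 0, "lr": 0}},
    2048: {"bert": {"rf": 1, "svm": 1, "lr": 1}, "xlmr": {"rf": 1, "svm": 0, "lr": 1}},
}
final_label = hierarchical_ensemble(toy_preds)
```

Voting from the innermost level outward means a single unreliable classifier head or feature scale can be outvoted before it affects the final label. Weighted or probability-averaged voting would be equally plausible readings of the abstract.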
CrewX@LT-EDI-2025: Transformer-Based Tamil ASR Fine-Tuning with AVMD Denoising and GRU-VAD for Enhanced Transcription Accuracy
Ganesh Sundhar S | Hari Krishnan N | Arun Prasad T D | Shruthikaa V | Jyothish Lal G
Proceedings of the 5th Conference on Language, Data and Knowledge: Fifth Workshop on Language Technology for Equality, Diversity, Inclusion
This research presents an improved Tamil Automatic Speech Recognition (ASR) system designed to enhance accessibility for elderly and transgender populations by addressing unique language challenges. We address the challenges of Tamil ASR (limited high-quality curated datasets, unique phonetic characteristics, and word-merging tendencies) through a comprehensive pipeline. Our methodology integrates Adaptive Variational Mode Decomposition (AVMD) for selective noise reduction based on signal characteristics, Silero Voice Activity Detection (VAD) with GRU architecture to eliminate non-speech segments, and fine-tuning of OpenAI’s Whisper model optimized for Tamil transcription. The system employs beam search decoding during inference to further improve accuracy. Our approach achieved state-of-the-art performance with a Word Error Rate (WER) of 31.9, winning first place in the LT-EDI 2025 shared task.
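For readers unfamiliar with the metric, Word Error Rate is the word-level edit distance between the reference and the hypothesis, divided by the number of reference words. The sketch below is a plain standard-library implementation, not the shared task's official scorer; the whitespace tokenization and the function name `wer` are assumptions.

```python
def wer(reference, hypothesis):
    """Word Error Rate: (substitutions + deletions + insertions) / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # Levenshtein distance over words, keeping only the previous DP row.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)

# One substitution and one deletion against a four-word reference.
score = wer("a b c d", "a x c")
```

The function returns a fraction, and whether a reported figure is stated as a fraction or a percentage depends on the scorer's convention. Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.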