Gulisetty Abhinav


2026

This paper describes our system for the shared task on Dialect Based Speech Recognition and Classification in Tamil at DravidianLangTech@ACL 2026. We participated in both Subtask 1 (Dialect Identification) and Subtask 2 (Dialectal ASR). Our approach uses a single Tamil-adapted Whisper Medium model as a unified foundation for both tasks. For dialect classification, we discard the decoder, apply mean pooling to the encoder outputs over the temporal dimension, and fine-tune the full encoder together with a lightweight classification head, achieving 73.4% accuracy on the test set. For dialectal ASR, we apply Low-Rank Adaptation (LoRA) to the full encoder-decoder architecture with SpecAugment-based data augmentation, achieving a Word Error Rate (WER) of 0.55 on the test set. Our experiments show that unfreezing the pre-trained encoder is critical for dialect discrimination: accuracy rises from 52.78% with a frozen encoder to 73.4% with an unfrozen one. The code is publicly available at https://github.com/DLRG-VIT/DravidianLangTech2026
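
For readers unfamiliar with the ASR metric, the following is a minimal sketch of how a WER such as the 0.55 reported above is typically computed: the word-level edit distance (substitutions, deletions, and insertions) between hypothesis and reference, divided by the number of reference words. The function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration; the shared task's official scoring script may normalize text differently.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative only: tokenization is plain whitespace splitting, which
    is an assumption and may differ from the official evaluation.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return prev[-1] / len(ref)
```

Under this definition WER is a fraction rather than a percentage, so 0.55 corresponds to 55 word errors per 100 reference words, and the value can exceed 1.0 when the hypothesis contains many inserted words.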