Kishore Shankar S

2026

IIITK_SpeechScape@DravidianLangTech 2026: Dialect based speech recognition and classification using Speech Foundation Models and Deep Learning Techniques
G Srishtik Sekar | Harissh Ragav Dhamodaran | Kishore Shankar S | Balasubramanian Palani | R Tharaniya Sairaj
Proceedings of the Sixth Workshop on Speech, Vision, and Language Technologies for Dravidian Languages

Dialectal variation poses a significant challenge to Automatic Speech Recognition (ASR), particularly for low-resource, morphologically rich languages such as Tamil. Although widely spoken in India, Sri Lanka, and the global diaspora, Tamil exhibits substantial phonetic, lexical, and prosodic variation across dialects, complicating both dialect classification and speech recognition. In this work, we address these tasks within a unified framework. We evaluate state-of-the-art models for dialect classification, including Whisper, CLDNN, wav2vec, and WavLM, and for ASR, Whisper and a zero-shot Conformer. Among them, Whisper achieves the best performance, obtaining a macro F1-score of 0.46 for dialect classification and a word error rate of 0.57 for ASR. These results highlight the strong generalization capability of transformer-based foundation models across dialects and languages. The code is publicly available on GitHub for research purposes.
