Avinash Kumar Sharma


2025

Fine-tuning Whisper Tiny for Swahili ASR: Challenges and Recommendations for Low-Resource Speech Recognition
Avinash Kumar Sharma | Manas Pandya | Arpit Shukla
Proceedings of the Sixth Workshop on African Natural Language Processing (AfricaNLP 2025)

Automatic Speech Recognition (ASR) technologies have seen significant advancements, yet many widely spoken languages remain underrepresented. This paper explores the fine-tuning of OpenAI’s Whisper Tiny model (39M parameters) for Swahili, a lingua franca for over 100 million people across East Africa. Using a dataset of 5,520 Swahili audio samples, we analyze the model’s performance, error patterns, and limitations after fine-tuning. Our results demonstrate the potential of fine-tuning for improving transcription accuracy, while also highlighting persistent challenges such as phonetic misinterpretations, named entity recognition failures, and difficulties with morphologically complex words. We provide recommendations for improving Swahili ASR, including scaling to larger model variants, architectural adaptations for agglutinative languages, and data enhancement strategies. This work contributes to the growing body of research on adapting pre-trained multilingual ASR systems to low-resource languages, emphasizing the need for approaches that account for the unique linguistic features of Bantu languages.
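The abstract reports on transcription accuracy and error patterns; the standard metric for this is word error rate (WER), the word-level edit distance between a reference and a hypothesis transcript divided by the reference length. A minimal pure-Python sketch of WER, illustrative only and not the authors' evaluation code (the Swahili example phrases are hypothetical):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance / reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    # Dynamic-programming table for edit distance over word sequences.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # deleting all remaining reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # inserting all remaining hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # substitution / match
    return d[len(ref)][len(hyp)] / len(ref)

# One substituted word out of three reference words -> WER of 1/3.
print(wer("habari ya leo", "habari za leo"))
```

A phonetic misinterpretation of the kind the paper describes would surface here as a substitution, inflating WER even when the transcript remains intelligible.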

Swahili News Classification: Performance, Challenges, and Explainability Across ML, DL, and Transformers
Manas Pandya | Avinash Kumar Sharma | Arpit Shukla
Proceedings of the Sixth Workshop on African Natural Language Processing (AfricaNLP 2025)

In this paper, we propose a comprehensive framework for the classification of Swahili news articles using a combination of classical machine learning techniques, deep neural networks, and transformer-based models. By balancing two diverse datasets sourced from Harvard Dataverse and Kaggle, our approach addresses the inherent challenges of imbalanced data in low-resource languages. Our experiments demonstrate the effectiveness of the proposed methodology and set the stage for further advances in Swahili natural language processing.
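The abstract highlights balancing imbalanced datasets before classification; one common baseline for this is random oversampling, duplicating minority-class examples until every class matches the majority class count. A hypothetical sketch of that step (function name and labels are illustrative, not from the paper):

```python
import random
from collections import defaultdict

def oversample(texts, labels, seed=0):
    """Randomly duplicate minority-class examples so every class
    reaches the size of the largest class."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for text, label in zip(texts, labels):
        by_label[label].append(text)
    target = max(len(items) for items in by_label.values())
    out_texts, out_labels = [], []
    for label, items in by_label.items():
        # Keep the originals, then draw random duplicates to fill the gap.
        padded = items + [rng.choice(items) for _ in range(target - len(items))]
        out_texts.extend(padded)
        out_labels.extend([label] * target)
    return out_texts, out_labels

# Three "news" articles vs. one "sport" article -> balanced to 3 and 3.
texts, labels = oversample(["a", "b", "c", "d"],
                           ["news", "news", "news", "sport"])
```

Oversampling should be applied only to the training split, never before the train/test split, or duplicated examples leak into evaluation.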