Priyobroto Acharya
2025
JUNLP@LT-EDI-2025: Efficient Low-Rank Adaptation of Whisper for Inclusive Tamil Speech Recognition Targeting Vulnerable Populations
Priyobroto Acharya | Soham Chaudhuri | Sayan Das | Dipanjan Saha | Dipankar Das
Proceedings of the 5th Conference on Language, Data and Knowledge: Fifth Workshop on Language Technology for Equality, Diversity, Inclusion
Speech recognition has received extensive research attention in recent years. It becomes much more challenging when the speaker’s age, gender, and other factors introduce variations in the speech. In this work, we propose a fine-tuned automatic speech recognition model derived from OpenAI’s Whisper-large-v2. Although we experimented with both Whisper-large and Wav2vec2-XLSR-large, Whisper-large achieved a lower WER and proved to be the superior model. We secured 4th rank in the LT-EDI-2025 shared task. Our implementation details and code are available at our GitHub repository.
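As the title indicates, the adaptation relies on low-rank (LoRA) fine-tuning of Whisper-large-v2. The following is a minimal sketch of how such an adapter can be attached, assuming the Hugging Face transformers and peft libraries; the rank, scaling factor, and target modules are illustrative assumptions, not the configuration reported in the paper.

```python
# Sketch: LoRA adaptation of Whisper-large-v2 for Tamil transcription.
# Hyperparameters below are assumptions for illustration only.
from transformers import WhisperForConditionalGeneration, WhisperProcessor
from peft import LoraConfig, get_peft_model

model_name = "openai/whisper-large-v2"
processor = WhisperProcessor.from_pretrained(
    model_name, language="Tamil", task="transcribe"
)
model = WhisperForConditionalGeneration.from_pretrained(model_name)

# LoRA inserts small trainable low-rank matrices into the attention projections,
# so only a small fraction of the weights receive gradient updates.
lora_config = LoraConfig(
    r=16,                                  # low-rank dimension (assumed)
    lora_alpha=32,                         # scaling factor (assumed)
    target_modules=["q_proj", "v_proj"],   # Whisper attention projections
    lora_dropout=0.05,
    bias="none",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # reports the reduced trainable parameter count
```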
JU-NLP: Improving Low-Resource Indic Translation System with Efficient LoRA-Based Adaptation
Priyobroto Acharya | Haranath Mondal | Dipanjan Saha | Dipankar Das | Sivaji Bandyopadhyay
Proceedings of the Tenth Conference on Machine Translation
Low-resource Indic languages such as Assamese, Manipuri, Mizo, and Bodo face persistent challenges in NMT due to limited parallel data, diverse scripts, and complex morphology. We address these issues in the WMT 2025 shared task by introducing a unified multilingual NMT framework that combines rigorous language-specific preprocessing with parameter-efficient adaptation of large-scale models. Our pipeline integrates the NLLB-200 and IndicTrans2 architectures, fine-tuned using LoRA and DoRA, reducing trainable parameters by over 90% without degrading translation quality. A comprehensive preprocessing suite, including Unicode normalization, semantic filtering, transliteration, and noise reduction, ensures high-quality inputs, while script-aware post-processing mitigates evaluation bias from orthographic mismatches. Experiments across English-Indic directions demonstrate that NLLB-200 achieves superior results for Assamese, Manipuri, and Mizo, whereas IndicTrans2 excels in English-Bodo. Evaluated using BLEU, chrF, METEOR, ROUGE-L, and TER, our approach yields consistent improvements over baselines, underscoring the effectiveness of combining efficient fine-tuning with linguistically informed preprocessing for low-resource Indic MT.
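For the parameter-efficient adaptation described above, a LoRA adapter on a seq2seq translation model is the core mechanism: the base weights stay frozen and only the low-rank updates are trained, which is how the trainable parameter count drops by over 90%. Below is a minimal sketch assuming the Hugging Face transformers and peft libraries; the specific NLLB-200 checkpoint, rank, and target modules are assumptions for illustration, not the exact setup used in the shared task.

```python
# Sketch: LoRA adaptation of NLLB-200 for an English-to-Assamese direction.
# Checkpoint and hyperparameters are illustrative assumptions.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
from peft import LoraConfig, get_peft_model, TaskType

model_name = "facebook/nllb-200-distilled-600M"  # assumed checkpoint
tokenizer = AutoTokenizer.from_pretrained(
    model_name, src_lang="eng_Latn", tgt_lang="asm_Beng"
)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# Only the adapter weights are trainable; the frozen base model is untouched.
lora_config = LoraConfig(
    task_type=TaskType.SEQ_2_SEQ_LM,
    r=16,                                  # low-rank dimension (assumed)
    lora_alpha=32,                         # scaling factor (assumed)
    target_modules=["q_proj", "v_proj"],   # NLLB attention projections
    lora_dropout=0.1,
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```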