Rupak Raj Ghimire
2023
Pronunciation-Aware Syllable Tokenizer for Nepali Automatic Speech Recognition System
Rupak Raj Ghimire
|
Bal Krishna Bal
|
Balaram Prasain
|
Prakash Poudyal
Proceedings of the 20th International Conference on Natural Language Processing (ICON)
The Automatic Speech Recognition (ASR) has come up with significant advancements over the course of several decades, transitioning from a rule-based method to a statistical approach, and ultimately to the use of end-to-end (E2E) frameworks. This phenomenon continues with the progression of machine learning and deep learning methodologies. The E2E approach for ASR has demonstrated predominant success in the case of resourceful languages with larger annotated corpus. However, the accuracy is quite low for low-resourced languages such as Nepali. In this regard, language-specific tools such as tokenizers seem to play a vital role in improving the performance of the E2E model for low-resourced languages like Nepali. In this paper, we propose a pronunciationaware syllable tokenizer for the Nepali language which improves the results of the E2E model. Our experiment confirm that the introduction of the proposed tokenizer yields better performance with the Character Error Rate (CER) 8.09% compared to other language-independent tokenizers.
Active Learning Approach for Fine-Tuning Pre-Trained ASR Model for a Low-Resourced Language: A Case Study of Nepali
Rupak Raj Ghimire
|
Bal Krishna Bal
|
Prakash Poudyal
Proceedings of the 20th International Conference on Natural Language Processing (ICON)
Fine tuning of the pre-trained language model is a technique which can be used to enhance the technologies of low-resourced languages. The unsupervised approach can fine-tune any pre-trained model with minimum or even no language-specific resources. It is highly advantageous, particularly for languages that possess limited computational resources. We present a novel approach for fine-tuning a pre-trained Automatic Speech Recognition (ASR) model that is suitable for low resource languages. Our methods involves iterative fine-tuning of pre-trained ASR model. mms-1b is selected as the pretrained seed model for fine-tuning. We take the Nepali language as a case study for this research work. Our approach achieved a CER of 6.77%, outperforming all previously recorded CER values for the Nepali ASR Systems.