Pronunciation-Aware Syllable Tokenizer for Nepali Automatic Speech Recognition System
Rupak Raj Ghimire, Bal Krishna Bal, Balaram Prasain, Prakash Poudyal
Abstract
Automatic Speech Recognition (ASR) has advanced significantly over several decades, moving from rule-based methods to statistical approaches and, ultimately, to end-to-end (E2E) frameworks. This trend continues as machine learning and deep learning methods progress. The E2E approach has been highly successful for well-resourced languages with large annotated corpora. However, accuracy remains quite low for low-resourced languages such as Nepali. In this regard, language-specific tools such as tokenizers appear to play a vital role in improving E2E model performance for low-resourced languages like Nepali. In this paper, we propose a pronunciation-aware syllable tokenizer for the Nepali language that improves the results of the E2E model. Our experiments confirm that the proposed tokenizer yields better performance than language-independent tokenizers, with a Character Error Rate (CER) of 8.09%.
- Anthology ID:
- 2023.icon-1.4
- Volume:
- Proceedings of the 20th International Conference on Natural Language Processing (ICON)
- Month:
- December
- Year:
- 2023
- Address:
- Goa University, Goa, India
- Editors:
- Jyoti D. Pawar, Sobha Lalitha Devi
- Venue:
- ICON
- SIG:
- SIGLEX
- Publisher:
- NLP Association of India (NLPAI)
- Pages:
- 36–43
- URL:
- https://aclanthology.org/2023.icon-1.4
- Cite (ACL):
- Rupak Raj Ghimire, Bal Krishna Bal, Balaram Prasain, and Prakash Poudyal. 2023. Pronunciation-Aware Syllable Tokenizer for Nepali Automatic Speech Recognition System. In Proceedings of the 20th International Conference on Natural Language Processing (ICON), pages 36–43, Goa University, Goa, India. NLP Association of India (NLPAI).
- Cite (Informal):
- Pronunciation-Aware Syllable Tokenizer for Nepali Automatic Speech Recognition System (Ghimire et al., ICON 2023)
- PDF:
- https://preview.aclanthology.org/nschneid-patch-5/2023.icon-1.4.pdf