Sunil Regmi


2021

pdf
An End-to-End Speech Recognition for the Nepali Language
Sunil Regmi | Bal Krishna Bal
Proceedings of the 18th International Conference on Natural Language Processing (ICON)

In this era of AI and Deep Learning, Speech Recognition has achieved fairly good levels of accuracy and is bound to change the way humans interact with computers, which happens mostly through texts today. Most of the speech recognition systems for the Nepali language to date use conventional approaches which involve separately trained acoustic, pronunciation and language model components. Creating a pronunciation lexicon from scratch and defining phoneme sets for the language requires expert knowledge, and at the same time is time-consuming. In this work, we present an End-to-End ASR approach, which uses a joint CTC- attention-based encoder-decoder and a Recurrent Neural Network based language modeling which eliminates the need of creating a pronunciation lexicon from scratch. ESPnet toolkit which uses Kaldi Style of data preparation is the framework used for this work. The speech and transcription data used for this research is freely available on the Open Speech and Language Resources (OpenSLR). We use about 159k transcribed speech data to train the speech recognition model which currently recognizes speech input with the CER of 10.3%.
Search
Co-authors
Venues