Abstract
Research on deep learning-based Text-to-Speech (TTS) systems has gained increasing popularity in low-resource languages, as this approach is not only computationally robust but also capable of producing state-of-the-art results. However, these approaches have yet to be significantly explored for the Nepali language, primarily because of the lack of adequately sized datasets and secondarily because of the relatively sophisticated computing resources they demand. This paper explores the FastPitch acoustic model with the HiFi-GAN vocoder for the Nepali language. We trained the acoustic model with two datasets: OpenSLR and a dataset prepared jointly by the Information and Language Processing Research Lab (ILPRL) and the Nepal Association of the Blind (NAB), hereafter referred to as the ILPRLNAB dataset. We achieved Mean Opinion Scores (MOS) of 3.70 and 3.40, respectively, for the same model trained on the two datasets. The synthesized speech produced by the model was found to be quite natural and of good quality.
- Anthology ID:
- 2023.icon-1.64
- Volume:
- Proceedings of the 20th International Conference on Natural Language Processing (ICON)
- Month:
- December
- Year:
- 2023
- Address:
- Goa University, Goa, India
- Editors:
- Jyoti D. Pawar, Sobha Lalitha Devi
- Venue:
- ICON
- SIG:
- SIGLEX
- Publisher:
- NLP Association of India (NLPAI)
- Pages:
- 651–656
- URL:
- https://aclanthology.org/2023.icon-1.64
- Cite (ACL):
- Ishan Dongol and Bal Krishna Bal. 2023. Transformer-based Nepali Text-to-Speech. In Proceedings of the 20th International Conference on Natural Language Processing (ICON), pages 651–656, Goa University, Goa, India. NLP Association of India (NLPAI).
- Cite (Informal):
- Transformer-based Nepali Text-to-Speech (Dongol & Bal, ICON 2023)
- PDF:
- https://preview.aclanthology.org/nschneid-patch-4/2023.icon-1.64.pdf