Ganesh Dhakal Chhetri




2025

Impacts of Vocoder Selection on Tacotron-based Nepali Text-To-Speech Synthesis
Ganesh Dhakal Chhetri | Kiran Chandra Dahal | Prakash Poudyal
Proceedings of the First Workshop on Challenges in Processing South Asian Languages (CHiPSAL 2025)

Text-to-speech (TTS) technology enhances human-computer interaction and increases content accessibility. Tacotron and other deep learning models have improved the naturalness of text-to-speech systems. The vocoder, which transforms mel-spectrograms into audio waveforms, significantly influences voice quality. This study evaluates Tacotron2 vocoders for Nepali text-to-speech synthesis. While English-language vocoders have been thoroughly examined, Nepali-language vocoders remain underexplored. The study uses the WaveNet and MelGAN vocoders to generate speech from mel-spectrograms produced by Tacotron2 for Nepali text. To assess the quality of voice synthesis, this paper examines the mel-cepstral distortion (MCD) and Mean Opinion Score (MOS) for speech produced by both vocoders. The comparative investigation of the Tacotron2 + MelGAN and Tacotron2 + WaveNet models, using the Nepali OpenSLR and News male voice datasets, consistently reveals the advantage of Tacotron2 + MelGAN in terms of naturalness and accuracy. The Tacotron2 + MelGAN model achieved an average MOS of 4.245 on the Nepali OpenSLR dataset and 2.885 on the male voice dataset.
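
The abstract names two evaluation metrics without defining them. As a rough illustration only, the sketch below computes MCD using one commonly used definition (per-frame distortion in dB, averaged over frames, skipping the 0th energy coefficient) and MOS as the plain mean of listener ratings. It assumes the reference and synthesized frames are already time-aligned; the function names and the input layout are hypothetical and are not taken from the paper, which may compute these metrics differently.

```python
import math


def mel_cepstral_distortion(ref_frames, syn_frames):
    """Average MCD in dB over time-aligned frames.

    Each frame is a sequence of mel-cepstral coefficients. The 0th
    coefficient (energy) is skipped, which is a common convention and
    an assumption here.
    """
    if not ref_frames or len(ref_frames) != len(syn_frames):
        raise ValueError("need the same non-zero number of frames")

    # Constant factor of the usual MCD definition: (10 / ln 10) * sqrt(2)
    const = 10.0 / math.log(10.0) * math.sqrt(2.0)

    total = 0.0
    for ref, syn in zip(ref_frames, syn_frames):
        if len(ref) != len(syn):
            raise ValueError("frames must have the same dimensionality")
        squared_diff = sum((r - s) ** 2 for r, s in zip(ref[1:], syn[1:]))
        total += const * math.sqrt(squared_diff)
    return total / len(ref_frames)


def mean_opinion_score(ratings):
    """Plain arithmetic mean of listener ratings (typically on a 1-5 scale)."""
    if not ratings:
        raise ValueError("need at least one rating")
    return sum(ratings) / len(ratings)
```

Lower MCD indicates that the synthesized spectrum is closer to the reference, while higher MOS indicates that listeners judged the speech as more natural.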