Abstract
This research introduces a comprehensive Bahasa text-to-speech (TTS) dataset and a novel TTS model, EnGen-TTS, designed to enhance the quality and versatility of synthetic speech in the Bahasa language. The dataset, spanning 55.00 hours and 52K audio recordings, integrates diverse textual sources, ensuring linguistic richness. A meticulous recording setup captures the nuances of Bahasa phonetics, employing professional equipment to ensure high-fidelity audio samples. Statistical analysis reveals the dataset’s scale and diversity, laying the foundation for model training and evaluation. The proposed EnGen-TTS model performs better than established baselines, achieving a Mean Opinion Score (MOS) of 4.45 ± 0.13. Additionally, our investigation on real-time factor and model size highlights EnGen-TTS as a compelling choice, with efficient performance. This research marks a significant advancement in Bahasa TTS technology, with implications for diverse language applications.- Anthology ID:
- 2024.findings-emnlp.154
- Volume:
- Findings of the Association for Computational Linguistics: EMNLP 2024
- Month:
- November
- Year:
- 2024
- Address:
- Miami, Florida, USA
- Editors:
- Yaser Al-Onaizan, Mohit Bansal, Yun-Nung Chen
- Venue:
- Findings
- SIG:
- Publisher:
- Association for Computational Linguistics
- Note:
- Pages:
- 2731–2741
- Language:
- URL:
- https://aclanthology.org/2024.findings-emnlp.154
- DOI:
- 10.18653/v1/2024.findings-emnlp.154
- Cite (ACL):
- Onkar Kishor Susladkar, Vishesh Tripathi, and Biddwan Ahmed. 2024. Bahasa Harmony: A Comprehensive Dataset for Bahasa Text-to-Speech Synthesis with Discrete Codec Modeling of EnGen-TTS.. In Findings of the Association for Computational Linguistics: EMNLP 2024, pages 2731–2741, Miami, Florida, USA. Association for Computational Linguistics.
- Cite (Informal):
- Bahasa Harmony: A Comprehensive Dataset for Bahasa Text-to-Speech Synthesis with Discrete Codec Modeling of EnGen-TTS. (Susladkar et al., Findings 2024)
- PDF:
- https://preview.aclanthology.org/dois-2013-emnlp/2024.findings-emnlp.154.pdf