Scaling Under-Resourced TTS: A Data-Optimized Framework with Advanced Acoustic Modeling for Thai
Yizhong Geng, Jizhuo Xu, Zeyu Liang, Jinghan Yang, Xiaoyi Shi, Xiaoyu Shen
Abstract
Text-to-speech (TTS) technology has achieved impressive results for widely spoken languages, yet many under-resourced languages remain challenged by limited data and linguistic complexities. In this paper, we present a novel methodology that integrates a data-optimized framework with an advanced acoustic model to build high-quality TTS systems for low-resource scenarios. We demonstrate the effectiveness of our approach using Thai as an illustrative case, where intricate phonetic rules and sparse resources are effectively addressed. Our method enables zero-shot voice cloning and improved performance across diverse client applications, ranging from finance to healthcare, education, and law. Extensive evaluations, both subjective and objective, confirm that our model meets state-of-the-art standards, offering a scalable solution for TTS production in data-limited settings, with significant implications for broader industry adoption and multilingual accessibility. All demos are available at https://luoji.cn/static/thai/demo.html.
- Anthology ID:
- 2025.acl-industry.42
- Volume:
- Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 6: Industry Track)
- Month:
- July
- Year:
- 2025
- Address:
- Vienna, Austria
- Editors:
- Georg Rehm, Yunyao Li
- Venue:
- ACL
- Publisher:
- Association for Computational Linguistics
- Pages:
- 593–604
- URL:
- https://preview.aclanthology.org/mtsummit-25-ingestion/2025.acl-industry.42/
- DOI:
- 10.18653/v1/2025.acl-industry.42
- Cite (ACL):
- Yizhong Geng, Jizhuo Xu, Zeyu Liang, Jinghan Yang, Xiaoyi Shi, and Xiaoyu Shen. 2025. Scaling Under-Resourced TTS: A Data-Optimized Framework with Advanced Acoustic Modeling for Thai. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 6: Industry Track), pages 593–604, Vienna, Austria. Association for Computational Linguistics.
- Cite (Informal):
- Scaling Under-Resourced TTS: A Data-Optimized Framework with Advanced Acoustic Modeling for Thai (Geng et al., ACL 2025)
- PDF:
- https://preview.aclanthology.org/mtsummit-25-ingestion/2025.acl-industry.42.pdf