TTSVowelViz: A Tool for Visualising Text-to-Speech Model Training via Vowel Spaces
Pasindu Udawatta, Jesin James, Balamurali B T, Catherine Inez Watson, Ake Nicholas, Binu Nisal Abeysinghe
Abstract
In text-to-speech (TTS) model training, the saturation of the loss curve indicates how well a model has learned the characteristics of the training dataset, but it does not reveal which linguistic properties the model has learned. Existing TTS approaches miss the opportunity to incorporate linguistic insights into model training. We introduce TTSVowelViz, a novel tool that visualises static and dynamic vowel spaces during model training, bridging linguistic knowledge and TTS model development. It helps identify which vowel sounds are accurately learned and how the vowel spaces evolve during training. To assess TTSVowelViz, we fine-tuned a TTS model from General American English to New Zealand English and conducted a perception test. Our results show that the formants of specific vowels in the vowel spaces generated by TTSVowelViz align with human perception, effectively visualising the perceived accent shift. This work highlights vowel space visualisation as a valuable interpretability tool for TTS training.
- Anthology ID:
- 2026.lrec-main.375
- Volume:
- Proceedings of the Fifteenth Language Resources and Evaluation Conference
- Month:
- May
- Year:
- 2026
- Address:
- Palma de Mallorca, Spain
- Editors:
- Stelios Piperidis, Núria Bel, Henk van den Heuvel, Nancy Ide, Simon Krek, Antonio Toral
- Venue:
- LREC
- Publisher:
- ELRA Language Resource Association
- Pages:
- 4778–4786
- URL:
- https://preview.aclanthology.org/ingest-lrec/2026.lrec-main.375/
- Cite (ACL):
- Pasindu Udawatta, Jesin James, Balamurali B T, Catherine Inez Watson, Ake Nicholas, and Binu Nisal Abeysinghe. 2026. TTSVowelViz: A Tool for Visualising Text-to-Speech Model Training via Vowel Spaces. International Conference on Language Resources and Evaluation, main:4778–4786.
- Cite (Informal):
- TTSVowelViz: A Tool for Visualising Text-to-Speech Model Training via Vowel Spaces (Udawatta et al., LREC 2026)
- PDF:
- https://preview.aclanthology.org/ingest-lrec/2026.lrec-main.375.pdf