TTSVowelViz: A Tool for Visualising Text-to-Speech Model Training via Vowel Spaces

Pasindu Udawatta; Jesin James; Balamurali B T; Catherine Inez Watson; Ake Nicholas; Binu Nisal Abeysinghe

Abstract

In text-to-speech (TTS) model training, the saturation of the loss curve indicates how well a model has learned the characteristics of the training dataset, but it does not reveal which linguistic properties the model has learned. Existing TTS approaches overlook the potential of incorporating linguistic insights into model training. We introduce TTSVowelViz, a novel tool that visualises static and dynamic vowel spaces during model training, bridging linguistic knowledge and TTS model development. It helps identify which vowel sounds are learned accurately and how the vowel spaces evolve during training. To assess TTSVowelViz, we fine-tuned a TTS model from General American English to New Zealand English and conducted a perception test. Our results show that the formants of specific vowels in the vowel spaces generated by TTSVowelViz align with human perception, effectively visualising the perceived accent shift. This work highlights vowel space visualisation as a valuable interpretability tool for TTS training.
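The distinction between static and dynamic vowel spaces can be illustrated with a minimal sketch. It is not the TTSVowelViz implementation. It assumes that formants (F1, F2 in Hz) have already been measured upstream, for example with Praat, for vowel tokens synthesised at a training checkpoint. It also assumes that each token stores its formant track sampled at the same number of evenly spaced points across the vowel. The record layout and the function names `static_vowel_space`, `dynamic_vowel_space` and `centroid_shift` are hypothetical. A static space reduces each vowel to one mean point, taken here at the temporal midpoint. A dynamic space keeps the mean trajectory. Comparing centroids across checkpoints gives one simple way to quantify how a vowel moves during training.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical token format (an assumption, not the tool's API):
#   {"vowel": "KIT", "track": [(F1, F2), (F1, F2), ...]}
# where "track" holds formant values in Hz at evenly spaced points
# across the vowel's duration, already measured upstream.


def static_vowel_space(tokens):
    """Mean (F1, F2) per vowel, using each token's temporal midpoint."""
    by_vowel = defaultdict(list)
    for tok in tokens:
        track = tok["track"]
        by_vowel[tok["vowel"]].append(track[len(track) // 2])
    return {
        vowel: (mean(f1 for f1, _ in pts), mean(f2 for _, f2 in pts))
        for vowel, pts in by_vowel.items()
    }


def dynamic_vowel_space(tokens):
    """Mean (F1, F2) trajectory per vowel, averaged point by point."""
    by_vowel = defaultdict(list)
    for tok in tokens:
        by_vowel[tok["vowel"]].append(tok["track"])
    result = {}
    for vowel, tracks in by_vowel.items():
        n_points = len(tracks[0])  # assumes every track has the same length
        result[vowel] = [
            (mean(t[i][0] for t in tracks), mean(t[i][1] for t in tracks))
            for i in range(n_points)
        ]
    return result


def centroid_shift(space_a, space_b):
    """Euclidean F1/F2 distance in Hz for each vowel present in both spaces."""
    return {
        vowel: ((space_a[vowel][0] - space_b[vowel][0]) ** 2
                + (space_a[vowel][1] - space_b[vowel][1]) ** 2) ** 0.5
        for vowel in space_a.keys() & space_b.keys()
    }
```

To draw either space in the conventional phonetic orientation, plot F2 on the horizontal axis and F1 on the vertical axis, and reverse both axes so that front vowels appear on the left and close vowels at the top. Calling `centroid_shift` on the static spaces of successive checkpoints, or between a checkpoint and reference speech, gives a per-vowel number that can be tracked alongside the loss curve.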

Anthology ID: 2026.lrec-main.375
Volume: Proceedings of the Fifteenth Language Resources and Evaluation Conference
Month: May
Year: 2026
Address: Palma de Mallorca, Spain
Editors: Stelios Piperidis, Núria Bel, Henk van den Heuvel, Nancy Ide, Simon Krek, Antonio Toral
Venue: LREC
Publisher: ELRA Language Resource Association
Pages: 4778–4786
URL: https://preview.aclanthology.org/ingest-lrec/2026.lrec-main.375/
Cite (ACL): Pasindu Udawatta, Jesin James, Balamurali B T, Catherine Inez Watson, Ake Nicholas, and Binu Nisal Abeysinghe. 2026. TTSVowelViz: A Tool for Visualising Text-to-Speech Model Training via Vowel Spaces. In Proceedings of the Fifteenth Language Resources and Evaluation Conference, pages 4778–4786, Palma de Mallorca, Spain. ELRA Language Resource Association.
Cite (Informal): TTSVowelViz: A Tool for Visualising Text-to-Speech Model Training via Vowel Spaces (Udawatta et al., LREC 2026)
PDF: https://preview.aclanthology.org/ingest-lrec/2026.lrec-main.375.pdf
Optional supplementary material: 2026.lrec-main.375.OptionalSupplementaryMaterial.zip
