Shenran Wang
2025
Developing multilingual speech synthesis system for Ojibwe, Mi’kmaq, and Maliseet
Shenran Wang | Changbing Yang | Michael l Parkhill | Chad Quinn | Christopher Hammerly | Jian Zhu
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 2: Short Papers)
We present lightweight flow-matching multilingual text-to-speech (TTS) systems for Ojibwe, Mi’kmaq, and Maliseet, three Indigenous languages of North America. Our results show that training a multilingual TTS model on three typologically similar languages can improve performance over monolingual models, especially when data are scarce. Attention-free architectures are highly competitive with self-attention architectures while offering higher memory efficiency. Our research not only contributes technical development to language revitalization for low-resource languages but also highlights the cultural gap in human evaluation protocols, calling for a more community-centered approach to human evaluation.