Shenran Wang
2026
Exploring Cross-Lingual Voice Conversion Methods for Anonymizing Low-Resource Text-to-Speech
Shenran Wang | Aidan Pine | Mengzhe Geng
Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 2: Short Papers)
We describe and compare multiple approaches for using voice conversion techniques to mask speaker identities in low-resource text-to-speech. We build and evaluate speaker-anonymized text-to-speech systems for two Canadian Indigenous languages, nêhiyawêwin and SENĆOŦEN, and show that cross-lingual speaker transfer via multilingual training with English data produces the most consistent results across both languages. Our research also underscores the need for better evaluation metrics tailored to cross-lingual voice conversion. Our code can be found at https://github.com/EveryVoiceTTS/Speaker_Anonymization_StyleTTS2
2025
Developing multilingual speech synthesis system for Ojibwe, Mi’kmaq, and Maliseet
Shenran Wang | Changbing Yang | Michael l Parkhill | Chad Quinn | Christopher Hammerly | Jian Zhu
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 2: Short Papers)
We present lightweight flow-matching multilingual text-to-speech (TTS) systems for Ojibwe, Mi’kmaq, and Maliseet, three Indigenous languages of North America. Our results show that training a multilingual TTS model on three typologically similar languages can improve performance over monolingual models, especially when data are scarce. Attention-free architectures are highly competitive with self-attention architectures while offering higher memory efficiency. Our research contributes technical development to language revitalization for low-resource languages, but it also highlights the cultural gap in human evaluation protocols and calls for a more community-centered approach to human evaluation.