Building Text-to-Speech Systems for Resource Poor Languages

Nur-Hana Samsudin, Mark Lee


Abstract
This paper describes research on building text-to-speech (TTS) synthesis systems for resource-poor languages using available resources from other languages, and presents our general approach to building cross-linguistic polyglot TTS. Our approach involves three main steps: language clustering, grapheme-to-phoneme mapping, and prosody modelling. We have tested the mapping of phonemes from German to English and from Indonesian to Spanish. We have also constructed three prosody representations for different language characteristics. For evaluation, we developed an English TTS based on German data and a Spanish TTS based on Indonesian data, and compared their performance against pre-existing monolingual TTS systems. Since our motivation is to develop speech synthesis for resource-poor languages, we have also developed three TTS systems for Iban, an Austronesian language with practically no available language resources, using Malay, Indonesian and Spanish resources.
Anthology ID:
L12-1638
Volume:
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)
Month:
May
Year:
2012
Address:
Istanbul, Turkey
Venue:
LREC
Publisher:
European Language Resources Association (ELRA)
Pages:
3327–3334
URL:
http://www.lrec-conf.org/proceedings/lrec2012/pdf/1070_Paper.pdf
Cite (ACL):
Nur-Hana Samsudin and Mark Lee. 2012. Building Text-to-Speech Systems for Resource Poor Languages. In Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12), pages 3327–3334, Istanbul, Turkey. European Language Resources Association (ELRA).
Cite (Informal):
Building Text-to-Speech Systems for Resource Poor Languages (Samsudin & Lee, LREC 2012)