Ragnheiður Þórhallsdóttir
2022
Samrómur: Crowd-sourcing large amounts of data
Staffan Hedström | David Erik Mollberg | Ragnheiður Þórhallsdóttir | Jón Guðnason
Proceedings of the Thirteenth Language Resources and Evaluation Conference
This contribution describes the collection of a large and diverse corpus for speech recognition and similar tools using crowd-sourced donations. We have built a collection platform inspired by Mozilla Common Voice and specialized it to our needs. We discuss the importance of engaging the community and motivating it to contribute, in our case through competitions. Given the incentive and a platform for easily reading in large numbers of utterances, we have observed four cases of speakers freely donating over 10 thousand utterances each. We have also seen that women are keener to participate in these events across all age groups. Manually verifying a large corpus is a monumental task, and we attempt to automatically verify parts of the data using tools like Marosijo and the Montreal Forced Aligner. The method proved helpful, especially for detecting invalid utterances, and halved the work needed from crowd-sourced verification.
2021
Talrómur: A large Icelandic TTS corpus
Atli Sigurgeirsson | Þorsteinn Gunnarsson | Gunnar Örnólfsson | Eydís Magnúsdóttir | Ragnheiður Þórhallsdóttir | Stefán Jónsson | Jón Guðnason
Proceedings of the 23rd Nordic Conference on Computational Linguistics (NoDaLiDa)
We present Talrómur, a large high-quality Text-To-Speech (TTS) corpus for the Icelandic language. This multi-speaker corpus contains recordings from 4 male speakers and 4 female speakers spanning a wide range of ages and speaking styles. The corpus consists of 122,417 single-utterance recordings equating to approximately 213 hours of voice data. All speakers read from the same script, which has a high coverage of possible Icelandic diphones. Manual analysis of 15,956 utterances indicates that the corpus has a reading mistake rate no higher than 0.25%. We additionally present results from subjective evaluations of the different voices with regard to intelligibility, likeability and trustworthiness.