Atli Sigurgeirsson


Talrómur: A large Icelandic TTS corpus
Atli Sigurgeirsson | Þorsteinn Gunnarsson | Gunnar Örnólfsson | Eydís Magnúsdóttir | Ragnheiður Þórhallsdóttir | Stefán Jónsson | Jón Guðnason
Proceedings of the 23rd Nordic Conference on Computational Linguistics (NoDaLiDa)

We present Talrómur, a large, high-quality text-to-speech (TTS) corpus for the Icelandic language. This multi-speaker corpus contains recordings from 4 male and 4 female speakers spanning a wide range of ages and speaking styles. The corpus consists of 122,417 single-utterance recordings, amounting to approximately 213 hours of voice data. All speakers read from the same script, which has high coverage of possible Icelandic diphones. Manual analysis of 15,956 utterances indicates that the corpus has a reading-mistake rate no higher than 0.25%. We additionally present results from subjective evaluations of the different voices with regard to intelligibility, likeability and trustworthiness.


Manual Speech Synthesis Data Acquisition - From Script Design to Recording Speech
Atli Sigurgeirsson | Gunnar Örnólfsson | Jón Guðnason
Proceedings of the 1st Joint Workshop on Spoken Language Technologies for Under-resourced languages (SLTU) and Collaboration and Computing for Under-Resourced Languages (CCURL)

Atli Þór Sigurgeirsson (Reykjavik University), Gunnar Thor Örnólfsson (Árni Magnússon Institute for Icelandic Studies), Dr. Jón Guðnason

In this paper we present the work of collecting a large amount of high-quality speech synthesis data for Icelandic. Eight speakers will be recorded for 20 hours each. A script design strategy is proposed, and three scripts of varying length have been generated to maximize diphone coverage. The largest reading script contains 14,400 prompts and includes 87.3% of all Icelandic diphones at least once and 81% of all Icelandic diphones at least twenty times. A recording client was developed to facilitate recording sessions. The client supports easy import of scripts and the maintenance of multiple collections in parallel. The recorded data can be downloaded straight from the client. Recording sessions are carried out in a professional studio under supervision and started in October 2019. As of writing, 58.7 hours of high-quality speech data have been collected. The scripts, the recording software and the speech data will later be released under a CC-BY 4.0 license.
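
The abstract above does not say how the scripts were built. As an illustration only, the sketch below shows one common way to approach coverage-driven script design: greedily pick, from a pool of candidate prompts, the prompt that adds the most still-needed diphone occurrences. The function names, the input format (prompt text mapped to a list of phone symbols) and the toy phone symbols are all assumptions made for this example, not the authors' method or data.

```python
from collections import Counter


def diphones(phones):
    """Return the adjacent phone pairs (diphones) of one phone sequence."""
    return list(zip(phones, phones[1:]))


def gain(phones, counts, target_count):
    """Count the diphone occurrences in a prompt that still help reach the target."""
    occurrences = Counter(diphones(phones))
    return sum(
        min(n, max(0, target_count - counts[d])) for d, n in occurrences.items()
    )


def greedy_select(candidates, max_prompts, target_count=1):
    """Greedily build a script from candidate prompts.

    candidates: dict mapping prompt text -> list of phone symbols
                (hypothetical format; a real pipeline would obtain the
                phones from a pronunciation dictionary or G2P model).
    Returns the selected prompts and the diphone counts they cover.
    """
    counts = Counter()
    remaining = dict(candidates)
    script = []
    while remaining and len(script) < max_prompts:
        # Pick the prompt with the largest gain; ties go to the earliest candidate.
        best = max(remaining, key=lambda p: gain(remaining[p], counts, target_count))
        if gain(remaining[best], counts, target_count) == 0:
            break  # no remaining prompt improves coverage
        counts.update(diphones(remaining.pop(best)))
        script.append(best)
    return script, counts


def coverage(counts, inventory, target_count=1):
    """Fraction of the diphone inventory seen at least target_count times."""
    covered = sum(1 for d in inventory if counts[d] >= target_count)
    return covered / len(inventory)


# Toy example with made-up phone symbols.
toy_candidates = {
    "prompt one": ["a", "b", "c"],
    "prompt two": ["a", "b"],
    "prompt three": ["c", "d", "e"],
}
toy_script, toy_counts = greedy_select(toy_candidates, max_prompts=10)
```

Setting `target_count=20` would correspond to the "at least twenty times" criterion quoted in the abstract, and `coverage` reports the kind of percentage given there, provided a full diphone inventory is supplied.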