Stephen Russell




2022

BU-TTS: An Open-Source, Bilingual Welsh-English, Text-to-Speech Corpus
Stephen Russell | Dewi Jones | Delyth Prys
Proceedings of the 4th Celtic Language Technology Workshop within LREC2022

This paper presents the design, collection and verification of a bilingual text-to-speech synthesis corpus for Welsh and English. The ever-expanding voice collection currently contains almost 10 hours of recordings from a bilingual, phonetically balanced text corpus. The speakers consist of a professional voice actor and three amateur contributors, with male and female accents from north and south Wales. This corpus provides audio-text pairs for building and training high-quality bilingual Welsh-English neural TTS systems. We describe the process by which we created a phonetically balanced prompt set and the challenges of attempting to collate such a dataset during the COVID-19 pandemic. We present our initial findings from validating the corpus by implementing state-of-the-art TTS models. This corpus represents the first open-source Welsh-language corpus large enough to capitalise on neural TTS architectures.