Sally Akevai Nicholas


2024

pdf
Development of Community-Oriented Text-to-Speech Models for Māori ‘Avaiki Nui (Cook Islands Māori)
Jesin James | Rolando Coto-Solano | Sally Akevai Nicholas | Joshua Zhu | Bovey Yu | Fuki Babasaki | Jenny Tyler Wang | Nicholas Derby
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

In this paper we describe the development of a text-to-speech system for Māori ‘Avaiki Nui (Cook Islands Māori). We provide details about the process of community-collaboration that was followed throughout the project, a continued engagement where we are trying to develop speech and language technology for the benefit of the community. During this process we gathered a group of recordings that we used to train a TTS system. When training we used two approaches, the HMM-system MaryTTS (Schröder et al., 2011) and the deep learning system FastSpeech2 (Ren et al., 2020). We performed two evaluation tasks on the models: First, we measured their quality by having the synthesized speech transcribed by ASR. The human produced ground truth had lower error rates (CER=4.3, WER=18), but the FastSpeech2 audio has lower error rates (CER=11.8 and WER=42.7) than the MaryTTS voice (CER=17.9 and WER=48.1). The second evaluation was a survey amongst speakers of the language so they could judge the voice’s quality. The ground truth was rated with the highest quality (MOS=4.6), but the FastSpeech2 voice had an overall quality of MOS=3.2, which was significantly higher than that of the MaryTTS synthesized recordings (MOS=2.0). We intend to use the FastSpeech2 model to create language learning tools for community members both on the Cook Islands and in the diaspora.

2023

pdf
Towards Universal Dependencies in Cook Islands Māori
Sarah Karnes | Rolando Coto | Sally Akevai Nicholas
Proceedings of the Sixth Workshop on the Use of Computational Methods in the Study of Endangered Languages

2022

pdf
Development of Automatic Speech Recognition for the Documentation of Cook Islands Māori
Rolando Coto-Solano | Sally Akevai Nicholas | Samiha Datta | Victoria Quint | Piripi Wills | Emma Ngakuravaru Powell | Liam Koka’ua | Syed Tanveer | Isaac Feldman
Proceedings of the Thirteenth Language Resources and Evaluation Conference

This paper describes the process of data processing and training of an automatic speech recognition (ASR) system for Cook Islands Māori (CIM), an Indigenous language spoken by approximately 22,000 people in the South Pacific. We transcribed four hours of speech from adults and elderly speakers of the language and prepared two experiments. First, we trained three ASR systems: one statistical, Kaldi; and two based on Deep Learning, DeepSpeech and XLSR-Wav2Vec2. Wav2Vec2 tied with Kaldi for lowest character error rate (CER=6±1) and was slightly behind in word error rate (WER=23±2 versus WER=18±2 for Kaldi). This provides evidence that Deep Learning ASR systems are reaching the performance of statistical methods on small datasets, and that they can work effectively with extremely low-resource Indigenous languages like CIM. In the second experiment we used Wav2Vec2 to train models with held-out speakers. While the performance decreased (CER=15±7, WER=46±16), the system still showed considerable learning. We intend to use ASR to accelerate the documentation of CIM, using newly transcribed texts to improve the ASR and also generate teaching and language revitalization materials. The trained model is available under a license based on the Kaitiakitanga License, which provides for non-commercial use while retaining control of the model by the Indigenous community.

2018

pdf
Development of Natural Language Processing Tools for Cook Islands Māori
Rolando Coto Solano | Sally Akevai Nicholas | Samantha Wray
Proceedings of the Australasian Language Technology Association Workshop 2018

This paper presents three ongoing projects for NLP in Cook Islands Maori: Untrained Forced Alignment (approx. 9% error when detecting the center of words), speech-to-text (37% WER in the best trained models) and POS tagging (92% accuracy for the best performing model). Included as part of these projects are new resources filling in a gap in Australasian languages, including gold standard POS-tagged written corpora, transcribed speech corpora, time-aligned corpora down to the level of phonemes. These are part of efforts to accelerate the documentation of Cook Islands Maori and to increase its vitality amongst its users.