Fabiola Henri
2026
Child Support: Leveraging Lexifiers Resources to Support Creoles ASR
Éric Le Ferrand | Fabiola Henri
Proceedings of the Ninth Workshop on the Use of Computational Methods in the Study of Endangered Languages (ComputEL-9)
Creole languages emerged from colonial contact and the slave trade. Although they inherit the bulk of their vocabulary from a "lexifier" language, they remain classic low-resource languages, presenting significant challenges for speech technology. This paper explores how the abundant resources of a lexifier can be leveraged for Creole-specific tools, focusing on Automatic Speech Recognition (ASR). Specifically, we use an artificial dataset generated with a French-trained Text-to-Speech (TTS) model, together with French datasets, to pre-finetune ASR models for two French-based Creoles. Our results demonstrate that a two-stage training setup, in which models are first trained on artificial datasets, leads to a substantial performance boost for transcribing Creole languages. Additionally, this approach serves as a viable first step for ASR development in zero-resource scenarios.
2023
Application of Speech Processes for the Documentation of Kréyòl Gwadloupéyen
Éric Le Ferrand | Fabiola Henri | Benjamin Lecouteux | Emmanuel Schang
Proceedings of the Second Workshop on NLP Applications to Field Linguistics
In recent years, a growing number of studies have addressed the challenges posed by low-resource languages and the transcription bottleneck. This bottleneck has driven the development of speech recognition methods that automatically transcribe regional and Indigenous languages. Although there is much talk about bridging the gap between speech technologies and field linguistics, efficient communication between NLP experts and documentary linguists is rarely documented. Models created for low-resource languages often remain within the confines of computer science departments, while documentary linguists remain attached to traditional transcription workflows. This paper presents the early stage of a collaboration between NLP experts and field linguists that resulted in the successful transcription of Kréyòl Gwadloupéyen using speech recognition technology.