Elodie Gauthier


Kallaama: A Transcribed Speech Dataset about Agriculture in the Three Most Widely Spoken Languages in Senegal
Elodie Gauthier | Aminata Ndiaye | Abdoulaye Guissé
Proceedings of the Fifth Workshop on Resources for African Indigenous Languages @ LREC-COLING 2024

This work is part of the Kallaama project, whose objective is to produce and disseminate national-language corpora for the development of speech technologies in the field of agriculture. Except for Wolof, which benefits from some language data for natural language processing, the national languages of Senegal are largely ignored by language technology providers. However, such technologies are key to the protection, promotion and teaching of these languages. Kallaama focuses on the three languages most widely spoken by Senegalese people: Wolof, Pulaar and Sereer. These languages are widely spoken by the population, with around 10 million native Senegalese speakers, not to mention those outside the country. However, they remain under-resourced in terms of machine-readable data that can be used for automatic processing and language technologies, all the more so in the agricultural sector. We release a transcribed speech dataset containing 125 hours of recordings, about agriculture, in each of the above-mentioned languages. These resources are specifically designed for Automatic Speech Recognition purposes, including traditional approaches. To build such technologies, we provide textual corpora in Wolof and Pulaar, and a pronunciation lexicon containing 49,132 entries from the Wolof dataset.


Preuve de concept d’un bot vocal dialoguant en wolof (Proof-of-Concept of a Voicebot Speaking Wolof)
Elodie Gauthier | Papa Séga Wade | Thierry Moudenc | Patrice Collen | Emilie De Neef | Oumar Ba | Ndeye Khoyane Cama | Ahmadou Bamba Kebe | Ndeye Aissatou Gningue | Thomas Mendo’O Aristide
Actes de la 29e Conférence sur le Traitement Automatique des Langues Naturelles. Volume 1 : conférence principale

This article presents the proof of concept of the first automatic voice assistant in Wolof, the main vehicular language spoken in Senegal. This voicebot is the result of a collaborative research project between Orange Innovation in France, Orange Sénégal (also known as Sonatel) and ADNCorp, a small IT company based in Dakar, Senegal. The purpose of the voicebot is to provide Orange customers with information about Orange Sénégal's Sargal loyalty program using the most natural means of communication: speech. The voicebot receives the customer's spoken request, which is processed by a spoken language understanding engine, and responds with prerecorded audio messages. The first results of this proof of concept are encouraging: we obtained a WER of 22% for the speech recognition task and an F-measure of 78% for the understanding task.


Parallel Corpora in Mboshi (Bantu C25, Congo-Brazzaville)
Annie Rialland | Martine Adda-Decker | Guy-Noël Kouarata | Gilles Adda | Laurent Besacier | Lori Lamel | Elodie Gauthier | Pierre Godard | Jamison Cooper-Leavitt
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)


Collecting Resources in Sub-Saharan African Languages for Automatic Speech Recognition: a Case Study of Wolof
Elodie Gauthier | Laurent Besacier | Sylvie Voisin | Michael Melese | Uriel Pascal Elingui
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

This article presents the data collected and the ASR systems developed for four sub-Saharan African languages (Swahili, Hausa, Amharic and Wolof). To illustrate our methodology, we focus on Wolof (a very under-resourced language), for which we designed the first ASR system ever built. All data and scripts are available online in our GitHub repository.