Emmanuel Ngué Um

Also published as: Emmanuel Ngue Um


2025

Speech Technologies Datasets for African Under-Served Languages
Emmanuel Ngue Um | Francis Tyers | Eliette-Caroline Emilie Ngo Tjomb | Florus Landry Dibengue | Blaise-Mathieu Banoum Manguele | Blaise Abbo Djoulde | Mathilde Nyambe A | Brice Martial Atangana Eloundou | Jeff Sterling Ngami Kamagoua | José Mpouda Avom | Zacharie Nyobe | Emmanuel Giovanni Eloundou Eyenga | André Likwai
Proceedings of the Eight Workshop on the Use of Computational Methods in the Study of Endangered Languages

The expansion of the speech technology sector has given rise to a novel economic model in language research, with the objective of developing speech datasets. This model is expanding to under-served African languages through collaborative efforts between industries, organisations, and the active participation of communities. This collaboration is yielding new datasets for machine learning, while also disclosing vulnerabilities and sociolinguistic discrepancies between industrialised and non-industrialised societies. A case study of a speech data collection camp that took place in September 2024 in Cameroon, involving representatives of 31 languages throughout the continent, illustrates both the prospects of the new economic model for research on under-served languages and the challenges of fair, effective, and responsible participation.

2023

Comparing methods of orthographic conversion for Bàsàá, a language of Cameroon
Alexandra O’Neil | Daniel Swanson | Robert Pugh | Francis Tyers | Emmanuel Ngue Um
Proceedings of the Fourth workshop on Resources for African Indigenous Languages (RAIL 2023)

Orthographical standardization is a milestone in a language’s documentation and the development of its resources. However, texts written in former orthographies remain relevant to the language’s history and development and therefore must be converted to the standardized orthography. Ensuring a language has access to the orthographically standardized version of all of its recorded texts is important in the development of resources, as it provides additional textual resources for training, supports contributions from authors using former writing systems, and provides information about the development of the language. This paper evaluates the performance of natural language processing methods, specifically Finite State Transducers (FSTs) and Long Short-Term Memory (LSTM) networks, for the orthographical conversion of Bàsàá texts from the Protestant missionary orthography to the now-standard AGLC orthography, with the conclusion that LSTMs are somewhat more effective in the absence of explicit lexical information.

2017

Issues in digital text representation, on-line dissemination, sharing and re-use for African minority languages
Emmanuel Ngué Um
Proceedings of the 2nd Workshop on the Use of Computational Methods in the Study of Endangered Languages