Michael Leventhal


2025

GAIfE: Using GenAI to Improve Literacy in Low-resourced Settings
Allahsera Auguste Tapo | Nouhoum Coulibaly | Seydou Diallo | Sebastien Diarra | Christopher M Homan | Mamadou K. Keita | Michael Leventhal
Findings of the Association for Computational Linguistics: NAACL 2025

Illiteracy is a predictor of many negative social and personal outcomes. Illiteracy rates are particularly high in countries with under-resourced languages, where few books exist that are suitable for children learning to read. We present GAIfE (Generative AI for Education), a toolchain and workflow, developed through empirical methods, that demonstrates how existing tools can be adapted to address low literacy in an under-resourced language. We used GAIfE (a play on the Bambara word for “book”) to construct materials for developing children’s reading competence in Bambara, the vehicular language of Mali. Our approach to generating content, and to editing generated content skewed by the Global-North-centric bias of available LLMs, enabled us to rapidly increase the Bambara content available online tenfold. At the same time, we maintained high standards for the material's attractiveness (to sustain high engagement), for its accurate representation of Malian culture and of Mali's physical and social environment, and for its language quality. Using our materials, pilot reading programs achieved a 67% reduction in the number of children unable to read Bambara. Our approach demonstrated the power of bias-aware application of generative AI to this problem domain, as well as the potential of this technology to reduce illiteracy and improve learning outcomes through native-language education.

2020

Neural Machine Translation for Extremely Low-Resource African Languages: A Case Study on Bambara
Allahsera Auguste Tapo | Bakary Coulibaly | Sébastien Diarra | Christopher Homan | Julia Kreutzer | Sarah Luger | Arthur Nagashima | Marcos Zampieri | Michael Leventhal
Proceedings of the 3rd Workshop on Technologies for MT of Low Resource Languages

Low-resource languages present unique challenges to (neural) machine translation. We discuss the case of Bambara, a Mande language for which training data is scarce and requires significant amounts of pre-processing. More than the linguistic situation of Bambara itself, the socio-cultural context within which Bambara speakers live poses challenges for automated processing of this language. In this paper, we present the first parallel data set for machine translation of Bambara into and from English and French and the first benchmark results on machine translation to and from Bambara. We discuss challenges in working with low-resource languages and propose strategies to cope with data scarcity in low-resource machine translation (MT).