Yann Le Beux
2026
AfriStereo: A Culturally Grounded Dataset for Evaluating Stereotypical Bias in Large Language Models
Yann Le Beux | Oluchi Audu | Oche David Ankeli | Dhananjay Balakrishnan | Melissah Weya | Marie Daniella Ralaiarinosy | Ignatius Ezeani
Proceedings of the Fifteenth Language Resources and Evaluation Conference
Existing AI bias evaluation benchmarks largely reflect Western perspectives, leaving African contexts underrepresented and enabling harmful stereotypes in applications across various domains. To address this gap, we introduce AfriStereo, the first open-source African stereotype dataset and evaluation framework grounded in local socio-cultural contexts. Through community-engaged efforts across Senegal, Kenya, and Nigeria, we collect 1,163 stereotypes spanning gender, ethnicity, religion, age, and profession. Using few-shot prompting with human-in-the-loop validation, we augment the dataset to over 5,000 stereotype–antistereotype pairs. Entries are validated through semantic clustering and manual annotation by culturally informed reviewers. Preliminary evaluation of language models reveals that nine of eleven models exhibit statistically significant bias in our setup, with Bias Preference Ratios (BPR) ranging from 0.63 to 0.78 (p ≤ 0.05), indicating systematic preferences for stereotypes over antistereotypes, particularly across the age, profession, and gender dimensions. Domain-specific models appear to show weaker bias in our setup, suggesting that task-specific training may mitigate some associations. Looking ahead, AfriStereo opens pathways for future research on culturally grounded bias evaluation and mitigation, offering key methodologies for the AI community for building more equitable, context-aware, and globally inclusive NLP technologies.
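The abstract does not give the exact formula, but a Bias Preference Ratio of the kind described can be read as the fraction of stereotype–antistereotype pairs for which a model assigns the higher score to the stereotype. A minimal sketch, assuming each pair is scored with model log-likelihoods (the function name and the example scores are hypothetical, not from the paper):

```python
def bias_preference_ratio(pairs):
    """pairs: list of (stereotype_score, antistereotype_score) tuples,
    e.g. sentence log-likelihoods under the model being evaluated.
    Returns the fraction of pairs where the stereotype is preferred."""
    preferred = sum(1 for stereo, anti in pairs if stereo > anti)
    return preferred / len(pairs)

# Hypothetical log-likelihood scores for four pairs
scores = [(-10.2, -11.5), (-9.8, -9.1), (-12.0, -13.3), (-8.4, -8.9)]
print(bias_preference_ratio(scores))  # 0.75: 3 of 4 pairs prefer the stereotype
```

Under this reading, a BPR of 0.5 would indicate no systematic preference, while the reported 0.63–0.78 range indicates that stereotypes are preferred well above chance.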
Enhancing Automatic Speech Recognition Models for Maternal and Reproductive Health: Fine-Tuning and Real-World Evaluation in Wolof
Ertony Basilwango | Yann Le Beux | Oche David Ankeli | Pierre Herve Berdys
Proceedings of the 7th Workshop on African Natural Language Processing (AfricaNLP 2026)
Automatic Speech Recognition (ASR) systems perform well for high-resource languages, but most African languages, including Wolof, remain underrepresented, particularly in maternal and reproductive healthcare. This work proposes a domain-specific approach to improving Wolof ASR under low-resource conditions, addressing limited annotated data, orthographic variability, and code-switching. We curated a dataset of 750 validated Wolof utterances covering 250 maternal health keywords and applied data augmentation to increase acoustic diversity. Pretrained models, including wav2vec 2.0 and Whisper, were benchmarked to select candidates for fine-tuning. Using parameter-efficient Low-Rank Adaptation (LoRA), a Whisper model was adapted to the maternal health domain. Evaluation using Word Error Rate (WER), Character Error Rate (CER), and Keyword Error Rate (KER), which measures medically critical term transcription accuracy, shows substantial gains, reducing WER from 46.5% to 23.2% and KER from 17% to 11%. Community-based evaluation on 1,340 real-world utterances reveals a moderate degradation, with WER increasing by 35%. These results demonstrate that lightweight domain adaptation with small, high-quality data can significantly improve ASR for low-resource healthcare applications. This work introduces one of the first Wolof ASR datasets for healthcare and presents a practical framework for developing reliable speech recognition tools in underrepresented languages, improving access to healthcare information and services.
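WER is standardly computed as the word-level edit distance between reference and hypothesis, normalized by reference length; the abstract describes KER as measuring transcription accuracy on medically critical terms. A minimal sketch of both metrics, where the KER definition (fraction of reference keywords absent from the hypothesis) and all function names are illustrative assumptions, not the paper's exact implementation:

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token lists."""
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost) # substitution
    return d[len(ref)][len(hyp)]

def wer(reference, hypothesis):
    """Word Error Rate: word-level edit distance / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    return edit_distance(ref, hyp) / len(ref)

def ker(reference, hypothesis, keywords):
    """Illustrative Keyword Error Rate: fraction of reference keywords
    (e.g. medically critical terms) missing from the hypothesis."""
    ref_kw = [w for w in reference.split() if w in keywords]
    hyp_words = set(hypothesis.split())
    return sum(1 for w in ref_kw if w not in hyp_words) / len(ref_kw)

print(wer("the baby needs care", "the baby need care"))          # 0.25
print(ker("take iron tablets daily", "take tablets daily",
          {"iron", "tablets"}))                                  # 0.5
```

Separating KER from WER matters in this setting because a transcript can have an acceptable overall WER while still dropping the few terms that carry the medical meaning.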