Active Learning with Non-Uniform Costs for African Natural Language Processing
Bonaventure F. P. Dossou, Ines Arous, Audrey Durand, Jackie Chi Kit Cheung
Abstract
Labeling datasets for African languages poses substantial challenges because annotations are collected in diverse settings, leading to highly variable labeling costs. These costs vary with task complexity, annotator expertise, and data availability. Yet most active learning (AL) frameworks assume uniform annotation costs, which limits their applicability in real-world, resource-constrained scenarios. To address this, we introduce KnapsackBALD, a novel cost-aware active learning method that integrates the BatchBALD acquisition strategy with a 0-1 Knapsack optimization objective to select informative and budget-efficient samples. We evaluate KnapsackBALD on the MasakhaNEWS dataset, a multilingual news classification benchmark covering 11 African languages. Our method consistently outperforms seven strong active learning baselines, including BALD, BatchBALD, and stochastic sampling variants such as PowerBALD and Softmax-BALD, across all three cost scenarios. The performance gap widens as annotation cost imbalances become more extreme, demonstrating the robustness of KnapsackBALD across cost settings. These findings show that when annotation costs are explicitly heterogeneous, as in African-language NLP and similar settings, cost-sensitive acquisition is critical for effective active learning. Our code base is open-sourced here.
- Anthology ID:
- 2026.findings-eacl.349
- Volume:
- Findings of the Association for Computational Linguistics: EACL 2026
- Month:
- March
- Year:
- 2026
- Address:
- Rabat, Morocco
- Editors:
- Vera Demberg, Kentaro Inui, Lluís Marquez
- Venue:
- Findings
- Publisher:
- Association for Computational Linguistics
- Pages:
- 6644–6656
- URL:
- https://preview.aclanthology.org/ingest-eacl/2026.findings-eacl.349/
- Cite (ACL):
- Bonaventure F. P. Dossou, Ines Arous, Audrey Durand, and Jackie Chi Kit Cheung. 2026. Active Learning with Non-Uniform Costs for African Natural Language Processing. In Findings of the Association for Computational Linguistics: EACL 2026, pages 6644–6656, Rabat, Morocco. Association for Computational Linguistics.
- Cite (Informal):
- Active Learning with Non-Uniform Costs for African Natural Language Processing (Dossou et al., Findings 2026)
- PDF:
- https://preview.aclanthology.org/ingest-eacl/2026.findings-eacl.349.pdf