Swahili News Classification: Performance, Challenges, and Explainability Across ML, DL, and Transformers

Manas Pandya, Avinash Kumar Sharma, Arpit Shukla


Abstract
In this paper, we propose a comprehensive framework for classifying Swahili news articles using classical machine learning techniques, deep neural networks, and transformer-based models. By balancing the class distributions of two diverse datasets, sourced from Harvard Dataverse and Kaggle, our approach addresses the class imbalance that is common in data for low-resource languages. Our experiments demonstrate the effectiveness of the proposed methodology and lay the groundwork for further advances in Swahili natural language processing.
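The abstract does not state how the two datasets were balanced. As one possible illustration, assuming simple random oversampling (a common technique, not necessarily the one the authors used), the sketch below duplicates randomly chosen minority-class articles until every category has as many examples as the largest one. The function name `random_oversample` and the toy category labels are hypothetical.

```python
import random
from collections import Counter, defaultdict


def random_oversample(texts, labels, seed=0):
    """Hypothetical helper: balance classes by random oversampling.

    Minority-class examples are duplicated (sampled with replacement)
    until every label has as many examples as the most frequent label.
    """
    if len(texts) != len(labels):
        raise ValueError("texts and labels must have the same length")
    if not texts:
        return [], []

    rng = random.Random(seed)  # fixed seed for reproducible duplicates

    # Group articles by their category label, preserving first-seen order.
    by_label = defaultdict(list)
    for text, label in zip(texts, labels):
        by_label[label].append(text)

    target = max(len(items) for items in by_label.values())

    out_texts, out_labels = [], []
    for label, items in by_label.items():
        # Draw extra copies from this class only; the largest class gets none.
        extra = [rng.choice(items) for _ in range(target - len(items))]
        for text in items + extra:
            out_texts.append(text)
            out_labels.append(label)
    # Output is grouped by label; shuffle before training if needed.
    return out_texts, out_labels


if __name__ == "__main__":
    # Toy, hypothetical data: three "michezo" (sports) articles, one "biashara" (business).
    texts = ["habari 1", "habari 2", "habari 3", "habari 4"]
    labels = ["michezo", "michezo", "michezo", "biashara"]
    bal_texts, bal_labels = random_oversample(texts, labels)
    print(Counter(bal_labels))  # Counter({'michezo': 3, 'biashara': 3})
```

Oversampling of this kind is usually applied only to the training split, so that duplicated articles cannot leak into the evaluation data.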
Anthology ID:
2025.africanlp-1.30
Volume:
Proceedings of the Sixth Workshop on African Natural Language Processing (AfricaNLP 2025)
Month:
July
Year:
2025
Address:
Vienna, Austria
Editors:
Constantine Lignos, Idris Abdulmumin, David Adelani
Venues:
AfricaNLP | WS
Publisher:
Association for Computational Linguistics
Pages:
203–209
URL:
https://preview.aclanthology.org/acl25-workshop-ingestion/2025.africanlp-1.30/
Cite (ACL):
Manas Pandya, Avinash Kumar Sharma, and Arpit Shukla. 2025. Swahili News Classification: Performance, Challenges, and Explainability Across ML, DL, and Transformers. In Proceedings of the Sixth Workshop on African Natural Language Processing (AfricaNLP 2025), pages 203–209, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal):
Swahili News Classification: Performance, Challenges, and Explainability Across ML, DL, and Transformers (Pandya et al., AfricaNLP 2025)
PDF:
https://preview.aclanthology.org/acl25-workshop-ingestion/2025.africanlp-1.30.pdf