Swahili News Classification: Performance, Challenges, and Explainability Across ML, DL, and Transformers

Manas Pandya; Avinash Kumar Sharma; Arpit Shukla

Abstract

In this paper, we propose a comprehensive framework for the classification of Swahili news articles using a combination of classical machine learning techniques, deep neural networks, and transformer-based models. By balancing two diverse datasets sourced from Harvard Dataverse and Kaggle, our approach addresses the inherent challenges of imbalanced data in low-resource languages. Our experiments demonstrate the effectiveness of the proposed methodology and set the stage for further advances in Swahili natural language processing.

Anthology ID: 2025.africanlp-1.30
Volume: Proceedings of the Sixth Workshop on African Natural Language Processing (AfricaNLP 2025)
Month: July
Year: 2025
Address: Vienna, Austria
Editors: Constantine Lignos, Idris Abdulmumin, David Adelani
Venues: AfricaNLP | WS
Publisher: Association for Computational Linguistics
Pages: 203–209
URL: https://preview.aclanthology.org/acl25-workshop-ingestion/2025.africanlp-1.30/
Cite (ACL): Manas Pandya, Avinash Kumar Sharma, and Arpit Shukla. 2025. Swahili News Classification: Performance, Challenges, and Explainability Across ML, DL, and Transformers. In Proceedings of the Sixth Workshop on African Natural Language Processing (AfricaNLP 2025), pages 203–209, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal): Swahili News Classification: Performance, Challenges, and Explainability Across ML, DL, and Transformers (Pandya et al., AfricaNLP 2025)
PDF: https://preview.aclanthology.org/acl25-workshop-ingestion/2025.africanlp-1.30.pdf
