Automatic Correction of Writing Anomalies in Hausa Texts

Ahmad Mustapha Wali, Sergiu Nisioi


Abstract
Hausa texts often contain writing anomalies, such as incorrect character substitutions and spacing errors, that can hinder natural language processing (NLP) applications. This paper presents an approach to correcting these anomalies automatically by fine-tuning transformer-based models. Using a corpus gathered from several public sources, we create a large-scale parallel dataset of over 400,000 noisy-clean Hausa sentence pairs by introducing synthetically generated noise that mimics realistic writing errors. We then fine-tune several multilingual and African language models for this correction task, including M2M100, AfriTeVA, NCAIR1/N-ATLaS, UBC-NLP/cheetah-base, and variants of BART and T5. Our experimental results show that models such as M2M100 achieve state-of-the-art results despite their smaller size and distinct pretraining, and that correcting errors can significantly improve downstream tasks such as text classification, machine translation, question answering, and LLM prompting in general. This research provides a methodology, a publicly available dataset, and a comparison of models for improving Hausa text quality, thereby advancing NLP capabilities for the language and offering transferable insights for other low-resource languages.
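As a rough illustration of the kind of synthetic noising the abstract describes, the sketch below builds noisy-clean pairs from clean sentences. It is not the authors' pipeline: the function names `add_noise` and `make_pairs`, the probabilities, and the choice of noise types (replacing the hooked letters ɓ, ɗ, ƙ with plain b, d, k, and dropping spaces) are all assumptions made for this example.

```python
import random

# Assumed mapping for illustration: hooked Hausa letters replaced by
# their plain ASCII look-alikes, a common informal substitution.
HOOKED_TO_PLAIN = {"ɓ": "b", "ɗ": "d", "ƙ": "k", "Ɓ": "B", "Ɗ": "D", "Ƙ": "K"}


def add_noise(sentence, rng, p_sub=0.5, p_space=0.1):
    """Return a noisy copy of `sentence` (hypothetical helper).

    p_sub   -- probability of replacing each hooked letter
    p_space -- probability of deleting each space (merging two words)
    """
    out = []
    for ch in sentence:
        if ch in HOOKED_TO_PLAIN and rng.random() < p_sub:
            out.append(HOOKED_TO_PLAIN[ch])
        elif ch == " " and rng.random() < p_space:
            continue  # drop the space
        else:
            out.append(ch)
    return "".join(out)


def make_pairs(clean_sentences, seed=0):
    """Build (noisy, clean) pairs with a seeded generator for repeatability."""
    rng = random.Random(seed)
    return [(add_noise(s, rng), s) for s in clean_sentences]


# With substitution forced on and space deletion off, the result is fixed:
print(add_noise("ɗan ƙasa", random.Random(0), p_sub=1.0, p_space=0.0))  # dan kasa
```

Pairs built this way could serve as source and target sequences for fine-tuning a sequence-to-sequence model, with the noisy sentence as input and the clean sentence as the reference output.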
Anthology ID:
2026.acl-long.430
Volume:
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:
July
Year:
2026
Address:
San Diego, California, United States
Editors:
Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
9514–9528
URL:
https://preview.aclanthology.org/ingest-acl/2026.acl-long.430/
Cite (ACL):
Ahmad Mustapha Wali and Sergiu Nisioi. 2026. Automatic Correction of Writing Anomalies in Hausa Texts. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 9514–9528, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal):
Automatic Correction of Writing Anomalies in Hausa Texts (Wali & Nisioi, ACL 2026)
PDF:
https://preview.aclanthology.org/ingest-acl/2026.acl-long.430.pdf
Checklist:
 2026.acl-long.430.checklist.pdf