AjamiMorph: Zero-Annotation Morphological Discovery for Hausa Ajami via Multi-Method Consensus
Soumedhik Bharati, Shibam Mandal, Prithwish Ghosh, Swarup Kr Ghosh, Sayani Mondal
Abstract
Hausa Ajami (Hausa written in Arabic script) remains severely under-resourced for computational morphology. We present AjamiMorph, a zero-annotation framework that discovers morphemes through consensus among three unsupervised methods: Byte Pair Encoding (BPE), transition-based boundary detection using Pointwise Mutual Information (PMI), and computational-linguistics-based Distributional Affix Mining (DAM). Using a Hausa Ajami Bible corpus of 637,414 tokens, AjamiMorph identifies 1,611 high-confidence morphemes, achieving 99.9% coverage. The inventory exhibits a linguistically realistic distribution (66.0% stems, 22.6% suffixes, 11.4% prefixes) and recovers 77.8% of known Hausa affixes. A permutation test that shuffles method assignments (preserving per-method selection sizes) confirms that the observed agreement is above chance; a chi-square test serves as a secondary check. A lightweight 5-gram language model (LM) comparison (characters vs. consensus morphemes) provides an extrinsic signal. We also report negative results for script-driven Arabic assumptions and LLM-first annotation. This work provides the first unsupervised morpheme inventory for Hausa Ajami and demonstrates consensus as a robust strategy for zero-resource morphology.
- Anthology ID:
- 2026.abjadnlp-1.23
- Volume:
- Proceedings of the 2nd Workshop on NLP for Languages Using Arabic Script
- Month:
- March
- Year:
- 2026
- Address:
- Rabat, Morocco
- Venues:
- AbjadNLP | WS
- Publisher:
- Association for Computational Linguistics
- Pages:
- 166–171
- URL:
- https://preview.aclanthology.org/manual-author-scripts/2026.abjadnlp-1.23/
- Cite (ACL):
- Soumedhik Bharati, Shibam Mandal, Prithwish Ghosh, Swarup Kr Ghosh, and Sayani Mondal. 2026. AjamiMorph: Zero-Annotation Morphological Discovery for Hausa Ajami via Multi-Method Consensus. In Proceedings of the 2nd Workshop on NLP for Languages Using Arabic Script, pages 166–171, Rabat, Morocco. Association for Computational Linguistics.
- Cite (Informal):
- AjamiMorph: Zero-Annotation Morphological Discovery for Hausa Ajami via Multi-Method Consensus (Bharati et al., AbjadNLP 2026)
- PDF:
- https://preview.aclanthology.org/manual-author-scripts/2026.abjadnlp-1.23.pdf
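
The abstract describes a permutation test that shuffles method assignments while preserving per-method selection sizes. The following is a minimal sketch of how such a test could look, not the authors' implementation. It assumes a shared candidate pool, takes "agreement" to mean the number of candidates chosen by at least two methods, and uses toy data; the function names, the `min_votes` threshold, and the candidate labels are all hypothetical.

```python
import random


def consensus_count(selections, min_votes=2):
    """Count candidates chosen by at least `min_votes` of the methods."""
    votes = {}
    for chosen in selections:
        for item in chosen:
            votes[item] = votes.get(item, 0) + 1
    return sum(1 for v in votes.values() if v >= min_votes)


def permutation_p_value(pool, selections, n_perm=1000, min_votes=2, seed=0):
    """Estimate how often random selections agree as much as the observed ones.

    Each permutation redraws every method's selection uniformly from the
    pool, keeping that method's selection size fixed (assumed null model).
    """
    rng = random.Random(seed)
    pool = sorted(pool)  # fixed order so the seeded run is reproducible
    observed = consensus_count(selections, min_votes)
    hits = 0
    for _ in range(n_perm):
        shuffled = [rng.sample(pool, len(chosen)) for chosen in selections]
        if consensus_count(shuffled, min_votes) >= observed:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return observed, (hits + 1) / (n_perm + 1)


# Toy data (hypothetical): 200 candidates, three methods selecting 20 each,
# with 15 candidates shared by all three.
pool = [f"c{i}" for i in range(200)]
shared = pool[:15]
bpe = set(shared + pool[15:20])
pmi = set(shared + pool[20:25])
dam = set(shared + pool[25:30])

observed, p_value = permutation_p_value(pool, [bpe, pmi, dam])
print(observed, p_value)
```

A small p-value here indicates that the overlap among the three selections is unlikely under random selection of the same sizes. Other agreement statistics, such as requiring all three methods to agree, can be substituted by changing `min_votes`.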