Abstract
Properly written texts in Igbo, a low-resource African language, are rich in both orthographic and tonal diacritics. Diacritics are essential in capturing the distinctions in pronunciation and meaning of words, as well as in lexical disambiguation. Unfortunately, most electronic texts in languages with diacritics are written without them. This makes diacritic restoration a necessary step in corpus building and language processing tasks for such languages. In our previous work, we built n-gram models with simple smoothing techniques based on a closed-world assumption. However, as a classification task, diacritic restoration is well suited to machine learning, which should also make it more generalisable. This paper, therefore, presents a more standard approach to the task that applies machine learning algorithms.
- Anthology ID: W17-1907
- Volume: Proceedings of the 1st Workshop on Sense, Concept and Entity Representations and their Applications
- Month: April
- Year: 2017
- Address: Valencia, Spain
- Venue: SENSE
- Publisher: Association for Computational Linguistics
- Pages: 53–60
- URL: https://aclanthology.org/W17-1907
- DOI: 10.18653/v1/W17-1907
- Cite (ACL): Ignatius Ezeani, Mark Hepple, and Ikechukwu Onyenwe. 2017. Lexical Disambiguation of Igbo using Diacritic Restoration. In Proceedings of the 1st Workshop on Sense, Concept and Entity Representations and their Applications, pages 53–60, Valencia, Spain. Association for Computational Linguistics.
- Cite (Informal): Lexical Disambiguation of Igbo using Diacritic Restoration (Ezeani et al., SENSE 2017)
- PDF: https://preview.aclanthology.org/ingestion-script-update/W17-1907.pdf