Kholisa Podile


2008

Experimental Fast-Tracking of Morphological Analysers for Nguni Languages
Sonja Bosch | Laurette Pretorius | Kholisa Podile | Axel Fleisch
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

The development of natural language processing (NLP) components is resource-intensive, which justifies exploring ways to reduce the time and effort needed to build them. This paper addresses the experimental fast-tracking of the development of finite-state morphological analysers for Xhosa, Swati and (Southern) Ndebele by using an existing morphological analyser prototype for Zulu. The research question is whether fast-tracking is feasible across the language boundaries between these closely related varieties. The objective is a thorough assessment of the recognition rates yielded by the Zulu morphological analyser for the three related languages. The strategy consists of several cycles of the following steps: applying the analyser to corpus data from all languages, identifying failures, and implementing the corresponding changes in the analyser. Tests show that the high degree of shared typological properties and formal similarities among the Nguni varieties warrants a modular fast-tracking approach. Word forms recognized by the Zulu analyser were mostly adequately interpreted. The focus therefore lies on providing adaptations for each language based on an analysis of the failure output. As a result, the development of analysers for Xhosa, Swati and Ndebele is considerably faster than the creation of the Zulu prototype. The paper concludes with comments on the feasibility of the experiment and the results of the evaluation.
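The iterative strategy described above can be sketched as a loop: apply the analyser to corpus data, identify failures, and implement changes. This is a minimal Python sketch under stated assumptions and not the authors' implementation. The analyser is a toy dictionary lookup rather than a finite-state transducer. The names `evaluate`, `lexicon_analyser`, `fast_track` and `add_most_frequent_failure` are hypothetical. Recognition rate is computed over word-form types, although the abstract does not say whether types or tokens were counted. The word forms are placeholders, not Xhosa, Swati or Ndebele data.

```python
from collections import Counter
from typing import Callable, Iterable

# Hypothetical interface (not the paper's): an analyser maps a word form to a
# list of analyses; an empty list means the form was not recognized.
Analyser = Callable[[str], list[str]]


def evaluate(analyser: Analyser, tokens: Iterable[str]) -> tuple[float, Counter]:
    """Return the recognition rate over word-form types and the unrecognized
    types with their corpus frequencies (the "failure output")."""
    types = Counter(tokens)
    failures = Counter({w: n for w, n in types.items() if not analyser(w)})
    rate = 1 - len(failures) / len(types) if types else 0.0
    return rate, failures


def lexicon_analyser(lexicon: dict[str, list[str]]) -> Analyser:
    """Toy stand-in for a finite-state analyser: plain dictionary lookup."""
    return lambda word: lexicon.get(word, [])


def fast_track(lexicon, tokens, adapt, max_cycles=5):
    """Cycle: apply the analyser, identify failures, implement changes."""
    history = []
    for _ in range(max_cycles):
        rate, failures = evaluate(lexicon_analyser(lexicon), tokens)
        history.append(rate)
        if not failures:
            break
        lexicon = adapt(lexicon, failures)
    return lexicon, history


def add_most_frequent_failure(lexicon, failures):
    """Hypothetical adaptation step: in the real workflow a linguist inspects
    the failure output; here we only add a placeholder analysis for the most
    frequent unrecognized form."""
    word, _ = failures.most_common(1)[0]
    return {**lexicon, word: [f"<analysis of {word}>"]}


# Placeholder word forms standing in for corpus tokens of a target language.
tokens = ["w1", "w2", "w2", "w3", "w4", "w4", "w4"]
# Seed lexicon standing in for the coverage of the existing (Zulu) analyser.
seed = {"w1": ["<analysis of w1>"], "w2": ["<analysis of w2>"]}
final, history = fast_track(seed, tokens, add_most_frequent_failure)
print(history)  # [0.5, 0.75, 1.0]
```

The loop measures recognition only. According to the abstract, forms recognized by the Zulu analyser were mostly interpreted adequately, but confirming that in a sketch like this one would require a separate check against reference analyses.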