Abiderexiti Kahaerjiang


2021

Morphological Analysis Corpus Construction of Uyghur
Abudouwaili Gulinigeer | Abiderexiti Kahaerjiang | Wushouer Jiamila | Shen Yunfei | Maimaitimin Turenisha | Yibulayin Tuergen
Proceedings of the 20th Chinese National Conference on Computational Linguistics

Morphological analysis is a fundamental task in natural language processing, and its results can be applied to downstream tasks such as named entity recognition, syntactic analysis, and machine translation. However, morphological analysis still faces many problems, such as low accuracy caused by a lack of resources. In this paper, to alleviate the lack of resources in Uyghur morphological analysis research, we construct a Uyghur morphological analysis corpus based on an analysis of grammatical features and the format of general morphological analysis corpora. We define morphological tags along 14 dimensions and 53 features, and manually annotate and correct the dataset. The resulting corpus provides information such as the word, lemma, part of speech, morphological analysis tags, morphological segmentation, and lemmatization. We also analyze some basic features of the corpus, and we use the models and datasets provided by the SIGMORPHON Shared Task organizers to design comparative experiments that verify the corpus’s availability. The experimental results are 85.56% and 88.29%, respectively. The corpus provides a reference for morphological analysis and promotes research on Uyghur natural language processing.