Osman Turghun


2020

pdf
Constructing Uyghur Name Entity Recognition System using Neural Machine Translation Tag Projection
Anwar Azmat | Li Xiao | Yang Yating | Dong Rui | Osman Turghun
Proceedings of the 19th Chinese National Conference on Computational Linguistics

Although named entity recognition achieved great success by introducing the neural networks, it is challenging to apply these models to low resource languages including Uyghur while it depends on a large amount of annotated training data. Constructing a well-annotated named entity corpus manually is very time-consuming and labor-intensive. Most existing methods based on the parallel corpus combined with the word alignment tools. However, word alignment methods introduce alignment errors inevitably. In this paper, we address this problem by a named entity tag transfer method based on the common neural machine translation. The proposed method marks the entity boundaries in Chinese sentence and translates the sentences to Uyghur by neural machine translation system, hope that neural machine translation will align the source and target entity by the self-attention mechanism. The experimental results show that the Uyghur named entity recognition system trained by the constructed corpus achieve good performance on the test set, with 73.80% F1 score(3.79% improvement by baseline)