Mongolian Named Entity Recognition System with Rich Features

Weihua Wang, Feilong Bao, Guanglai Gao


Abstract
In this paper, we first build a manually annotated named entity corpus for Mongolian. We then propose three morphological processing methods and study a comprehensive set of features for Mongolian named entity recognition, including syllable, lexical, context, morphological and semantic features. We also evaluate the influence of word cluster features on the system and finally combine all features. Experimental results show that segmenting each suffix into an individual token achieves better results than deleting suffixes or using suffixes as features. The system based on suffix segmentation with all proposed features yields a benchmark F-measure of 84.65 on this corpus.
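The three morphological processing methods compared in the abstract (segmenting suffixes into tokens, deleting suffixes, and using suffixes as features) can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes that each word arrives as a string in which suffixes are attached to the stem with a narrow no-break space (U+202F). The separator constant, the function names and the placeholder tokens are all hypothetical.

```python
from __future__ import annotations

# ASSUMPTION: suffixes are joined to the stem by a narrow no-break space
# (U+202F); change this constant if the corpus uses another separator.
SUFFIX_SEP = "\u202f"


def split_word(word: str) -> tuple[str, list[str]]:
    """Return (stem, suffixes) by splitting the word on the suffix separator."""
    stem, *suffixes = word.split(SUFFIX_SEP)
    return stem, suffixes


def segment_suffixes(words: list[str]) -> list[str]:
    """Method 1 (hypothetical name): emit the stem and each suffix as separate tokens."""
    tokens: list[str] = []
    for word in words:
        stem, suffixes = split_word(word)
        tokens.append(stem)
        tokens.extend(suffixes)
    return tokens


def delete_suffixes(words: list[str]) -> list[str]:
    """Method 2 (hypothetical name): keep only the stem of each word."""
    return [split_word(word)[0] for word in words]


def suffixes_as_features(words: list[str]) -> list[dict[str, str]]:
    """Method 3 (hypothetical name): one entry per word, with the stem as the
    token and the joined suffixes (or "NONE") as an extra feature."""
    features: list[dict[str, str]] = []
    for word in words:
        stem, suffixes = split_word(word)
        features.append({
            "token": stem,
            "suffix": "+".join(suffixes) if suffixes else "NONE",
        })
    return features


if __name__ == "__main__":
    # Placeholder words (hypothetical): a stem with one suffix, a bare stem,
    # and a stem with two suffixes.
    sentence = [
        f"stemA{SUFFIX_SEP}sfx1",
        "stemB",
        f"stemC{SUFFIX_SEP}sfx2{SUFFIX_SEP}sfx3",
    ]
    print(segment_suffixes(sentence))
    print(delete_suffixes(sentence))
    print(suffixes_as_features(sentence))
```

Methods 2 and 3 keep one item per input word, so the original entity labels stay aligned. Method 1 changes the sequence length, so the entity labels must be realigned to the new stem and suffix tokens before training a sequence labeler.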
Anthology ID: C16-1049
Volume: Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers
Month: December
Year: 2016
Address: Osaka, Japan
Venue: COLING
Publisher: The COLING 2016 Organizing Committee
Pages: 505–512
URL: https://aclanthology.org/C16-1049
Cite (ACL): Weihua Wang, Feilong Bao, and Guanglai Gao. 2016. Mongolian Named Entity Recognition System with Rich Features. In Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers, pages 505–512, Osaka, Japan. The COLING 2016 Organizing Committee.
Cite (Informal): Mongolian Named Entity Recognition System with Rich Features (Wang et al., COLING 2016)
PDF: https://preview.aclanthology.org/ingestion-script-update/C16-1049.pdf