Khuyagbaatar Batsuren


2021

MorphyNet: a Large Multilingual Database of Derivational and Inflectional Morphology
Khuyagbaatar Batsuren | Gábor Bella | Fausto Giunchiglia
Proceedings of the 18th SIGMORPHON Workshop on Computational Research in Phonetics, Phonology, and Morphology

Large-scale morphological databases provide essential input to a wide range of NLP applications. Inflectional data is of particular importance for morphologically rich (agglutinative and highly inflecting) languages, and derivations can be used, e.g., to infer the semantics of out-of-vocabulary words. Extending the scope of state-of-the-art multilingual morphological databases, we announce the release of MorphyNet, a high-quality resource with 15 languages, 519k derivational and 10.1M inflectional entries, and a rich set of morphological features. MorphyNet was extracted from Wiktionary using both hand-crafted and automated methods, and was manually evaluated to have a precision higher than 98%. Both the resource generation logic and the resulting database are made freely available and are reusable as stand-alone tools or in combination with existing resources.

2019

Building the Mongolian WordNet
Khuyagbaatar Batsuren | Amarsanaa Ganbold | Altangerel Chagnaa | Fausto Giunchiglia
Proceedings of the 10th Global Wordnet Conference

This paper presents the Mongolian WordNet (MOW) and a general methodology for constructing it from various sources, e.g. lexical resources and expert translations. As of today, the MOW contains 23,665 synsets, 26,875 words, 2,979 glosses, and 213 examples. The manual evaluation of the resource estimated its quality at 96.4%.

CogNet: A Large-Scale Cognate Database
Khuyagbaatar Batsuren | Gabor Bella | Fausto Giunchiglia
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

This paper introduces CogNet, a new, large-scale lexical database that provides cognates (words of common origin and meaning) across languages. The database currently contains 3.1 million cognate pairs across 338 languages using 35 writing systems. The paper also describes the automated method by which cognates were computed from publicly available wordnets, with an accuracy evaluated at 94%. Finally, it presents statistics about the cognate data and some initial insights into it, hinting at a possible future exploitation of the resource by various fields of linguistics.

Aligning the IndoWordNet with the Princeton WordNet
Nandu Chandran Nair | Rajendran Sankara Velayuthan | Khuyagbaatar Batsuren
Proceedings of the 3rd International Conference on Natural Language and Speech Processing