Gulmira Tolegen

2022

pdf abs
Language-Independent Approach for Morphological Disambiguation
Alymzhan Toleu | Gulmira Tolegen | Rustam Mussabayev
Proceedings of the 29th International Conference on Computational Linguistics

This paper presents a language-independent approach for morphological disambiguation which has been regarded as an extension of POS tagging, jointly predicting complex morphological tags. In the proposed approach, all words, roots, POS and morpheme tags are embedded into vectors, and contexts representations from surface word and morphological contexts are calculated. Then the inner products between analyses and the context’s representations are computed to perform the disambiguation. The underlying hypothesis is that the correct morphological analysis should be closer to the context in a vector space. Experimental results show that the proposed approach outperforms the existing models on seven different language datasets. Concretely, compared with the baselines of MarMot and a sophisticated neural model (Seq2Seq), the proposed approach achieves around 6% improvement in average accuracy for all languages while running about 6 and 33 times faster than MarMot and Seq2Seq, respectively.

2020

pdf abs
Voted-Perceptron Approach for Kazakh Morphological Disambiguation
Gulmira Tolegen | Alymzhan Toleu | Rustam Mussabayev
Proceedings of the 1st Joint Workshop on Spoken Language Technologies for Under-resourced languages (SLTU) and Collaboration and Computing for Under-Resourced Languages (CCURL)

This paper presents an approach of voted perceptron for morphological disambiguation for the case of Kazakh language. Guided by the intuition that the feature value from the correct path of analyses must be higher than the feature value of non-correct path of analyses, we propose the voted perceptron algorithm with Viterbi decoding manner for disambiguation. The approach can use arbitrary features to learn the feature vector for a sequence of analyses, which plays a vital role for disambiguation. Experimental results show that our approach outperforms other statistical and rule-based models. Moreover, we manually annotated a new morphological disambiguation corpus for Kazakh language.

2017

pdf abs
Character-Aware Neural Morphological Disambiguation
Alymzhan Toleu | Gulmira Tolegen | Aibek Makazhanov
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

We develop a language-independent, deep learning-based approach to the task of morphological disambiguation. Guided by the intuition that the correct analysis should be “most similar” to the context, we propose dense representations for morphological analyses and surface context and a simple yet effective way of combining the two to perform disambiguation. Our approach improves on the language-dependent state of the art for two agglutinative languages (Turkish and Kazakh) and can be potentially applied to other morphologically complex languages.

Gulmira Tolegen

2022

2020

2017

Co-authors

Venues