Mikhail Kopotev
2026
Automated CEFR-Level Assignment for Ukrainian Texts
Olha Kanishcheva | Mikhail Kopotev
Proceedings of the Fifth Ukrainian Natural Language Processing Conference (UNLP 2026)
The present study evaluates CEFR-based text complexity for Ukrainian using a new dataset compiled from textbooks designed for language learners. We compare traditional machine learning, transformer-based models, and LLM-based evaluation across the A1–B2 proficiency levels. Results show that explicit linguistic features remain highly effective: a Random Forest classifier achieves the highest macro-F1 (0.576), slightly outperforming fine-tuned XLM-RoBERTa (0.574). GPT-5.5 also performs strongly (macro-F1 0.564), a significant advance over GPT-4.1, but the supervised models score slightly higher on proficiency-level assessment in this experiment. These findings suggest that structured linguistic analysis is a robust alternative to purely neural approaches for Ukrainian CEFR classification.
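For readers unfamiliar with the metric reported above, the following is a minimal sketch of how macro-F1 is computed over CEFR labels. It is not the authors' evaluation code: the function name `macro_f1`, the toy labels, and the convention of scoring a class with no gold or predicted items as 0.0 are all assumptions made for illustration.

```python
def macro_f1(gold, pred, labels):
    """Unweighted mean of per-class F1 scores (illustrative helper)."""
    scores = []
    for label in labels:
        # Count true positives, false positives and false negatives for this class.
        tp = sum(g == label and p == label for g, p in zip(gold, pred))
        fp = sum(g != label and p == label for g, p in zip(gold, pred))
        fn = sum(g == label and p != label for g, p in zip(gold, pred))
        denom = 2 * tp + fp + fn
        # Assumption: a class with no gold and no predicted items scores 0.0.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy data, invented for this example only.
gold = ["A1", "A1", "A2", "B1"]
pred = ["A1", "A2", "A2", "B1"]
score = macro_f1(gold, pred, ["A1", "A2", "B1", "B2"])
```

Because every level contributes equally to the average, a model that does well on frequent levels but poorly on a rare one is penalized more than it would be under plain accuracy.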
2019
Modeling language learning using specialized Elo rating
Jue Hou | Maximilian W. Koppatz | José María Hoya Quecedo | Nataliya Stoyanova | Mikhail Kopotev | Roman Yangarber
Proceedings of the Fourteenth Workshop on Innovative Use of NLP for Building Educational Applications
Automatic assessment of the learner's proficiency level is a critical part of Intelligent Tutoring Systems. We present methods for assessment in the context of language learning. We use a specialized Elo formula in conjunction with educational data mining. We simultaneously obtain ratings for the proficiency of the learners and for the difficulty of the linguistic concepts that the learners are trying to master. From the same data we also learn a graph structure representing a domain model that captures the relations among the concepts. This application of Elo provides ratings for learners and concepts that correlate well with the subjective proficiency levels of the learners and the difficulty levels of the concepts.
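The idea of rating learners and concepts at the same time can be illustrated with the standard Elo update, treating each exercise as a "match" between a learner and a concept. This is only a sketch of the textbook formula, not the specialized variant the paper describes: the function names, the scale of 400, and the step size `k = 32` are conventional chess defaults assumed here for illustration.

```python
def expected_success(learner: float, concept: float, scale: float = 400.0) -> float:
    """Probability that the learner answers correctly under the standard Elo model."""
    return 1.0 / (1.0 + 10 ** ((concept - learner) / scale))


def elo_update(learner: float, concept: float, correct: bool, k: float = 32.0):
    """Return updated (learner, concept) ratings after one answer.

    Assumption: one shared step size k for both sides; a specialized
    formula may use different or adaptive step sizes.
    """
    outcome = 1.0 if correct else 0.0
    delta = k * (outcome - expected_success(learner, concept))
    # A correct answer raises the learner's rating and lowers the
    # concept's difficulty rating by the same amount, and vice versa.
    return learner + delta, concept - delta


# Equal ratings give a 50% expected success, so a correct answer moves each side by k/2.
new_learner, new_concept = elo_update(1500.0, 1500.0, correct=True)
```

Running the update over a log of learner answers yields a proficiency rating per learner and a difficulty rating per concept from the same data.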
2015
Online Extraction of Russian Multiword Expressions
Mikhail Kopotev | Llorenç Escoter | Daria Kormacheva | Matthew Pierce | Lidia Pivovarova | Roman Yangarber
The 5th Workshop on Balto-Slavic Natural Language Processing
2013
Automatic Detection of Stable Grammatical Features in N-Grams
Mikhail Kopotev | Lidia Pivovarova | Natalia Kochetkova | Roman Yangarber
Proceedings of the 9th Workshop on Multiword Expressions
2008
Designing and Evaluating a Russian Tagset
Serge Sharoff | Mikhail Kopotev | Tomaž Erjavec | Anna Feldman | Dagmar Divjak
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)
This paper reports the principles behind designing a tagset to cover Russian morphosyntactic phenomena, modifications of the core tagset, and its evaluation. The tagset is based on the MULTEXT-East framework, while the decisions in designing it were aimed at achieving a balance between parameters important for linguists and the possibility to detect and disambiguate them automatically. The final tagset contains about 500 tags and achieves about 95% accuracy on the disambiguated portion of the Russian National Corpus. We have also produced a test set that can be shared with other researchers.