Regina Nasyrova


2025

Grammatical Error Correction via Sequence Tagging for Russian
Regina Nasyrova | Alexey Sorokin
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 4: Student Research Workshop)

We introduce a modified version of the sequence tagging architecture proposed by Omelianchuk et al. (2020) for Grammatical Error Correction of Russian. We propose a language-specific operation set and preprocessing algorithm, as well as a classification scheme that makes distinct predictions for insertions and other operations. The best versions of our models outperform previous approaches and set a new SOTA on two Russian GEC benchmarks, RU-Lang8 and GERA, while achieving competitive performance on RULEC-GEC.
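For illustration, a minimal runnable sketch of the edit-based tagging idea behind this line of work: each source token receives an edit tag, and applying the tags yields the corrected sentence. The tag names and the separate per-token insertion decision are hypothetical stand-ins, not the paper's actual operation set:

```python
# Minimal sketch of GECToR-style tag application (after Omelianchuk
# et al., 2020). Each source token gets one edit tag; a separate
# per-token insertion decision mirrors the abstract's distinct
# predictions for insertions. All tag names here are illustrative.

def apply_edit_tags(tokens, tags, insertions):
    """tokens:     source tokens
    tags:       per-token operations: 'KEEP', 'DELETE', 'REPLACE_<word>'
    insertions: per-token 'APPEND_<word>' or None, predicted separately
    """
    out = []
    for token, tag, ins in zip(tokens, tags, insertions):
        if tag == "KEEP":
            out.append(token)
        elif tag.startswith("REPLACE_"):
            out.append(tag[len("REPLACE_"):])
        # 'DELETE' contributes nothing to the output
        if ins is not None and ins.startswith("APPEND_"):
            out.append(ins[len("APPEND_"):])
    return out

# Toy example: fix a case-agreement error ("книга" -> "книгу").
tokens = ["он", "читал", "книга"]
tags = ["KEEP", "KEEP", "REPLACE_книгу"]
print(" ".join(apply_edit_tags(tokens, tags, [None, None, None])))
# -> он читал книгу
```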

LLMs in alliance with Edit-based models: advancing In-Context Learning for Grammatical Error Correction by Specific Example Selection
Alexey Sorokin | Regina Nasyrova
Proceedings of the 20th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2025)

We release LORuGEC, the first rule-annotated corpus for Russian Grammatical Error Correction. The corpus is designed for diagnostic purposes and contains 348 validation and 612 test sentences specially selected to represent complex rules of Russian writing, which makes it significantly different from other Russian GEC corpora. We apply several large language models and approaches to our corpus; the best F0.5 score of 83% is achieved by 5-shot learning with the YandexGPT-5 Pro model. To push the boundaries of few-shot learning further, we are the first to apply a GECTOR-like encoder model to the retrieval of similar examples. GECTOR-based example selection significantly boosts few-shot performance, not only on LORuGEC but on other Russian GEC corpora as well. On LORuGEC, the GECTOR-based retriever can be further improved by contrastive tuning on the task of rule label prediction. All these results hold for a broad class of large language models.
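For illustration, a hypothetical sketch of retrieval-based example selection for few-shot GEC prompting. The SentenceTransformer encoder, the model name, and the prompt template below are placeholder assumptions standing in for the paper's GECTOR-based retriever, and the example pool is a toy one:

```python
# Hypothetical sketch: pick the k most similar annotated examples for a
# query and assemble a few-shot GEC prompt. The encoder below is a
# generic stand-in, not the GECTOR-like retriever from the paper.
import numpy as np
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # placeholder model

pool = [  # toy (erroneous, corrected) pairs to retrieve from
    ("он читал книга", "он читал книгу"),
    ("мы пошли в магазину", "мы пошли в магазин"),
    ("я видел красивую картину", "я видел красивую картину"),
]

def build_prompt(query, k=2):
    sources = [src for src, _ in pool]
    # Normalized embeddings make dot product equal cosine similarity.
    emb = encoder.encode(sources + [query], normalize_embeddings=True)
    sims = emb[:-1] @ emb[-1]
    nearest = np.argsort(-sims)[:k]
    demos = "\n".join(f"Input: {pool[i][0]}\nOutput: {pool[i][1]}"
                      for i in nearest)
    return f"{demos}\nInput: {query}\nOutput:"

print(build_prompt("она видел красивый картина"))
```

Per the abstract, the generic encoder above would be replaced by a GECTOR-like model, optionally contrastively tuned on rule label prediction.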