What Aggregate Scores Hide: Per-Rule Evaluation of Russian Grammatical Error Correction

Anna Smirnova; Artyom Kopan; Vladislav Makeev; George Chernishev

Abstract

Russian grammar correction models can improve on aggregate benchmarks while getting worse at specific grammar rules. We show this through per-rule evaluation on a diagnostic benchmark of 48 prescriptive rules: finetuning on synthetic data improves overall F_0.5 while driving subordinate-clause comma accuracy from 14% to 1%. The suppression is invisible under corpus-level metrics and undetectable with existing coarse, corpus-specific tagsets; it is recoverable only when diagnosed at rule granularity. To enable this analysis, we develop a 98-category error taxonomy grounded in Rozental’s reference grammar and SyntErr, an open-source synthetic data generator whose per-rule distribution is an explicit parameter, designed to support arbitrary rule sets and languages. Finetuning eight open models (0.8B–12B) on 39K synthetic examples yields up to 75.3 F_0.5, approaching frontier API models with models small enough to run on device. We release the taxonomy, generator, per-rule evaluation data, and all training artifacts.
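The contrast the abstract draws between a corpus-level F_0.5 and per-rule accuracy can be sketched in a few lines. This is only an illustrative sketch, not the authors' evaluation code: the function names `f_beta` and `per_rule_accuracy` and the `(rule_id, corrected)` input shape are assumptions made here, and real GEC scoring first extracts and aligns edits, a step this sketch skips.

```python
from collections import defaultdict


def f_beta(tp, fp, fn, beta=0.5):
    """Corpus-level F_beta from pooled edit counts.

    With beta=0.5 precision is weighted more heavily than recall.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def per_rule_accuracy(results):
    """Accuracy broken down by rule.

    `results` is an iterable of (rule_id, corrected) pairs, where
    `corrected` is True if the model fixed the error for that item.
    The input format is hypothetical, chosen for illustration.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for rule_id, corrected in results:
        totals[rule_id] += 1
        hits[rule_id] += int(corrected)
    return {rule_id: hits[rule_id] / totals[rule_id] for rule_id in totals}
```

Because `f_beta` pools counts over the whole corpus, a rule that contributes few edits can collapse without moving the aggregate much; the per-rule breakdown reports each rule separately, so such a drop stays visible.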

Anthology ID: 2026.bea-1.32
Volume: Proceedings of the 21st Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2026)
Month: July
Year: 2026
Address: San Diego, California, USA
Editors: Ekaterina Kochmar, Bashar Alhafni, Stefano Bannò, Marie Bexte, Jill Burstein, Andrea Horbach, Ronja Laarmann-Quante, Anais Tack, Victoria Yaneva, Zheng Yuan
Venues: BEA | WS
Publisher: Association for Computational Linguistics
Pages: 463–478
URL: https://preview.aclanthology.org/ingest-acl-workshops/2026.bea-1.32/
Cite (ACL): Anna Smirnova, Artyom Kopan, Vladislav Makeev, and George Chernishev. 2026. What Aggregate Scores Hide: Per-Rule Evaluation of Russian Grammatical Error Correction. In Proceedings of the 21st Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2026), pages 463–478, San Diego, California, USA. Association for Computational Linguistics.
Cite (Informal): What Aggregate Scores Hide: Per-Rule Evaluation of Russian Grammatical Error Correction (Smirnova et al., BEA 2026)
PDF: https://preview.aclanthology.org/ingest-acl-workshops/2026.bea-1.32.pdf
