What Aggregate Scores Hide: Per-Rule Evaluation of Russian Grammatical Error Correction
Anna Smirnova, Artyom Kopan, Vladislav Makeev, George Chernishev
Abstract
Russian grammar correction models can improve on aggregate benchmarks while getting worse at specific grammar rules. We show this through per-rule evaluation on a diagnostic benchmark of 48 prescriptive rules: finetuning on synthetic data improves overall F0.5 while driving subordinate-clause comma accuracy from 14% to 1%. The suppression is invisible under corpus-level metrics and undetectable with existing coarse, corpus-specific tagsets; it is recoverable only when diagnosed at rule granularity. To enable this analysis, we develop a 98-category error taxonomy grounded in Rozental’s reference grammar, and SyntErr, an open-source synthetic data generator whose per-rule distribution is an explicit parameter, designed to support arbitrary rule sets and languages. Finetuning eight open models (0.8B–12B) on 39K synthetic examples yields up to 75.3 F0.5, approaching frontier API models with models small enough to run on device. We release the taxonomy, generator, per-rule evaluation data, and all training artifacts.
- Anthology ID:
- 2026.bea-1.32
- Volume:
- Proceedings of the 21st Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2026)
- Month:
- July
- Year:
- 2026
- Address:
- San Diego, California, USA
- Editors:
- Ekaterina Kochmar, Bashar Alhafni, Stefano Bannò, Marie Bexte, Jill Burstein, Andrea Horbach, Ronja Laarmann-Quante, Anais Tack, Victoria Yaneva, Zheng Yuan
- Venues:
- BEA | WS
- Publisher:
- Association for Computational Linguistics
- Pages:
- 463–478
- URL:
- https://preview.aclanthology.org/ingest-acl-workshops/2026.bea-1.32/
- Cite (ACL):
- Anna Smirnova, Artyom Kopan, Vladislav Makeev, and George Chernishev. 2026. What Aggregate Scores Hide: Per-Rule Evaluation of Russian Grammatical Error Correction. In Proceedings of the 21st Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2026), pages 463–478, San Diego, California, USA. Association for Computational Linguistics.
- Cite (Informal):
- What Aggregate Scores Hide: Per-Rule Evaluation of Russian Grammatical Error Correction (Smirnova et al., BEA 2026)
- PDF:
- https://preview.aclanthology.org/ingest-acl-workshops/2026.bea-1.32.pdf
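The abstract's central point is that a pooled score can rise while accuracy on one rule collapses. The sketch below illustrates that effect with the standard F-beta definition (β = 0.5, which weights precision over recall) and a simple per-rule accuracy breakdown. It is a minimal illustration under stated assumptions, not the authors' evaluation code: the function names, the rule IDs `comma_subordinate` and `spelling`, and all counts are hypothetical toy values, not data from the paper.

```python
from collections import defaultdict


def f_beta(tp: int, fp: int, fn: int, beta: float = 0.5) -> float:
    """Standard F-beta from true/false positive and false negative counts.

    beta = 0.5 weights precision more heavily than recall.
    """
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def per_rule_accuracy(results):
    """Map each rule ID to the fraction of its items corrected correctly.

    `results` is an iterable of (rule_id, corrected_correctly) pairs.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for rule, ok in results:
        totals[rule] += 1
        hits[rule] += int(ok)
    return {rule: hits[rule] / totals[rule] for rule in totals}


def overall_accuracy(results):
    """Pooled accuracy over all items, ignoring which rule each item tests."""
    results = list(results)
    return sum(int(ok) for _, ok in results) / len(results)


# Hypothetical toy outcomes (NOT the paper's data): two rules, 10 items each.
before = ([("comma_subordinate", i < 2) for i in range(10)]
          + [("spelling", i < 5) for i in range(10)])
after = ([("comma_subordinate", False) for _ in range(10)]
         + [("spelling", True) for _ in range(10)])

print(overall_accuracy(before), overall_accuracy(after))  # 0.35 0.5
print(per_rule_accuracy(before)["comma_subordinate"],
      per_rule_accuracy(after)["comma_subordinate"])       # 0.2 0.0
```

In this toy case the pooled score improves from 0.35 to 0.5 because gains on one rule outweigh the loss on the other, while the per-rule view shows the comma rule dropping to zero. Only a breakdown keyed by rule exposes that regression.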