Spelling Correction for Morphologically Rich Language: a Case Study of Russian

Alexey Sorokin


Abstract
We present an algorithm for automatic correction of spelling errors on the sentence level, which uses a noisy channel model and feature-based reranking of hypotheses. Our system is designed for Russian and clearly outperforms the winner of the SpellRuEval-2016 competition. We show that language model size has the greatest influence on spelling correction quality. We also experiment with different types of features and show that morphological and semantic information improves the accuracy of spellchecking.
Anthology ID:
W17-1408
Volume:
Proceedings of the 6th Workshop on Balto-Slavic Natural Language Processing
Month:
April
Year:
2017
Address:
Valencia, Spain
Editors:
Tomaž Erjavec, Jakub Piskorski, Lidia Pivovarova, Jan Šnajder, Josef Steinberger, Roman Yangarber
Venue:
BSNLP
SIG:
SIGSLAV
Publisher:
Association for Computational Linguistics
Pages:
45–53
URL:
https://aclanthology.org/W17-1408
DOI:
10.18653/v1/W17-1408
Cite (ACL):
Alexey Sorokin. 2017. Spelling Correction for Morphologically Rich Language: a Case Study of Russian. In Proceedings of the 6th Workshop on Balto-Slavic Natural Language Processing, pages 45–53, Valencia, Spain. Association for Computational Linguistics.
Cite (Informal):
Spelling Correction for Morphologically Rich Language: a Case Study of Russian (Sorokin, BSNLP 2017)
PDF:
https://preview.aclanthology.org/ml4al-ingestion/W17-1408.pdf