Abstract
We present an algorithm for automatic sentence-level correction of spelling errors that uses a noisy channel model and feature-based reranking of hypotheses. Our system is designed for Russian and clearly outperforms the winner of the SpellRuEval-2016 competition. We show that language model size has the greatest influence on spelling correction quality. We also experiment with different types of features and show that morphological and semantic information further improves the accuracy of spellchecking.
- Anthology ID:
- W17-1408
- Volume:
- Proceedings of the 6th Workshop on Balto-Slavic Natural Language Processing
- Month:
- April
- Year:
- 2017
- Address:
- Valencia, Spain
- Editors:
- Tomaž Erjavec, Jakub Piskorski, Lidia Pivovarova, Jan Šnajder, Josef Steinberger, Roman Yangarber
- Venue:
- BSNLP
- SIG:
- SIGSLAV
- Publisher:
- Association for Computational Linguistics
- Pages:
- 45–53
- URL:
- https://aclanthology.org/W17-1408
- DOI:
- 10.18653/v1/W17-1408
- Cite (ACL):
- Alexey Sorokin. 2017. Spelling Correction for Morphologically Rich Language: a Case Study of Russian. In Proceedings of the 6th Workshop on Balto-Slavic Natural Language Processing, pages 45–53, Valencia, Spain. Association for Computational Linguistics.
- Cite (Informal):
- Spelling Correction for Morphologically Rich Language: a Case Study of Russian (Sorokin, BSNLP 2017)
- PDF:
- https://preview.aclanthology.org/ml4al-ingestion/W17-1408.pdf