Arabisc: Context-Sensitive Neural Spelling Checker

Yasmin Moslem, Rejwanul Haque, Andy Way


Abstract
Traditional statistical approaches to spelling correction usually consist of two consecutive processes, error detection and error correction, and they are generally computationally intensive. Current state-of-the-art neural spelling correction models usually attempt to correct spelling errors directly over an entire sentence and consequently offer little control over the process; for example, they are prone to overcorrection. In recent years, recurrent neural networks (RNNs), in particular those with long short-term memory (LSTM) hidden units, have proven increasingly popular and powerful models for many natural language processing (NLP) problems. Accordingly, we use a bidirectional LSTM language model (LM) for our context-sensitive spelling detection and correction model, which we show gives considerably more control over the correction process. While the use of LMs for spelling checking and correction is not new to this line of NLP research, our proposed approach makes better use of the rich neighbouring context, not only before the word to be corrected but also after it, via a dual-input deep LSTM network. Although in theory our proposed approach can be applied to any language, we carried out our experiments on Arabic, which we believe adds value given that far fewer linguistic resources are readily available for Arabic than for many other languages. Our experimental results demonstrate that the proposed methods are effective in both improving the quality of correction suggestions and minimising overcorrection.
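The idea of ranking correction candidates by the context on both sides of a word can be illustrated with a minimal sketch. This is not the authors' implementation: a toy count-based scorer stands in for the bidirectional LSTM LM, and the corpus, the function names (`train_counts`, `edit_distance`, `correct`) and the `max_dist` parameter are all hypothetical choices made for illustration.

```python
from collections import Counter

# Hypothetical toy corpus; a real system would train a neural LM on large text.
CORPUS = [
    "the cat sat on the mat",
    "the cat ate the fish",
    "a car drove on the road",
    "the car stopped on the road",
]

def train_counts(sentences):
    """Count (previous word, word) and (word, next word) pairs."""
    left, right, vocab = Counter(), Counter(), Counter()
    for sentence in sentences:
        toks = ["<s>"] + sentence.split() + ["</s>"]
        for i in range(1, len(toks) - 1):
            left[(toks[i - 1], toks[i])] += 1
            right[(toks[i], toks[i + 1])] += 1
            vocab[toks[i]] += 1
    return left, right, vocab

def edit_distance(a, b):
    """Levenshtein distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def correct(tokens, index, left, right, vocab, max_dist=1):
    """Return the best in-vocabulary replacement for tokens[index]."""
    word = tokens[index]
    before = tokens[index - 1] if index > 0 else "<s>"
    after = tokens[index + 1] if index + 1 < len(tokens) else "</s>"

    def score(w):
        # Stand-in for the LM: add-one smoothed counts from both sides.
        return (left[(before, w)] + 1) * (right[(w, after)] + 1)

    candidates = [w for w in vocab if edit_distance(word, w) <= max_dist]
    if not candidates:
        return word
    best = max(candidates, key=score)
    # Keep a known word unless a candidate scores strictly higher;
    # this is one simple way to limit overcorrection.
    if word in vocab and score(word) >= score(best):
        return word
    return best

left, right, vocab = train_counts(CORPUS)
print(correct("the caa stopped on the road".split(), 1, left, right, vocab))  # car
print(correct("the caa sat on the mat".split(), 1, left, right, vocab))      # cat
```

In this toy data "cat" follows "the" more often than "car" does, so a scorer that looks only at the preceding word would pick "cat" in both sentences; the following word ("stopped") is what tips the first case to "car".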
Anthology ID: 2020.nlptea-1.2
Volume: Proceedings of the 6th Workshop on Natural Language Processing Techniques for Educational Applications
Month: December
Year: 2020
Address: Suzhou, China
Editors: Erhong YANG, Endong XUN, Baolin ZHANG, Gaoqi RAO
Venue: NLP-TEA
Publisher: Association for Computational Linguistics
Pages: 11–19
URL: https://aclanthology.org/2020.nlptea-1.2
Cite (ACL): Yasmin Moslem, Rejwanul Haque, and Andy Way. 2020. Arabisc: Context-Sensitive Neural Spelling Checker. In Proceedings of the 6th Workshop on Natural Language Processing Techniques for Educational Applications, pages 11–19, Suzhou, China. Association for Computational Linguistics.
Cite (Informal): Arabisc: Context-Sensitive Neural Spelling Checker (Moslem et al., NLP-TEA 2020)
PDF: https://preview.aclanthology.org/emnlp-22-attachments/2020.nlptea-1.2.pdf
Code: ymoslem/Arabisc