English Language Spelling Correction as an Information Retrieval Task Using Wikipedia Search Statistics

Kyle Goslin, Markus Hofmann


Abstract
Spelling correction utilities have become commonplace during the writing process; however, many of them suffer from the limited size and quality of the dictionaries available to aid correction. Many terms, acronyms, and morphological variations of terms are often missing, leaving potential spelling errors unidentified and uncorrected. This research describes the implementation of WikiSpell, a dynamic spelling correction tool that relies on the Wikipedia dataset search API functionality as the sole source of knowledge to aid misspelled term identification and automatic replacement. Instead of a traditional matching process to select candidate replacement terms, the replacement process is treated as a natural language information retrieval process harnessing wildcard string matching and search result statistics. The aims of this research include: 1) the implementation of a spelling correction algorithm that utilizes the wildcard operators in the Wikipedia dataset search API, 2) a review of the spelling correction tools and approaches currently in use, and 3) testing and validation of the developed algorithm against the benchmark spelling correction tool, Hunspell. The key contribution of this research is a robust, dynamic information retrieval-based spelling correction algorithm that does not require prior training. Results of this research show that the proposed spelling correction algorithm, WikiSpell, achieved results comparable to those of an industry-standard spelling correction algorithm, Hunspell.
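The general idea of pairing wildcard matching with search statistics can be illustrated with a minimal sketch. This is not the WikiSpell implementation: the function names, the pattern-generation rule (one `*` per character position), and the toy `HITS` table are all assumptions made for illustration, and a local dictionary of hit counts stands in for the statistics that a live search API would return.

```python
from fnmatch import fnmatchcase
from typing import Dict, List, Optional

# Toy stand-in for search result statistics (hypothetical values,
# not real Wikipedia hit counts).
HITS: Dict[str, int] = {"wikipedia": 1000, "wikimedia": 400, "wiki": 900}


def wildcard_patterns(term: str) -> List[str]:
    """Build one pattern per position, replacing that character with `*`."""
    return [term[:i] + "*" + term[i + 1:] for i in range(len(term))]


def suggest(term: str, hit_counts: Dict[str, int]) -> Optional[str]:
    """Return the known term with the most hits that matches a wildcard pattern.

    A term that already appears in `hit_counts` is treated as correctly
    spelled and returned unchanged; None means no candidate was found.
    """
    if term in hit_counts:
        return term
    candidates = {
        word
        for pattern in wildcard_patterns(term)
        for word in hit_counts
        if fnmatchcase(word, pattern)
    }
    if not candidates:
        return None
    # Rank by hit count, breaking ties alphabetically for determinism.
    return max(candidates, key=lambda w: (hit_counts[w], w))


if __name__ == "__main__":
    print(suggest("wikipdia", HITS))
```

In this sketch, candidates are ranked purely by hit count, so a frequent term wins over a rarer one that matches the same pattern. Single-wildcard patterns of this kind cover a dropped, added, or substituted character but not a transposition.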
Anthology ID: 2022.lrec-1.48
Volume: Proceedings of the Thirteenth Language Resources and Evaluation Conference
Month: June
Year: 2022
Address: Marseille, France
Venue: LREC
Publisher: European Language Resources Association
Pages: 458–464
URL: https://aclanthology.org/2022.lrec-1.48
Cite (ACL): Kyle Goslin and Markus Hofmann. 2022. English Language Spelling Correction as an Information Retrieval Task Using Wikipedia Search Statistics. In Proceedings of the Thirteenth Language Resources and Evaluation Conference, pages 458–464, Marseille, France. European Language Resources Association.
Cite (Informal): English Language Spelling Correction as an Information Retrieval Task Using Wikipedia Search Statistics (Goslin & Hofmann, LREC 2022)
PDF: https://preview.aclanthology.org/auto-file-uploads/2022.lrec-1.48.pdf