Alan Juffs
2025
Identifying and analyzing ‘noisy’ spelling errors in a second language corpus
Alan Juffs | Ben Naismith
Proceedings of the Tenth Workshop on Noisy and User-generated Text
This paper addresses the problem of identifying and analyzing ‘noisy’ spelling errors in texts written by second language (L2) learners in a written corpus. Using Python, spelling errors were identified in 5,774 texts of at least 66 words each (total = 1,814,209 words), selected from a corpus of 4.2 million words (Authors-1). The statistical analysis used hurdle() models in R, which are appropriate for non-normal count data with many zeros.
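The abstract does not describe the Python pipeline itself, so the following is only a minimal sketch of the kind of step it mentions: keeping texts of at least 66 words and counting tokens that are not found in a word list. The tokenizer, the `lexicon` argument, and the function names `tokenize` and `error_counts` are assumptions made for illustration, not the authors' implementation.

```python
import re

MIN_WORDS = 66  # length threshold stated in the abstract


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens (hypothetical tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def error_counts(texts, lexicon, min_words=MIN_WORDS):
    """Return (word_count, error_count) for each text with at least min_words tokens.

    A token is counted as a spelling error when it is absent from `lexicon`,
    which is an assumed stand-in for whatever dictionary a real pipeline uses.
    """
    results = []
    for text in texts:
        tokens = tokenize(text)
        if len(tokens) < min_words:
            continue  # skip texts below the length threshold
        errors = sum(1 for token in tokens if token not in lexicon)
        results.append((len(tokens), errors))
    return results
```

Per-text counts of this kind are often zero for many texts, which is the property of the data that the abstract gives as the reason for using hurdle models.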