Alan Juffs

2025

Identifying and analyzing ‘noisy’ spelling errors in a second language corpus
Alan Juffs | Ben Naismith
Proceedings of the Tenth Workshop on Noisy and User-generated Text

This paper addresses the problem of identifying and analyzing ‘noisy’ spelling errors in a written corpus of texts by second language (L2) learners. Using Python, spelling errors were identified in 5,774 texts of at least 66 words each (total = 1,814,209 words), selected from a corpus of 4.2 million words (Authors-1). The statistical analysis used hurdle() models in R, which are appropriate for non-normal count data with many zeros.

Co-authors

Ben Naismith 1

Venues

wnut 1
ws 1
