Current approaches in automatic readability assessment have found success with the use of large language models and transformer architectures. These techniques lead to accuracy improvement, but they do not offer the interpretability that is uniquely required by the audience most often employing readability assessment tools: teachers and educators. Recent work that employs more traditional machine learning methods has highlighted the linguistic importance of considering semantic and syntactic characteristics of text in readability assessment by utilizing handcrafted feature sets. Research in Education suggests that, in addition to semantics and syntax, phonetic and orthographic instruction are necessary for children to progress through the stages of reading and spelling development; children must first learn to decode the letters and symbols on a page to recognize words and phonemes and their connection to speech sounds. Here, we incorporate this word-level phonemic decoding process into readability assessment by crafting a phonetically-based feature set for grade-level classification for English. Our resulting feature set shows comparable performance to much larger, semantically- and syntactically-based feature sets, supporting the linguistic value of orthographic and phonetic considerations in readability assessment.
Given the more widespread nature of natural language interfaces, it is increasingly important to understand who are accessing those interfaces, and how those interfaces are being used. In this paper, we explore spellchecking in the context of web search with children as the target audience. In particular, via a literature review we show that, while widely used, popular search tools are ill-designed for children. We then use spellcheckers as a case study to highlight the need for an interdisciplinary approach that brings together natural language processing, education, human-computer interaction to address a known information retrieval problem: query misspelling. We conclude that it is imperative that those for whom the interfaces are designed have a voice in the design process.
For help with their spelling errors, children often turn to spellcheckers integrated in software applications like word processors and search engines. However, existing spellcheckers are usually tuned to the needs of traditional users (i.e., adults) and generally prove unsatisfactory for children. Motivated by this issue, we introduce KidSpell, an English spellchecker oriented to the spelling needs of children. KidSpell applies (i) an encoding strategy for mapping both misspelled words and spelling suggestions to their phonetic keys and (ii) a selection process that prioritizes candidate spelling suggestions that closely align with the misspelled word based on their respective keys. To assess the effectiveness of, we compare the model’s performance against several popular, mainstream spellcheckers in a number of offline experiments using existing and novel datasets. The results of these experiments show that KidSpell outperforms existing spellcheckers, as it accurately prioritizes relevant spelling corrections when handling misspellings generated by children in both essay writing and online search tasks. As a byproduct of our study, we create two new datasets comprised of spelling errors generated by children from hand-written essays and web search inquiries, which we make available to the research community.