Beyond the Spelling Miracle: Investigating Substring Awareness in Character-Blind Language Models

Cristiano Ciaccio, Marta Sartor, Alessio Miaschi, Felice Dell’Orletta


Abstract
Correctly identifying characters and substrings of words should be a basic but essential ability of any Language Model that aims to proficiently understand and produce language. Despite this, the majority of Pre-trained Language Models (PLMs) are “character-blind” and struggle on spelling tasks, although they still seem to acquire some character knowledge during pre-training, a phenomenon dubbed the Spelling Miracle. To shed light on this phenomenon, we systematically evaluate a range of PLMs of different parameter sizes using a controlled binary substring identification task. Through a series of experiments, we propose the first comprehensive investigation of where, when, and how PLMs develop awareness of characters and substrings, with a particular linguistic focus on morphemic units such as prefixes, suffixes, and roots.
Anthology ID:
2025.findings-acl.593
Volume:
Findings of the Association for Computational Linguistics: ACL 2025
Month:
July
Year:
2025
Address:
Vienna, Austria
Editors:
Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
Venue:
Findings
Publisher:
Association for Computational Linguistics
Note:
Pages:
11361–11372
URL:
https://preview.aclanthology.org/landing_page/2025.findings-acl.593/
Cite (ACL):
Cristiano Ciaccio, Marta Sartor, Alessio Miaschi, and Felice Dell’Orletta. 2025. Beyond the Spelling Miracle: Investigating Substring Awareness in Character-Blind Language Models. In Findings of the Association for Computational Linguistics: ACL 2025, pages 11361–11372, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal):
Beyond the Spelling Miracle: Investigating Substring Awareness in Character-Blind Language Models (Ciaccio et al., Findings 2025)
PDF:
https://preview.aclanthology.org/landing_page/2025.findings-acl.593.pdf