On the scaling relationship between cloze probabilities and language model next-token prediction

Cassandra L Jacobs, Morgan Grobol


Abstract
Recent work has shown that larger language models have better predictive power for eye movement and reading time data. However, we know less about how model capacity relates to human production statistics in the cloze task, which are also used to predict reading times. While even the best models under-allocate probability mass to human responses, larger models produce higher-quality estimates of next tokens and their likelihood of production in cloze data because they are less sensitive to lexical co-occurrence statistics while being better aligned semantically to human cloze responses. The results support the claim that the greater memorization capacity of larger models helps them guess more semantically appropriate words, but makes them less sensitive to low-level information that is relevant for word recognition.
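To make "cloze probability" and "probability mass allocated to human responses" concrete, the sketch below computes an empirical cloze distribution from a set of human completions and compares it with a model's next-word probabilities. It is a minimal illustration under stated assumptions, not the authors' method: the responses and model probabilities are invented toy numbers, the helper names (`cloze_distribution`, `human_mass`, `kl_cloze_to_model`) are hypothetical, and it assumes word-level model probabilities are already available (real language models predict subword tokens, so multi-token words would need their token probabilities chained).

```python
from collections import Counter
import math


def cloze_distribution(responses):
    """Empirical cloze probability: the share of participants producing each word."""
    counts = Counter(r.strip().lower() for r in responses)
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}


def human_mass(model_probs, cloze):
    """Total model probability assigned to words that at least one human produced."""
    return sum(model_probs.get(word, 0.0) for word in cloze)


def kl_cloze_to_model(cloze, model_probs, eps=1e-12):
    """KL(cloze || model); summing over the cloze support is exact for this direction.

    eps is a floor that avoids log(0) when the model gives a human response zero mass.
    """
    return sum(
        p * math.log(p / max(model_probs.get(word, 0.0), eps))
        for word, p in cloze.items()
    )


# Hypothetical toy data for a frame like "The children went outside to ___".
responses = ["play"] * 7 + ["swim"] * 2 + ["run"]
# Hypothetical word-level model probabilities; remaining mass lies elsewhere in the vocabulary.
model_probs = {"play": 0.45, "swim": 0.05, "eat": 0.10, "run": 0.05}

cloze = cloze_distribution(responses)
mass = human_mass(model_probs, cloze)
print(f"mass on human responses: {mass:.2f}")  # prints "mass on human responses: 0.55"
print(f"KL(cloze || model): {kl_cloze_to_model(cloze, model_probs):.3f}")
```

Here the model places only 0.55 of its probability on words that humans actually produced, which is the kind of under-allocation the abstract describes; the KL term additionally penalizes mismatches in how that mass is spread across the human responses.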
Anthology ID: 2026.conll-main.32
Volume: Proceedings of the 30th Conference on Computational Natural Language Learning
Month: July
Year: 2026
Address: San Diego, California, USA
Editors: Claire Bonial, Yevgeni Berzak
Venues: CoNLL | WS
Publisher: Association for Computational Linguistics
Pages: 544–554
URL: https://preview.aclanthology.org/ingest-acl-workshops/2026.conll-main.32/
Cite (ACL): Cassandra L Jacobs and Morgan Grobol. 2026. On the scaling relationship between cloze probabilities and language model next-token prediction. In Proceedings of the 30th Conference on Computational Natural Language Learning, pages 544–554, San Diego, California, USA. Association for Computational Linguistics.
Cite (Informal): On the scaling relationship between cloze probabilities and language model next-token prediction (Jacobs & Grobol, CoNLL 2026)
PDF: https://preview.aclanthology.org/ingest-acl-workshops/2026.conll-main.32.pdf