Abstract
This study demonstrates a weakness in how n-gram and PCFG surprisal are used to predict reading times in eye-tracking data. In particular, the information conveyed by words skipped during saccades is not usually included in the surprisal measures. This study shows that correcting the surprisal calculation improves the predictivity of n-gram surprisal and that upcoming n-grams affect reading times, replicating previous findings on how lexical frequencies affect reading times. In contrast, the predictivity of PCFG surprisal does not benefit from the surprisal correction, even though lexical sequences skipped by saccades are processed by readers, as demonstrated by the corrected n-gram measure. These results raise questions about the formulation of information-theoretic measures of syntactic processing, such as PCFG surprisal and entropy reduction, when applied to reading times.
- Anthology ID:
- W16-4104
- Volume:
- Proceedings of the Workshop on Computational Linguistics for Linguistic Complexity (CL4LC)
- Month:
- December
- Year:
- 2016
- Address:
- Osaka, Japan
- Editors:
- Dominique Brunato, Felice Dell’Orletta, Giulia Venturi, Thomas François, Philippe Blache
- Venue:
- CL4LC
- Publisher:
- The COLING 2016 Organizing Committee
- Pages:
- 32–37
- URL:
- https://aclanthology.org/W16-4104
- Cite (ACL):
- Marten van Schijndel and William Schuler. 2016. Addressing surprisal deficiencies in reading time models. In Proceedings of the Workshop on Computational Linguistics for Linguistic Complexity (CL4LC), pages 32–37, Osaka, Japan. The COLING 2016 Organizing Committee.
- Cite (Informal):
- Addressing surprisal deficiencies in reading time models (van Schijndel & Schuler, CL4LC 2016)
- PDF:
- https://preview.aclanthology.org/dois-2013-emnlp/W16-4104.pdf
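The surprisal correction described in the abstract can be illustrated with a minimal Python sketch. It assumes that the corrected ("cumulative") surprisal at a fixation is the sum of the per-word surprisals of every word traversed since the furthest word previously reached, so that words skipped by a saccade contribute their information to the next fixated word. The helper names `bigram_surprisals` and `cumulative_surprisal` are hypothetical, the add-one-smoothed bigram model is a toy stand-in for whatever n-gram model the authors actually used, and giving regressions only the fixated word's own surprisal is an assumption of this sketch, not the authors' implementation.

```python
import math
from collections import Counter


def bigram_surprisals(tokens, train):
    """Per-word surprisal -log2 P(w_i | w_{i-1}) under a toy add-one-smoothed
    bigram model estimated from `train` (a list of token lists).
    Illustrative stand-in only; not the n-gram model used in the paper."""
    history_counts = Counter()
    bigram_counts = Counter()
    vocab = set(tokens)
    for sent in train:
        padded = ["<s>"] + sent
        vocab.update(padded)
        history_counts.update(padded[:-1])  # how often each word is a history
        bigram_counts.update(zip(padded[:-1], padded[1:]))
    v = len(vocab)
    out = []
    prev = "<s>"
    for w in tokens:
        p = (bigram_counts[(prev, w)] + 1) / (history_counts[prev] + v)
        out.append(-math.log2(p))
        prev = w
    return out


def cumulative_surprisal(surprisal, fixations):
    """For each fixated word index, sum the surprisal of every word passed
    over since the furthest word reached so far (skipped words plus the
    fixated word). Regressions and refixations receive only the fixated
    word's own surprisal (an assumption of this sketch)."""
    furthest = -1
    result = []
    for i in fixations:
        if i > furthest:
            # Progressive saccade: include all skipped words up to and including i.
            result.append(sum(surprisal[furthest + 1 : i + 1]))
            furthest = i
        else:
            result.append(surprisal[i])
    return result


if __name__ == "__main__":
    per_word = [1.0, 2.0, 3.0, 4.0, 5.0]
    fixations = [0, 2, 4, 1]  # words 1 and 3 are skipped on the first pass
    print("uncorrected:", [per_word[i] for i in fixations])
    print("cumulative: ", cumulative_surprisal(per_word, fixations))
```

With the fixation sequence above, the uncorrected measure uses only the surprisal of each fixated word, whereas the cumulative version also charges the surprisal of the skipped words 1 and 3 to the fixations that follow them.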