Shinnosuke Isono

2026

An Existence Proof for Neural Language Models That Can Explain Garden-Path Effects via Surprisal
Ryo Yoshida | Shinnosuke Isono | Taiga Someya | Yohei Oseki | Tatsuki Kuribayashi
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Surprisal theory hypothesizes that the difficulty of human sentence processing increases linearly with surprisal, the negative log-probability of a word given its context. Computational psycholinguistics has tested this hypothesis using language models (LMs) as proxies for human prediction. While surprisal derived from recent neural LMs generally captures human processing difficulty on naturalistic corpora that predominantly consist of simple sentences, it severely underestimates processing difficulty on sentences that require syntactic disambiguation (garden-path effects). This leads to the claim that the processing difficulty of such sentences cannot be reduced to surprisal, although it remains possible that neural LMs simply differ from humans in next-word prediction. In this paper, we investigate whether it is truly impossible to construct a neural LM that can explain garden-path effects via surprisal. Specifically, instead of evaluating off-the-shelf neural LMs, we fine-tune these LMs on garden-path sentences so as to better align surprisal-based reading-time estimates with actual human reading times. Our results show that fine-tuned LMs do not overfit and successfully capture human reading slowdowns on held-out garden-path items; they even improve predictive power for human reading times on naturalistic corpora and preserve their general LM capabilities. These results provide an existence proof for a neural LM that can explain both garden-path effects and naturalistic reading times via surprisal, but also raise a theoretical question: what kind of evidence can truly falsify surprisal theory?

pdf bib abs

Timesteps of Mamba Align with Human Reading Times
Yuji Yamamoto | Shinnosuke Isono | Yoshinobu Kawahara | Sho Yokoi
Findings of the Association for Computational Linguistics: ACL 2026

This study demonstrates an alignment of per-word processing time in a popular state-space language model Mamba and human readers. In Mamba, the recurrent state transition at each layer conceptually takes some duration of time, the discretization timestep 𝛥_t, determined dynamically in response to the input. Using a naturalistic reading dataset, we show that the per-word timestep from Mamba is a powerful predictor of human reading times, comparable to strong baselines such as word frequency and GPT-2 surprisal and significant even when they are controlled for. We further suggest, through formal analysis of Mamba’s architecture and internal dynamics, that Mamba can serve as a new, valuable lens to look at human real-time language processing with ever-updated memory, because it allows us to look at how each module (layer) weighs short- and long-term information retention, and how noise may interact with dynamic, continuous memory representation. Code is available via an (anonymized) link.

2025

pdf bib abs

If Attention Serves as a Cognitive Model of Human Memory Retrieval, What is the Plausible Memory Representation?
Ryo Yoshida | Shinnosuke Isono | Kohei Kajikawa | Taiga Someya | Yushi Sugimoto | Yohei Oseki
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Recent work in computational psycholinguistics has revealed intriguing parallels between attention mechanisms and human memory retrieval, focusing primarily on vanilla Transformers that operate on token-level representations. However, computational psycholinguistic research has also established that syntactic structures provide compelling explanations for human sentence processing that token-level factors cannot fully account for. In this paper, we investigate whether the attention mechanism of Transformer Grammar (TG), which uniquely operates on syntactic structures as representational units, can serve as a cognitive model of human memory retrieval, using Normalized Attention Entropy (NAE) as a linking hypothesis between models and humans. Our experiments demonstrate that TG’s attention achieves superior predictive power for self-paced reading times compared to vanilla Transformer’s, with further analyses revealing independent contributions from both models. These findings suggest that human sentence processing involves dual memory representations—one based on syntactic structures and another on token sequences—with attention serving as the general memory retrieval algorithm, while highlighting the importance of incorporating syntactic structures as representational units.

Co-authors

Tatsuki Kuribayashi 1

Yushi Sugimoto 1

Yuji Yamamoto 1

Sho Yokoi 1

Venues

ACL2
Findings1

Fix author