Kohei Kajikawa
2026
Syntactically-guided Information Maintenance in Sentence Comprehension
Shinnosuke Isono | Kohei Kajikawa
Proceedings of the 30th Conference on Computational Natural Language Learning
Shinnosuke Isono | Kohei Kajikawa
Proceedings of the 30th Conference on Computational Natural Language Learning
Maintaining information in context is essential in successful real-time language comprehension, but maintenance is cognitively costly and can slow processing. We hypothesize that rational language users selectively maintain information that is crucial for future prediction, guided by syntactic structure. Under this view, two factors affect maintenance cost: the number of predicted heads and the number of incomplete dependencies. Although these factors have been treated as competing hypotheses in the literature, our account predicts that they are not reducible to one another. We show this is the case in a naturalistic reading time dataset in Japanese, a language in which the two factors contrast particularly clearly. We further show that there is a tradeoff such that readers that slow down for maintenance tend to benefit more from predictability, providing additional support for the proposed account. These patterns are not evident in English, however, and we highlight some issues to be resolved to understand the contribution of syntax in memory-efficient processing of various languages.
Information-Theoretic Storage Cost in Sentence Comprehension
Kohei Kajikawa | Shinnosuke Isono | Ethan Gotlieb Wilcox
Proceedings of the 30th Conference on Computational Natural Language Learning
Kohei Kajikawa | Shinnosuke Isono | Ethan Gotlieb Wilcox
Proceedings of the 30th Conference on Computational Natural Language Learning
Real-time sentence comprehension imposes a significant load on working memory, as comprehenders must maintain contextual information to anticipate future input. While measures of such load have played an important role in psycholinguistic theories, they have largely been formalized using symbolic grammars, which assign discrete, uniform costs to syntactic predictions. This study proposes a measure of processing storage cost based on an information-theoretic formalization, as the amount of information previous words carry about future context, under uncertainty. Unlike previous discrete, grammar-based metrics, this measure is continuous, probabilistic, theory-neutral, and can be estimated from pre-trained neural language models. The validity of this approach is demonstrated through three analyses in English: our measure (i) recovers well-known processing asymmetries in center embeddings and relative clauses, (ii) correlates with a grammar-based storage cost in a syntactically-annotated corpus, and (iii) predicts reading-time variance in two large-scale naturalistic datasets over and above baseline models with traditional information-based predictors. Our code is available at https://github.com/kohei-kaji/info-storage.
2025
If Attention Serves as a Cognitive Model of Human Memory Retrieval, What is the Plausible Memory Representation?
Ryo Yoshida | Shinnosuke Isono | Kohei Kajikawa | Taiga Someya | Yushi Sugimoto | Yohei Oseki
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Ryo Yoshida | Shinnosuke Isono | Kohei Kajikawa | Taiga Someya | Yushi Sugimoto | Yohei Oseki
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Recent work in computational psycholinguistics has revealed intriguing parallels between attention mechanisms and human memory retrieval, focusing primarily on vanilla Transformers that operate on token-level representations. However, computational psycholinguistic research has also established that syntactic structures provide compelling explanations for human sentence processing that token-level factors cannot fully account for. In this paper, we investigate whether the attention mechanism of Transformer Grammar (TG), which uniquely operates on syntactic structures as representational units, can serve as a cognitive model of human memory retrieval, using Normalized Attention Entropy (NAE) as a linking hypothesis between models and humans. Our experiments demonstrate that TG’s attention achieves superior predictive power for self-paced reading times compared to vanilla Transformer’s, with further analyses revealing independent contributions from both models. These findings suggest that human sentence processing involves dual memory representations—one based on syntactic structures and another on token sequences—with attention serving as the general memory retrieval algorithm, while highlighting the importance of incorporating syntactic structures as representational units.
2024
Is Structure Dependence Shaped for Efficient Communication?: A Case Study on Coordination
Kohei Kajikawa | Yusuke Kubota | Yohei Oseki
Proceedings of the 28th Conference on Computational Natural Language Learning
Kohei Kajikawa | Yusuke Kubota | Yohei Oseki
Proceedings of the 28th Conference on Computational Natural Language Learning
Natural language exhibits various universal properties.But why do these universals exist?One explanation is that they arise from functional pressures to achieve efficient communication, a view which attributes cross-linguistic properties to domain-general cognitive abilities.This hypothesis has successfully addressed some syntactic universal properties such as compositionality and Greenbergian word order universals.However, more abstract syntactic universals have not been explored from the perspective of efficient communication.Among such universals, the most notable one is structure dependence, that is, grammar-internal operations crucially depend on hierarchical representations.This property has traditionally been taken to be central to natural language and to involve domain-specific knowledge irreducible to communicative efficiency. In this paper, we challenge the conventional view by investigating whether structure dependence realizes efficient communication, focusing on coordinate structures.We design three types of artificial languages: (i) one with a structure-dependent reduction operation, which is similar to natural language, (ii) one without any reduction operations, and (iii) one with a linear (rather than structure-dependent) reduction operation.We quantify the communicative efficiency of these languages.The results demonstrate that the language with the structure-dependent reduction operation is significantly more communicatively efficient than the counterfactual languages.This suggests that the existence of structure-dependent properties can be explained from the perspective of efficient communication.