Li Kloostra


2026

We present a novel language resource that combines a reading-time corpus, constructed in psycholinguistics, with rich lexical, compositional, and discourse meaning representation annotations. While existing psycholinguistic corpora typically provide morphological and syntactic annotations, no comparable corpora with comprehensive semantic information have been made available until now. We enriched the UCL corpus (361 sentences of self-paced reading, eye-tracking, and EEG data) with annotations in the style of the Parallel Meaning Bank (PMB) project, including WordNet synsets, VerbNet thematic roles, Combinatory Categorial Grammar (CCG) parses, and Discourse Representation Theory (DRT) structures. We demonstrate the utility of this resource through two case studies examining (1) encoding interference effects due to gender similarity and (2) integration costs in semantic role assignment. Both studies reveal processing patterns consistent with established psycholinguistic theories and/or previous findings. This resource fills a significant gap in psycholinguistic research, enabling the evaluation of semantic processing theories on naturalistic corpus data and extending the existing pool of annotated reading-time corpora. It should be useful to psycholinguists, as well as to cognitive scientists interested in language processing.

2024

In this short paper, we employ a Language Model (LM) to gain insight into how the complex semantics of a Perception Verb (PV) emerge in children. Using a Dutch LM as a representation of mature language use, we find that for all ages 1) the LM accurately predicts PV use in children’s freely-told narratives; 2) children’s PV use is close to mature use; 3) complex PV meanings with attentional and cognitive aspects can be found. Our approach illustrates how LMs can be meaningfully employed in studying language development, and hence takes a constructive position in the debate on the relevance of LMs in this context.