2024
pdf
abs
Using Language Models to Unravel Semantic Development in Children’s Use of Perception Verbs
Bram van Dijk
|
Max J. van Duijn
|
Li Kloostra
|
Marco Spruit
|
Barend Beekhuizen
Proceedings of the Workshop on Cognitive Aspects of the Lexicon @ LREC-COLING 2024
In this short paper we employ a Language Model (LM) to gain insight into how complex semantics of a Perception Verb (PV) emerge in children. Using a Dutch LM as representation of mature language use, we find that for all ages 1) the LM accurately predicts PV use in children’s freely-told narratives; 2) children’s PV use is close to mature use; 3) complex PV meanings with attentional and cognitive aspects can be found. Our approach illustrates how LMs can be meaningfully employed in studying language development, hence takes a constructive position in the debate on the relevance of LMs in this context.
pdf
abs
The Curious Case of Representational Alignment: Unravelling Visio-Linguistic Tasks in Emergent Communication
Tom Kouwenhoven
|
Max Peeperkorn
|
Bram Van Dijk
|
Tessa Verhoef
Proceedings of the Workshop on Cognitive Modeling and Computational Linguistics
Natural language has the universal properties of being compositional and grounded in reality. The emergence of linguistic properties is often investigated through simulations of emergent communication in referential games. However, these experiments have yielded mixed results compared to similar experiments addressing linguistic properties of human language. Here we address representational alignment as a potential contributing factor to these results. Specifically, we assess the representational alignment between agent image representations and between agent representations and input images. Doing so, we confirm that the emergent language does not appear to encode human-like conceptual visual features, since agent image representations drift away from inputs whilst inter-agent alignment increases. We moreover identify a strong relationship between inter-agent alignment and topographic similarity, a common metric for compositionality, and address its consequences. To address these issues, we introduce an alignment penalty that prevents representational drift but interestingly does not improve performance on a compositional discrimination task. Together, our findings emphasise the key role representational alignment plays in simulations of language emergence.
2023
pdf
abs
Theory of Mind in Freely-Told Children’s Narratives: A Classification Approach
Bram van Dijk
|
Marco Spruit
|
Max van Duijn
Findings of the Association for Computational Linguistics: ACL 2023
Children are the focal point for studying the link between language and Theory of Mind (ToM) competence. Language and ToM are often studied with younger children and standardized tests, but as both are social competences, data and methods with higher ecological validity are critical. We leverage a corpus of 442 freely-told stories by Dutch children aged 4-12, recorded in their everyday classroom environments, to study language and ToM with NLP-tools. We labelled stories according to the mental depth of story characters children create, as a proxy for their ToM competence ‘in action’, and built a classifier with features encoding linguistic competences identified in existing work as predictive of ToM.We obtain good and fairly robust results (F1-macro = .71), relative to the complexity of the task for humans. Our results are explainable in that we link specific linguistic features such as lexical complexity and sentential complementation, that are relatively independent of children’s ages, to higher levels of character depth. This confirms and extends earlier work, as our study includes older children and socially embedded data from a different domain. Overall, our results support the idea that language and ToM are strongly interlinked, and that in narratives the former can scaffold the latter.
pdf
abs
ChiSCor: A Corpus of Freely-Told Fantasy Stories by Dutch Children for Computational Linguistics and Cognitive Science
Bram van Dijk
|
Max van Duijn
|
Suzan Verberne
|
Marco Spruit
Proceedings of the 27th Conference on Computational Natural Language Learning (CoNLL)
In this resource paper we release ChiSCor, a new corpus containing 619 fantasy stories, told freely by 442 Dutch children aged 4-12. ChiSCor was compiled for studying how children render character perspectives, and unravelling language and cognition in development, with computational tools. Unlike existing resources, ChiSCor’s stories were produced in natural contexts, in line with recent calls for more ecologically valid datasets. ChiSCor hosts text, audio, and annotations for character complexity and linguistic complexity. Additional metadata (e.g. education of caregivers) is available for one third of the Dutch children. ChiSCor also includes a small set of 62 English stories. This paper details how ChiSCor was compiled and shows its potential for future work with three brief case studies: i) we show that the syntactic complexity of stories is strikingly stable across children’s ages; ii) we extend work on Zipfian distributions in free speech and show that ChiSCor obeys Zipf’s law closely, reflecting its social context; iii) we show that even though ChiSCor is relatively small, the corpus is rich enough to train informative lemma vectors that allow us to analyse children’s language use. We end with a reflection on the value of narrative datasets in computational linguistics.
pdf
abs
Theory of Mind in Large Language Models: Examining Performance of 11 State-of-the-Art models vs. Children Aged 7-10 on Advanced Tests
Max van Duijn
|
Bram van Dijk
|
Tom Kouwenhoven
|
Werner de Valk
|
Marco Spruit
|
Peter van der Putten
Proceedings of the 27th Conference on Computational Natural Language Learning (CoNLL)
To what degree should we ascribe cognitive capacities to Large Language Models (LLMs), such as the ability to reason about intentions and beliefs known as Theory of Mind (ToM)? Here we add to this emerging debate by (i) testing 11 base- and instruction-tuned LLMs on capabilities relevant to ToM beyond the dominant false-belief paradigm, including non-literal language usage and recursive intentionality; (ii) using newly rewritten versions of standardized tests to gauge LLMs’ robustness; (iii) prompting and scoring for open besides closed questions; and (iv) benchmarking LLM performance against that of children aged 7-10 on the same tasks. We find that instruction-tuned LLMs from the GPT family outperform other models, and often also children. Base-LLMs are mostly unable to solve ToM tasks, even with specialized prompting. We suggest that the interlinked evolution and development of language and ToM may help explain what instruction-tuning adds: rewarding cooperative communication that takes into account interlocutor and context. We conclude by arguing for a nuanced perspective on ToM in LLMs.
pdf
abs
Large Language Models: The Need for Nuance in Current Debates and a Pragmatic Perspective on Understanding
Bram van Dijk
|
Tom Kouwenhoven
|
Marco Spruit
|
Max Johannes van Duijn
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing
Current Large Language Models (LLMs) are unparalleled in their ability to generate grammatically correct, fluent text. LLMs are appearing rapidly, and debates on LLM capacities have taken off, but reflection is lagging behind. Thus, in this position paper, we first zoom in on the debate and critically assess three points recurring in critiques of LLM capacities: i) that LLMs only parrot statistical patterns in the training data; ii) that LLMs master formal but not functional language competence; and iii) that language learning in LLMs cannot inform human language learning. Drawing on empirical and theoretical arguments, we show that these points need more nuance. Second, we outline a pragmatic perspective on the issue of ‘real’ understanding and intentionality in LLMs. Understanding and intentionality pertain to unobservable mental states we attribute to other humans because they have pragmatic value: they allow us to abstract away from complex underlying mechanics and predict behaviour effectively. We reflect on the circumstances under which it would make sense for humans to similarly attribute mental states to LLMs, thereby outlining a pragmatic philosophical context for LLMs as an increasingly prominent technology in society.
2022
pdf
abs
Looking from the Inside: How Children Render Character’s Perspectives in Freely Told Fantasy Stories
Max van Duijn
|
Bram van Dijk
|
Marco Spruit
Proceedings of the 4th Workshop of Narrative Understanding (WNU2022)
Story characters not only perform actions, they typically also perceive, feel, think, and communicate. Here we are interested in how children render characters’ perspectives when freely telling a fantasy story. Drawing on a sample of 150 narratives elicited from Dutch children aged 4-12, we provide an inventory of 750 instances of character-perspective representation (CPR), distinguishing fourteen different types. Firstly, we observe that character perspectives are ubiquitous in freely told children’s stories and take more varied forms than traditional frameworks can accommodate. Secondly, we discuss variation in the use of different types of CPR across age groups, finding that character perspectives are being fleshed out in more advanced and diverse ways as children grow older. Thirdly, we explore whether such variation can be meaningfully linked to automatically extracted linguistic features, thereby probing the potential for using automated tools from NLP to extract and classify character perspectives in children’s stories.