@inproceedings{hu-lewis-2025-language,
title = "Do Language Models Understand the Cognitive Tasks Given to Them? Investigations with the N-Back Paradigm",
author = "Hu, Xiaoyang and
Lewis, Richard",
editor = "Che, Wanxiang and
Nabende, Joyce and
Shutova, Ekaterina and
Pilehvar, Mohammad Taher",
booktitle = "Findings of the Association for Computational Linguistics: ACL 2025",
month = jul,
year = "2025",
address = "Vienna, Austria",
publisher = "Association for Computational Linguistics",
url = "https://preview.aclanthology.org/mtsummit-25-ingestion/2025.findings-acl.136/",
doi = "10.18653/v1/2025.findings-acl.136",
pages = "2665--2677",
ISBN = "979-8-89176-256-5",
abstract = "Cognitive tasks originally developed for humans are now increasingly used to study language models. While applying these tasks is often straightforward, interpreting their results can be challenging. In particular, when a model underperforms, it is often unclear whether this results from a limitation in the cognitive ability being tested or a failure to understand the task itself. A recent study argues that GPT 3.5{'}s declining performance on 2-back and 3-back tasks reflects a working memory capacity limit similar to humans (Gong et al., 2024). By analyzing a range of open-source language models of varying performance levels on these tasks, we show that the poor performance is due at least in part to a limitation in task comprehension and task set maintenance. We challenge the best-performing model with progressively harder versions of the task (up to 10-back) and experiment with alternative prompting strategies, before analyzing model attentions. Our larger aim is to contribute to the ongoing conversation around refining methodologies for the cognitive evaluation of language models."
}
Markdown (Informal)
[Do Language Models Understand the Cognitive Tasks Given to Them? Investigations with the N-Back Paradigm](https://aclanthology.org/2025.findings-acl.136/) (Hu & Lewis, Findings 2025)