CogEvolve: A Multimodal Benchmark for Evaluating Relational Reasoning in Semantic Extension

Jingjie Zeng, Huayang Li, Liang Yang, Yuanyuan Sun, Shaowu Zhang, Hongfei Lin


Abstract
Human cognition excels at extending knowledge through analogy, where word meanings evolve along structured pathways from concrete prototypes to abstract senses via metaphor and metonymy. Do Large Language Models (LLMs) internalize this generative logic, or do they merely mimic statistical patterns? To investigate this, we introduce CogEvolve, a cognitive-linguistic benchmark designed to test these evolutionary pathways across textual and visual modalities. Our evaluation reveals a distinct cognitive profile: models function as "Super-Associators" that are expert at static recognition yet fail at causal reasoning. In text, they exhibit a Frequency-Primacy Conflation, confusing statistical prevalence with cognitive basicness. Crucially, this reasoning collapses further in the visual domain. We term this deficit the Ungrounded Arrow: models possess high-fidelity concept representations (the "dots") but lack the transformational operators (the "arrows") essential for true relational understanding.
Anthology ID:
2026.acl-long.1190
Volume:
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:
July
Year:
2026
Address:
San Diego, California, United States
Editors:
Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
25943–25960
URL:
https://preview.aclanthology.org/ingest-acl/2026.acl-long.1190/
Cite (ACL):
Jingjie Zeng, Huayang Li, Liang Yang, Yuanyuan Sun, Shaowu Zhang, and Hongfei Lin. 2026. CogEvolve: A Multimodal Benchmark for Evaluating Relational Reasoning in Semantic Extension. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 25943–25960, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal):
CogEvolve: A Multimodal Benchmark for Evaluating Relational Reasoning in Semantic Extension (Zeng et al., ACL 2026)
PDF:
https://preview.aclanthology.org/ingest-acl/2026.acl-long.1190.pdf
Checklist:
 2026.acl-long.1190.checklist.pdf