Evaluating Adjective-Noun Compositionality in LLMs: Functional vs Representational Perspectives

Ruchira Dhar, Qiwei Peng, Anders Søgaard


Abstract
Compositionality is considered central to language abilities. How do large language models (LLMs), as performant language systems, fare on compositional tasks? We evaluate adjective–noun compositionality in LLMs using two complementary setups: a prompt-based functional assessment and a representational analysis of internal model states. Our results reveal a striking divergence between task performance and internal states: while LLMs reliably develop compositional representations, these representations do not translate consistently into functional task success across model variants. Consequently, we highlight the importance of contrastive evaluation for obtaining a more complete understanding of model capabilities.
Anthology ID:
2026.starsem-conference.8
Volume:
Proceedings of the 15th Joint Conference on Lexical and Computational Semantics (*SEM 2026)
Month:
July
Year:
2026
Address:
San Diego, California, United States
Editors:
Saif M. Mohammad, Nedjma Ousidhoum
Venues:
*SEM | WS
Publisher:
Association for Computational Linguistics
Pages:
125–135
URL:
https://preview.aclanthology.org/ingest-acl-workshops/2026.starsem-conference.8/
Cite (ACL):
Ruchira Dhar, Qiwei Peng, and Anders Søgaard. 2026. Evaluating Adjective-Noun Compositionality in LLMs: Functional vs Representational Perspectives. In Proceedings of the 15th Joint Conference on Lexical and Computational Semantics (*SEM 2026), pages 125–135, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal):
Evaluating Adjective-Noun Compositionality in LLMs: Functional vs Representational Perspectives (Dhar et al., *SEM 2026)
PDF:
https://preview.aclanthology.org/ingest-acl-workshops/2026.starsem-conference.8.pdf