CrosSing: Cross-Scale Reasoning Evaluation on LLMs against Humans

Qi Han, Yifan Wu, Marten Van Schijndel


Abstract
While many studies have shown that LLMs perform well on various reasoning tasks, few have examined their capacity for semantic reasoning. As LLMs reason with language, it is crucial to understand how well they grasp and use the underlying scalar relationships in language. In this study, we introduced a new dataset, CrosSing (Cross-Scale reasoning), which provides a human baseline against which to evaluate LLMs’ ability to reason across lexical scales in gradable adjectives. We further probed how this ability is influenced by overinformative contexts. We evaluated ten high-performing LLMs and found that some outperformed humans when no extra information was provided, but that LLM performance declined in certain overinformative contexts while human performance improved significantly. This contrast reveals a fundamental difference between recent LLMs and humans in how they understand adjectives’ scalar relationships and in how that understanding responds to overinformative contexts.
Anthology ID:
2026.scil-main.36
Volume:
Proceedings of the Society for Computation in Linguistics 2026
Month:
July
Year:
2026
Address:
San Diego, CA
Editors:
Rob Voigt, Alex Warstadt, Naomi Feldman, Tal Linzen
Venues:
SCiL | WS
Publisher:
Association for Computational Linguistics
Pages:
379–407
URL:
https://preview.aclanthology.org/ingest-acl-workshops/2026.scil-main.36/
Cite (ACL):
Qi Han, Yifan Wu, and Marten Van Schijndel. 2026. CrosSing: Cross-Scale Reasoning Evaluation on LLMs against Humans. In Proceedings of the Society for Computation in Linguistics 2026, pages 379–407, San Diego, CA. Association for Computational Linguistics.
Cite (Informal):
CrosSing: Cross-Scale Reasoning Evaluation on LLMs against Humans (Han et al., SCiL 2026)
PDF:
https://preview.aclanthology.org/ingest-acl-workshops/2026.scil-main.36.pdf