Yifan Wu

Other people with similar names: Yifan Wu, Yifan Wu

Unverified author pages with similar names: Yifan Wu


2026

While many studies have shown LLMs perform well in various reasoning tasks, few have examined their capacity on semantic reasoning tasks. As LLMs reason with language, it is crucial to understand how well they grasp and use the underlying scalar relationships in language. In this study, we introduced a new dataset CrosSing (Cross-Scale reasoning), providing a human baseline against which to evaluate LLMs’ ability to reason across lexical scales in gradable adjectives. We further probed how their understanding is influenced by overinformative contexts. We evaluated ten high-performing LLMs and found that some outperformed humans when no extra information was provided, but that LLM performance declined in certain overinformative contexts while human performance improved significantly. This contrast reveals a fundamental difference between recent LLMs and humans in understanding adjectives’ scalar relationships and how such understanding behaves in overinformative contexts.