Shunchi Zhang
2025
The Stochastic Parrot on LLM’s Shoulder: A Summative Assessment of Physical Concept Understanding
Mo Yu | Lemao Liu | Junjie Wu | Tsz Ting Chung | Shunchi Zhang | Jiangnan Li | Dit-Yan Yeung | Jie Zhou
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)
We systematically investigate a widely asked question: do LLMs really understand what they say? This question relates to the more familiar term Stochastic Parrot. To this end, we propose a summative assessment over a carefully designed physical concept understanding task, PhysiCo. Our task alleviates the memorization issue by using grid-format inputs that abstractly describe physical phenomena. The grids represent varying levels of understanding, from the core phenomenon and application examples to analogies with other abstract patterns in the grid world. A comprehensive study on our task demonstrates that: (1) state-of-the-art LLMs, including GPT-4o, o1, and Gemini 2.0 flash thinking, lag behind humans by ∼40%; (2) the stochastic parrot phenomenon is present in LLMs, as they fail on our grid task but can describe and recognize the same concepts well in natural language; (3) our task challenges LLMs because of its intrinsic difficulty rather than the unfamiliar grid format, as in-context learning and fine-tuning on data in the same format added little to their performance.
Co-authors
- Tsz Ting Chung 1
- Jiangnan Li 1
- Lemao Liu 1
- Junjie Wu 1
- Dit-Yan Yeung 1