Tsz Ting Chung
2025
The Stochastic Parrot on LLM’s Shoulder: A Summative Assessment of Physical Concept Understanding
Mo Yu | Lemao Liu | Junjie Wu | Tsz Ting Chung | Shunchi Zhang | Jiangnan Li | Dit-Yan Yeung | Jie Zhou
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)
In a systematic way, we investigate a widely asked question: Do LLMs really understand what they say?, which relates to the more familiar term Stochastic Parrot. To this end, we propose a summative assessment over a carefully designed physical concept understanding task, PhysiCo. Our task alleviates the memorization issue via the usage of grid-format inputs that abstractly describe physical phenomena. The grids represent varying levels of understanding, from the core phenomenon, through application examples, to analogies with other abstract patterns in the grid world. A comprehensive study on our task demonstrates: (1) state-of-the-art LLMs, including GPT-4o, o1 and Gemini 2.0 flash thinking, lag behind humans by ∼40%; (2) the stochastic parrot phenomenon is present in LLMs, as they fail on our grid task but can describe and recognize the same concepts well in natural language; (3) our task challenges the LLMs due to intrinsic difficulties rather than the unfamiliar grid format, as in-context learning and fine-tuning on data in the same format added little to their performance.
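A minimal sketch of what a grid-format probe of physical concept understanding could look like, assuming an ARC-style integer grid serialized into a multiple-choice prompt. The grid values, concept choices, and prompt wording below are illustrative assumptions, not the actual PhysiCo data format.

```python
# Sketch: serialize a 2D integer grid that abstractly depicts a physical
# phenomenon into a multiple-choice prompt for an LLM. Illustrative only;
# not the authors' released encoding.

def grid_to_prompt(grid, choices):
    """Turn a 2D integer grid and candidate concepts into a text prompt."""
    rows = "\n".join(" ".join(str(cell) for cell in row) for row in grid)
    options = "\n".join(f"({chr(65 + i)}) {c}" for i, c in enumerate(choices))
    return (
        "The grid below abstractly depicts a physical phenomenon.\n"
        f"{rows}\n"
        "Which concept does it illustrate?\n"
        f"{options}\n"
        "Answer with a single letter."
    )

# Toy example: larger values spreading out from a hot corner,
# loosely suggesting heat conduction.
grid = [
    [9, 7, 4, 1],
    [7, 5, 3, 1],
    [4, 3, 2, 0],
    [1, 1, 0, 0],
]
print(grid_to_prompt(grid, ["gravity", "heat conduction", "refraction", "inertia"]))
```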
2024
Selection-p: Self-Supervised Task-Agnostic Prompt Compression for Faithfulness and Transferability
Tsz Ting Chung | Leyang Cui | Lemao Liu | Xinting Huang | Shuming Shi | Dit-Yan Yeung
Findings of the Association for Computational Linguistics: EMNLP 2024
Large Language Models (LLMs) have demonstrated impressive capabilities in a wide range of natural language processing tasks when leveraging in-context learning. To mitigate the additional computational and financial costs associated with in-context learning, several prompt compression methods have been proposed to compress the in-context learning prompts. Despite their success, these methods face challenges with transferability due to model-specific compression, or rely on external training data generated by models such as GPT-4. In this paper, we investigate the ability of LLMs to develop a unified compression method that discretizes uninformative tokens, utilizing a self-supervised pre-training technique. By introducing a small number of parameters during continual pre-training, the proposed Selection-p produces a probability for each input token, indicating whether to preserve or discard it. Experiments show that Selection-p achieves state-of-the-art performance across numerous classification tasks, reaching compression rates of up to 10x with only a marginal 0.8% decrease in performance. Moreover, it exhibits superior transferability to different models compared to prior work. Finally, we analyze how Selection-p helps maintain performance on in-context learning with long contexts.
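A minimal PyTorch sketch of the token-selection idea described above, under assumptions: a frozen base LM provides hidden states, a small linear head (the only new parameters) scores each token, and tokens are kept by a top-k rule to reach a target compression rate. The module names and selection rule are assumptions for illustration, not the released Selection-p implementation.

```python
# Sketch of per-token keep/discard scoring for prompt compression.
# Assumes hidden states come from a frozen LM; only the linear head
# below would be trained during continual pre-training.

import torch
import torch.nn as nn

class TokenSelector(nn.Module):
    def __init__(self, hidden_size: int):
        super().__init__()
        # The only trainable parameters added on top of the frozen LM.
        self.score = nn.Linear(hidden_size, 1)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # hidden_states: (batch, seq_len, hidden_size) from the frozen LM.
        # Returns a keep-probability in [0, 1] for each token.
        return torch.sigmoid(self.score(hidden_states)).squeeze(-1)

def compress(tokens: list[str], keep_prob: torch.Tensor, rate: float = 0.1) -> list[str]:
    """Keep roughly `rate` of the tokens with the highest keep-probability,
    preserving their original order."""
    k = max(1, int(len(tokens) * rate))
    keep_idx = torch.topk(keep_prob, k).indices.sort().values
    return [tokens[i] for i in keep_idx.tolist()]

# Toy usage with random hidden states standing in for a frozen LM.
tokens = "Translate the following sentence into French : good morning".split()
hidden = torch.randn(1, len(tokens), 768)
selector = TokenSelector(hidden_size=768)
probs = selector(hidden)[0]  # (seq_len,)
print(compress(tokens, probs, rate=0.3))
```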
Co-authors
- Lemao Liu 2
- Dit-Yan Yeung 2
- Leyang Cui 1
- Xinting Huang 1
- Jiangnan Li 1