Chaoli Zhang
2025
The Illusion of Randomness: How LLMs Fail to Emulate Stochastic Decision-Making in Rock-Paper-Scissors Games?
Zihao Guo | Hongtao Lv | Chaoli Zhang | Yibowen Zhao | Yixin Zhang | Lizhen Cui
Findings of the Association for Computational Linguistics: EMNLP 2025
Prior research indicates that although large language models (LLMs) can precisely articulate the theoretical probability distributions associated with optimal strategic choices, their actual decision-making systematically diverges from these prescriptions, a phenomenon we define as the cognition–behaviour gap in LLMs. For example, in a Rock–Paper–Scissors (RPS) game, LLMs correctly identify the Nash-equilibrium strategy as selecting each action (Rock, Paper, Scissors) with equal probability 1/3, but their observed choices systematically deviate from this uniform distribution. Through a comprehensive evaluation of 20 state-of-the-art LLMs, we identify two critical insights: (1) we demonstrate that intrinsic biases inherited from pre-training corpora alone are insufficient to explain the observed deviations; (2) we introduce a semantic-free paradigm that strips away intrinsic biases to isolate pure positional bias: LLMs exhibit distinct position preferences (for example, o1 favours the first option, DeepSeek-V3 peaks in the middle, and DeepSeek-R1 shows a bimodal bias toward the first and last positions). Our findings advocate innovation to bridge the gap between strategic reasoning and decision-making in LLMs.
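To make the measurement behind this abstract concrete, here is a minimal sketch (not the authors' code; the `query_llm` helper and the prompt wording are hypothetical) of how one could tally an LLM's moves over repeated one-shot RPS rounds and quantify the deviation from the uniform Nash-equilibrium strategy:

```python
from collections import Counter

ACTIONS = ["Rock", "Paper", "Scissors"]

def measure_rps_bias(query_llm, n_rounds=300):
    """Tally an LLM's moves over repeated one-shot RPS rounds and
    report the deviation of the empirical distribution from the
    uniform (1/3, 1/3, 1/3) Nash-equilibrium strategy."""
    counts = Counter()
    prompt = ("We are playing one round of Rock-Paper-Scissors. "
              "Reply with exactly one word: Rock, Paper, or Scissors.")
    for _ in range(n_rounds):
        reply = query_llm(prompt).strip().lower()
        for action in ACTIONS:
            if action.lower() in reply:
                counts[action] += 1
                break
    total = sum(counts.values()) or 1  # avoid division by zero on refusals
    freqs = {a: counts[a] / total for a in ACTIONS}
    # Total variation distance from uniform: (1/2) * sum_a |p(a) - 1/3|.
    # Zero iff the observed play matches the stated equilibrium exactly.
    tv = 0.5 * sum(abs(freqs[a] - 1/3) for a in ACTIONS)
    return freqs, tv
```

The total variation distance here is just one convenient summary statistic (the paper may use a different one); a value near zero would indicate play consistent with the stated 1/3 equilibrium, while the cognition–behaviour gap shows up as a persistently large value.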
2023
LogiCoT: Logical Chain-of-Thought Instruction Tuning
Hanmeng Liu | Zhiyang Teng | Leyang Cui | Chaoli Zhang | Qiji Zhou | Yue Zhang
Findings of the Association for Computational Linguistics: EMNLP 2023
Generative Pre-trained Transformer 4 (GPT-4) demonstrates impressive chain-of-thought reasoning ability. Recent work on self-instruction tuning, such as Alpaca, has focused on enhancing the general proficiency of models. These instructions enable the model to achieve performance comparable to GPT-3.5 on general tasks like open-domain text generation and paraphrasing. However, they fall short of helping the model handle complex reasoning tasks. To bridge the gap, this paper presents LogiCoT, a new instruction-tuning dataset for Logical Chain-of-Thought reasoning with GPT-4. We elaborate on the process of harvesting instructions for prompting GPT-4 to generate chain-of-thought rationales. LogiCoT serves as an instruction set for teaching models logical reasoning and eliciting general reasoning skills.
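As a rough illustration of the harvesting process the abstract describes (a sketch under stated assumptions, not the released LogiCoT pipeline; the `call_gpt4` helper and the prompt template are hypothetical), one might collect chain-of-thought rationales like this:

```python
import json

COT_TEMPLATE = (
    "Question: {question}\n"
    "Answer the question and explain your reasoning step by step."
)

def harvest_cot_examples(call_gpt4, questions, out_path="logicot_sketch.jsonl"):
    """Prompt GPT-4 for step-by-step rationales on logical-reasoning
    questions and store each (instruction, rationale) pair as one JSON
    line, a common format for instruction-tuning datasets."""
    with open(out_path, "w", encoding="utf-8") as f:
        for q in questions:
            instruction = COT_TEMPLATE.format(question=q)
            rationale = call_gpt4(instruction)  # model's chain-of-thought answer
            record = {"instruction": instruction, "output": rationale}
            f.write(json.dumps(record, ensure_ascii=False) + "\n")
```

The JSON-lines layout is a widespread convention for instruction data, not necessarily the exact format the authors released; the sketch only shows the core prompt-then-record loop.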
Co-authors
- Leyang Cui 1
- Lizhen Cui 1
- Zihao Guo 1
- Hanmeng Liu (刘汉蒙) 1
- Hongtao Lv 1