Generative Gamer: Learning Equilibrium Strategy by LLM-driven Dynamic Deduction
Yadong Zhang, Xinshu Shen, Yupei Ren, Shangqing Zhao, Man Lan
Abstract
Large Language Models (LLMs) have demonstrated remarkable general capabilities, yet they falter in domains requiring deep strategic reasoning. A primary obstacle is the need to navigate a game tree that grows exponentially with search depth, a task for which their generative nature is ill-suited. To address this, we introduce Generative Gamer (GenGamer), a framework that trains LLMs to reason like an expert player. Instead of attempting an exhaustive search, GenGamer learns to generate a compact, pruned reasoning trajectory, termed a Dynamic Deduction. This is achieved by integrating three key strategies: action pruning based on policy confidence, state pruning via value estimation, and branch pruning inspired by alpha-beta principles. Furthermore, to train the model effectively, we propose the Deduction Tree Reward (DTR), a process-oriented mechanism that provides step-by-step feedback on the quality of the reasoning process, rather than relying solely on the final game outcome. Experiments on complex games such as Tic-Tac-Toe and Leduc Poker demonstrate that GenGamer significantly enhances the strategic capabilities of LLMs, enabling them to achieve performance that surpasses current state-of-the-art language models.
- Anthology ID:
- 2026.acl-long.574
- Volume:
- Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
- Month:
- July
- Year:
- 2026
- Address:
- San Diego, California, United States
- Editors:
- Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
- Venue:
- ACL
- Publisher:
- Association for Computational Linguistics
- Pages:
- 12604–12617
- URL:
- https://preview.aclanthology.org/ingest-acl/2026.acl-long.574/
- Cite (ACL):
- Yadong Zhang, Xinshu Shen, Yupei Ren, Shangqing Zhao, and Man Lan. 2026. Generative Gamer: Learning Equilibrium Strategy by LLM-driven Dynamic Deduction. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 12604–12617, San Diego, California, United States. Association for Computational Linguistics.
- Cite (Informal):
- Generative Gamer: Learning Equilibrium Strategy by LLM-driven Dynamic Deduction (Zhang et al., ACL 2026)
- PDF:
- https://preview.aclanthology.org/ingest-acl/2026.acl-long.574.pdf
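
The abstract describes branch pruning as "inspired by alpha-beta principles". As background only, the following is a minimal sketch of classical alpha-beta search on Tic-Tac-Toe. It is not GenGamer's learned Dynamic Deduction and is not taken from the paper: the board encoding (a 9-character string of `X`, `O`, and `.`) and the names `winner` and `alphabeta` are assumptions made for illustration.

```python
from typing import Optional, Tuple

# All eight winning lines on a 3x3 board indexed 0..8, row by row.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]


def winner(board: str) -> Optional[str]:
    """Return 'X' or 'O' if that player has three in a row, else None."""
    for a, b, c in LINES:
        if board[a] != "." and board[a] == board[b] == board[c]:
            return board[a]
    return None


def alphabeta(board: str, player: str,
              alpha: int = -2, beta: int = 2) -> Tuple[int, Optional[int]]:
    """Return (value, move): value is +1 / 0 / -1 from X's perspective."""
    w = winner(board)
    if w is not None:
        return (1 if w == "X" else -1), None
    moves = [i for i, cell in enumerate(board) if cell == "."]
    if not moves:
        return 0, None  # full board with no winner: draw

    maximizing = player == "X"
    other = "O" if maximizing else "X"
    best_value = -2 if maximizing else 2
    best_move = None
    for m in moves:
        child = board[:m] + player + board[m + 1:]
        value, _ = alphabeta(child, other, alpha, beta)
        if maximizing and value > best_value:
            best_value, best_move = value, m
        elif not maximizing and value < best_value:
            best_value, best_move = value, m
        if maximizing:
            alpha = max(alpha, best_value)
        else:
            beta = min(beta, best_value)
        if alpha >= beta:
            break  # branch pruning: the opponent would never allow this line
    return best_value, best_move


if __name__ == "__main__":
    print(alphabeta("XX.OO....", "X"))
    print(alphabeta("." * 9, "X")[0])
```

The cutoff at `alpha >= beta` is the only pruning shown here. The action pruning and state pruning named in the abstract depend on learned policy and value estimates, which this sketch does not attempt to model.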