Yinqi Zhang
2026
A Survey of Inductive Reasoning for Large Language Models
Kedi Chen | Dezhao Ruan | Yuhao Dan | Yaoting Wang | Siyu Yan | Xuecheng Wu | Yinqi Zhang | Qin Chen | Jie Zhou | Liang He | Biqing Qi | Linyang Li | Qipeng Guo | Xiaoming Shi | Wei Zhang
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Reasoning is an important task for large language models (LLMs). Among the reasoning paradigms, inductive reasoning is one of the basic types, characterized by its particular-to-general thinking process and the non-uniqueness of its answers. The inductive mode is crucial for knowledge generalization and aligns well with human cognition, making it a fundamental mode of learning that has attracted increasing interest. Despite the importance of inductive reasoning, there is no systematic summary of it. Therefore, this paper presents the first comprehensive survey of inductive reasoning for LLMs. First, methods for improving inductive reasoning are categorized into three main areas: post-training enhancement, test-time exploration, and data augmentation. Then, current benchmarks of inductive reasoning are summarized, and a unified sandbox-based evaluation approach with the observation coverage metric is derived. Finally, we offer analyses of the source of inductive ability and of how simple model architectures and data help with inductive tasks, providing a solid foundation for future research.
2025
Complete Chess Games Enable LLM Become A Chess Master
Yinqi Zhang | Xintian Han | Haolong Li | Kedi Chen | Shaohui Lin
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 2: Short Papers)
Large language models (LLMs) have shown remarkable abilities in text generation, question answering, language translation, reasoning and many other tasks. They continue to advance rapidly and are becoming increasingly influential in various fields, from technology and business to education and entertainment. Despite LLMs' success in multiple areas, their ability to play abstract games, such as chess, is underexplored. Chess-playing requires a language model to output legal and reasonable moves from textual inputs. Here, we propose ChessLLM, a large language model that plays full chess games. We transform the game into a textual format with the best move represented in the Forsyth-Edwards Notation. We show that with supervised fine-tuning alone, our model achieves a professional-level Elo rating of 1788 in matches against the standard Elo-rated Stockfish when permitted to sample 10 times. We further show that data quality is important: supervision on long-round data yields a 350-point Elo rating improvement over short-round data.
2024
Exploring Mathematical Extrapolation of Large Language Models with Synthetic Data
Haolong Li | Yu Ma | Yinqi Zhang | Chen Ye | Jie Chen
Findings of the Association for Computational Linguistics: ACL 2024
While large language models (LLMs) have shown excellent capabilities in language understanding, text generation and many other tasks, they still struggle with complex multi-step reasoning problems such as mathematical reasoning. In this paper, through a newly proposed arithmetical puzzle problem, we show that a model can perform well on multi-step reasoning tasks via fine-tuning on high-quality synthetic data. Experiments with the open-llama-3B model on three different test datasets show that the model not only reaches a zero-shot pass@1 of 0.44 on the in-domain dataset but also demonstrates certain generalization capabilities on the out-of-domain datasets. Specifically, we design two out-of-domain datasets by separately extending the numerical range and the composing components of the arithmetical puzzle problem. The fine-tuned model shows encouraging performance on these two far more difficult tasks, with zero-shot pass@1 of 0.33 and 0.35, respectively.