Improving Abstract Reasoning Ability of Large Language Models through Mixture Program-based Data Synthesis

Yile Wang, Hui Huang


Abstract
Abstract reasoning is a challenging task that involves identifying patterns from limited input-output grids and applying them to new grids. With the development of large language models (LLMs), recent studies attempt to convert the problems into a textual format and tackle abstract reasoning tasks using models such as GPT-4. However, the overall accuracy is still low, which also results in the poor quality of abstract reasoning data directly synthesized by GPT-4, making it unsuitable as effective fine-tuning data. In this paper, we propose mixture program-based data synthesis strategies, including low-level code-based synthesis, high-level DSL-based synthesis, and shuffle-based synthesis. Through these strategies, we construct diverse and valid abstract reasoning instruction data to help improve the general abstract reasoning ability of LLMs across multiple datasets. Experimental results show that, by supervised fine-tuning Qwen-2.5-7B on our synthesized instruction data, the resulting model shows improved abstract reasoning ability and outperforms various strong baseline LLMs, including the closed-source model GPT-4 and open-source models such as LLaMA-3 and Qwen-2.5. We release the GPT logs and our model at https://github.com/szu-tera/ARC.
Anthology ID:
2025.ccl-1.69
Volume:
Proceedings of the 24th China National Conference on Computational Linguistics (CCL 2025)
Month:
August
Year:
2025
Address:
Jinan, China
Editors:
Maosong Sun, Peiyong Duan, Zhiyuan Liu, Ruifeng Xu, Weiwei Sun
Venue:
CCL
Publisher:
Chinese Information Processing Society of China
Pages:
904–921
URL:
https://preview.aclanthology.org/ingest-ccl/2025.ccl-1.69/
Cite (ACL):
Yile Wang and Hui Huang. 2025. Improving Abstract Reasoning Ability of Large Language Models through Mixture Program-based Data Synthesis. In Proceedings of the 24th China National Conference on Computational Linguistics (CCL 2025), pages 904–921, Jinan, China. Chinese Information Processing Society of China.
Cite (Informal):
Improving Abstract Reasoning Ability of Large Language Models through Mixture Program-based Data Synthesis (Wang & Huang, CCL 2025)
PDF:
https://preview.aclanthology.org/ingest-ccl/2025.ccl-1.69.pdf