Interactive Learning for LLM Reasoning

Hehai Lin; Shilei Cao; Sudong Wang; Haotian Wu; Minzhi Li; Linyi Yang; Juepeng Zheng; Chengwei Qin

Abstract

Existing multi-agent learning approaches explicitly foster collaboration among Large Language Models (LLMs) to build stronger multi-agent systems (MAS), yet they still rely on re-executing the MAS during inference. This contrasts with human cognition, wherein individuals can internalize insights from interactions to improve later independent reasoning. To investigate whether multi-agent interaction can enhance LLMs’ independent problem-solving ability, we propose ILR (Interactive Learning for LLM Reasoning), a co-learning framework that integrates Dynamic Interaction and Perception Calibration. Dynamic Interaction adaptively selects cooperative or competitive strategies based on question difficulty and model capability, after which LLMs exchange information via the Idea3 framework (Idea Sharing, Idea Analysis, and Idea Fusion), an interaction paradigm that simulates human discussion, before producing final answers. Perception Calibration employs Group Relative Policy Optimization (GRPO) while integrating one LLM’s reward characteristics into another’s to strengthen interaction cohesion. We evaluate the effectiveness of ILR across three LLMs from two model families of varying scales on five mathematical benchmarks and one coding benchmark. We further investigate the advantages of Dynamic Interaction (i.e., boosting the robustness of stronger LLMs and surpassing pure strategies) and the scalability of ILR beyond two-model interactions.

Anthology ID: 2026.findings-acl.303
Volume: Findings of the Association for Computational Linguistics: ACL 2026
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 6085–6108
URL: https://preview.aclanthology.org/ingest-acl/2026.findings-acl.303/
Cite (ACL): Hehai Lin, Shilei Cao, Sudong Wang, Haotian Wu, Minzhi Li, Linyi Yang, Juepeng Zheng, and Chengwei Qin. 2026. Interactive Learning for LLM Reasoning. In Findings of the Association for Computational Linguistics: ACL 2026, pages 6085–6108, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): Interactive Learning for LLM Reasoning (Lin et al., Findings 2026)
PDF: https://preview.aclanthology.org/ingest-acl/2026.findings-acl.303.pdf
Checklist: 2026.findings-acl.303.checklist.pdf
