Shilei Cao


2026

Existing multi-agent learning approaches explicitly foster collaboration among Large Language Models (LLMs) to build stronger multi-agent systems (MAS), yet they still require re-executing the MAS at inference time. This contrasts with human cognition, in which individuals internalize insights from interactions and use them to improve later independent reasoning. To investigate whether multi-agent interaction can enhance an LLM’s independent problem-solving ability, we propose ILR (Interactive Learning for LLM Reasoning), a co-learning framework that integrates Dynamic Interaction and Perception Calibration. Dynamic Interaction adaptively selects a cooperative or competitive strategy based on question difficulty and model capability; the LLMs then exchange information via the Idea3 framework (Idea Sharing, Idea Analysis, and Idea Fusion), an interaction paradigm that simulates human discussion, before producing their final answers. Perception Calibration trains the LLMs with Group Relative Policy Optimization (GRPO) while integrating one LLM’s reward characteristics into another’s to strengthen interaction cohesion. We evaluate ILR on three LLMs from two model families of varying scales, using five mathematical benchmarks and one coding benchmark. We further examine the advantages of Dynamic Interaction (it boosts the robustness of stronger LLMs and outperforms pure cooperative or pure competitive strategies) and the scalability of ILR beyond two-model interactions.
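For readers unfamiliar with GRPO, the following minimal sketch illustrates only its group-relative advantage step: the rewards of several responses sampled for the same question are normalized against that group's own mean and standard deviation. It is an illustration under stated assumptions, not ILR's implementation. The function name `group_relative_advantages`, the `eps` stabilizer, and the use of the population standard deviation are choices made here, and the way Perception Calibration folds another LLM's reward characteristics into this computation is not specified in the abstract and is therefore not shown.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against its own sampling group.

    `rewards` holds one scalar reward per response sampled for a single
    question. The name, `eps`, and the population standard deviation are
    assumptions of this sketch, not details taken from ILR.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # eps keeps the division finite when every response gets the same reward.
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four sampled answers to one question: two correct (1.0), two incorrect (0.0).
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

Responses that score above the group mean receive positive advantages and those below receive negative ones, so the policy update favors answers that beat the model's own typical output on that question without needing a separate value network.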