Sudong Wang

2026

Existing multi-agent learning approaches explicitly foster collaboration among Large Language Models (LLMs) to build stronger multi-agent systems (MAS), yet they still rely on re-executing the MAS during inference. This contrasts with human cognition, wherein individuals can internalize insights from interactions to improve later independent reasoning. To investigate whether multi-agent interaction can enhance LLMs’ independent problem-solving ability, we propose ILR (Interactive Learning for LLM Reasoning), a co-learning framework that integrates Dynamic Interaction and Perception Calibration. Dynamic Interaction adaptively selects cooperative or competitive strategies based on question difficulty and model capability, after which LLMs exchange information via the Idea3 framework (Idea Sharing, Idea Analysis, and Idea Fusion), an interaction paradigm simulating human discussion, before producing final answers. Perception Calibration employs Group Relative Policy Optimization (GRPO) while integrating one LLM’s reward characteristics into another’s to strengthen interaction cohesion. We evaluate the effectiveness of ILR across three LLMs from two model families of varying scales on five mathematical benchmarks and one coding benchmark. We further investigate the advantages of Dynamic Interaction (i.e., boosting the robustness of stronger LLMs and surpassing pure strategies), and the scalability of ILR beyond two-model interactions.

The rapid evolution of Large Language Model (LLM) agents has necessitated robust memory systems to support cohesive long-term interaction and complex reasoning. Building on the strong capabilities of LLMs, recent research has shifted its focus from simple context extension to the development of dedicated agentic memory systems. However, existing approaches typically rely on rigid retrieval granularity, accumulation-heavy maintenance strategies, and coarse-grained update mechanisms. These design choices create a persistent mismatch between stored information and task-specific reasoning demands, and they allow logical inconsistencies to accumulate unchecked over time. To address these challenges, we propose Adaptive Memory via Multi-Agent Collaboration (AMA), a novel framework that leverages coordinated agents to manage memory across multiple granularities. AMA employs a hierarchical memory design that dynamically aligns retrieval granularity with task complexity. Specifically, the Constructor and Retriever jointly enable multi-granularity memory construction and adaptive query routing. The Judge verifies the relevance and consistency of retrieved content, triggering iterative retrieval when evidence is insufficient or invoking the Refresher upon detecting logical conflicts. The Refresher then enforces memory consistency by performing targeted updates or removing outdated entries. Extensive experiments on challenging long-context benchmarks show that AMA significantly outperforms state-of-the-art baselines while reducing token consumption by approximately 80% compared to full-context methods, demonstrating its effectiveness in maintaining retrieval precision and long-term memory consistency.

Co-authors

Qian Li 1

Bo Xu 1

Venues

Findings 2
