Jianbin Jiao
2026
MAGIC: Deep Geometric Evolution with Structural Consensus for Temporal Knowledge Graph Reasoning
Chengao Liu | Yuan Li | Yingze Wang | Jianbin Jiao
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Temporal Knowledge Graph (TKG) reasoning remains challenging to characterize with conventional flat representations due to its intrinsic heterogeneous structure. Existing multi-geometry approaches face two key bottlenecks: 1) the Riemannian depth barrier driven by numerical instability, which restricts models to shallow architectures; and 2) gate collapse, where adaptive fusion mechanisms suffer from gradient starvation and degenerate into single-geometry solutions. To this end, we propose MAGIC (Multi-geometry Annealing Graph Interaction with Consensus). Our framework introduces a Tangent-Residual Engine in multi-geometric spaces, which enables the first stable 8-layer geometric evolution and reveals a phenomenon termed Geometric Annealing, where manifold curvature spontaneously evolves from semantic flatness in shallow layers to structural complexity in deeper layers. We further design an explicit reasoning module with structural consensus, leveraging geometric invariants and structural priors to regulate gradient flow, prevent collapse, and ensure robust synergy across Hyperbolic, Spherical, and Euclidean spaces. Experiments show that MAGIC achieves state-of-the-art performance in TKG reasoning, improving MRR by up to 2.9 points.
SAMem: State-Aware Memory as a Fine-Grained Memory for LLM Agents in Decision-Making
Tong Wang | Pei Xu | Shiyue Cao | Likun Yang | Daipeng Li | Jianbin Jiao | Kaiqi Huang
Findings of the Association for Computational Linguistics: ACL 2026
Existing LLM-based agents primarily utilize coarse-grained experiential memory, where experiences are retrieved based on global task or scene context. While effective in simple settings, such coarse-grained memory lacks the situational alignment required for complex multi-step decision-making. As a result, recalled experiences often fail to match the agent’s current state, blurring reasoning focus and leading to inaccurate decisions at critical steps. To this end, we propose State-Aware Memory (SAMem), a new fine-grained memory paradigm for LLM agents that explicitly aligns memory retrieval with the current state. Instead of storing and reusing globally shared experiences, SAMem organizes memory at the level of state-specific reasoning thoughts, enabling the agent to retrieve only the most relevant experience for the current decision context. This state-conditioned memory allows the agent to focus on the most informative reasoning cues at each step, rather than being distracted by task-level but state-misaligned guidance. Extensive experiments on complex decision-making benchmarks demonstrate that SAMem outperforms existing experiential memory approaches while substantially improving task-solving efficiency. These results indicate that state-aware, fine-grained memory enhances the decision-making capabilities of LLM agents.
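To make the contrast between task-level and state-level retrieval concrete, here is a minimal sketch of state-conditioned memory lookup. It assumes memory entries store a task, a textual state description, and a reasoning thought, and it ranks entries by word overlap with the current state. The `MemoryEntry` fields, the Jaccard similarity, and the example entries are hypothetical stand-ins chosen for illustration; the abstract does not specify how SAMem represents or scores states.

```python
from dataclasses import dataclass

@dataclass
class MemoryEntry:
    task: str      # global task the experience came from
    state: str     # textual description of the state at that step
    thought: str   # reasoning thought recorded for that state

def jaccard(a: str, b: str) -> float:
    """Word-overlap similarity; a simple stand-in for a learned state matcher."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)

def retrieve_by_state(memory, current_state, k=1):
    """Return the k entries whose stored state best matches the current state."""
    ranked = sorted(memory, key=lambda m: jaccard(m.state, current_state), reverse=True)
    return ranked[:k]

memory = [
    MemoryEntry("heat the mug", "mug is in hand, microwave is closed",
                "Open the microwave before placing the mug."),
    MemoryEntry("heat the mug", "agent is in kitchen, mug not found",
                "Check the cabinets and countertop for the mug."),
    MemoryEntry("clean the plate", "plate is in hand, sink is empty",
                "Turn on the faucet to rinse the plate."),
]

best = retrieve_by_state(memory, "mug is in hand, microwave is closed and empty")[0]
print(best.thought)
```

Note that two entries share the task "heat the mug", so a task-level lookup could not tell them apart; keying on the state is what selects the thought that fits the current step.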
2025
EPO: Explicit Policy Optimization for Strategic Reasoning in LLMs via Reinforcement Learning
Xiaoqian Liu | Ke Wang | Yongbin Li | Yuchuan Wu | Wentao Ma | Aobo Kong | Fei Huang | Jianbin Jiao | Junge Zhang
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Large Language Models (LLMs) have shown impressive reasoning capabilities in well-defined problems with clear solutions, such as mathematics and coding. However, they still struggle with complex real-world scenarios like business negotiations, which require strategic reasoning: the ability to navigate dynamic environments and align long-term goals amidst uncertainty. Existing methods for strategic reasoning face challenges in adaptability, scalability, and transferring strategies to new contexts. To address these issues, we propose explicit policy optimization (*EPO*) for strategic reasoning, featuring an LLM that provides strategies in an open-ended action space and can be plugged into arbitrary LLM agents to motivate goal-directed behavior. To improve adaptability and policy transferability, we train the strategic reasoning model via multi-turn reinforcement learning (RL), utilizing process rewards and iterative self-play. Experiments across social and physical domains demonstrate *EPO*’s capacity for long-term goal alignment through enhanced strategic reasoning, achieving state-of-the-art performance on social dialogue and web navigation tasks. Our findings reveal various collaborative reasoning mechanisms emergent in *EPO* and its effectiveness in generating novel strategies, underscoring its potential for strategic reasoning in real-world applications. Code and data are available at [https://github.com/lxqpku/EPO](https://github.com/lxqpku/EPO).