Jianbin Jiao


2026

Temporal Knowledge Graph (TKG) reasoning remains challenging to characterize with conventional flat representations due to its intrinsic heterogeneous structure. Existing multi-geometry approaches face two key bottlenecks: 1) the Riemannian depth barrier driven by numerical instability, which restricts models to shallow architectures; and 2) gate collapse, where adaptive fusion mechanisms suffer from gradient starvation and degenerate into single-geometry solutions. To this end, we propose MAGIC (Multi-geometry Annealing Graph Interaction with Consensus). Our framework introduces a Tangent-Residual Engine in multi-geometric spaces, which enables the first stable 8-layer geometric evolution and reveals a phenomenon termed Geometric Annealing, where manifold curvature spontaneously evolves from semantic flatness in shallow layers to structural complexity in deeper layers. We further design an explicit reasoning module with structural consensus, leveraging geometric invariants and structural priors to regulate gradient flow, prevent collapse, and ensure robust synergy across Hyperbolic, Spherical, and Euclidean spaces. Experiments show that MAGIC achieves state-of-the-art performance in TKG reasoning, improving MRR by up to 2.9 points.
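
The abstract does not define the Tangent-Residual Engine, so the following is only an illustrative sketch of one common way to place a residual connection in a tangent space, not the authors' formulation. It assumes a Poincaré ball of curvature $-c$ with $c > 0$, uses the standard exponential and logarithmic maps at the origin, and introduces a hypothetical layer function $f_\ell$ and hidden state $h^{(\ell)}$ that do not appear in the source.

```latex
% Standard origin maps on the Poincare ball with curvature -c (c > 0)
\exp_0^{c}(v) = \tanh\!\left(\sqrt{c}\,\lVert v \rVert\right)\frac{v}{\sqrt{c}\,\lVert v \rVert},
\qquad
\log_0^{c}(x) = \operatorname{artanh}\!\left(\sqrt{c}\,\lVert x \rVert\right)\frac{x}{\sqrt{c}\,\lVert x \rVert}

% Assumed residual update: map to the tangent space, add the layer output, map back
u^{(\ell)} = \log_0^{c}\!\left(h^{(\ell)}\right),
\qquad
h^{(\ell+1)} = \exp_0^{c}\!\left(u^{(\ell)} + f_\ell\!\left(u^{(\ell)}\right)\right)
```

Under this assumption the addition happens in a flat vector space, so the skip path carries $u^{(\ell)}$ forward unchanged. That is one plausible reason such a design could remain stable at greater depth.
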
Existing LLM-based agents primarily utilize coarse-grained experiential memory, where experiences are retrieved based on global task or scene context. While effective in simple settings, such coarse-grained memory lacks the situational alignment required for complex multi-step decision-making. As a result, recalled experiences often fail to match the agent’s current state, blurring reasoning focus and leading to inaccurate decisions at critical steps. To this end, we propose State-Aware Memory (SAMem), a new fine-grained memory paradigm for LLM agents that explicitly aligns memory retrieval with the current state. Instead of storing and reusing globally shared experiences, SAMem organizes memory at the level of state-specific reasoning thoughts, enabling the agent to retrieve only the most relevant experience for the current decision context. This state-conditioned memory allows the agent to focus on the most informative reasoning cues at each step, rather than being distracted by task-level but state-misaligned guidance. Extensive experiments on complex decision-making benchmarks demonstrate that SAMem outperforms existing experiential memory approaches in both task performance and task-solving efficiency. These results indicate that state-aware, fine-grained memory enhances the decision-making capabilities of LLM agents.
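
The contrast between task-level and state-level retrieval can be illustrated with a toy example. This is a minimal sketch, not SAMem itself: the `MemoryEntry` fields, the `retrieve` helper, the bag-of-words cosine similarity, and the sample household tasks are all assumptions made for illustration, and the abstract does not specify how states are encoded or matched.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class MemoryEntry:
    task: str     # global task description (coarse-grained key)
    state: str    # state in which the thought was recorded (fine-grained key)
    thought: str  # reasoning thought stored for that state


def bag_of_words(text: str) -> Counter:
    """Lowercase the text and count its alphanumeric tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two token-count vectors (0.0 if either is empty)."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(memory: list[MemoryEntry], query: str, field: str) -> MemoryEntry:
    """Return the entry whose `field` ("task" or "state") best matches the query."""
    q = bag_of_words(query)
    return max(memory, key=lambda e: cosine(q, bag_of_words(getattr(e, field))))


# Hypothetical memory: two thoughts share a task but belong to different states.
memory = [
    MemoryEntry("put a clean mug on the desk", "holding dirty mug at sink",
                "rinse the mug before moving on"),
    MemoryEntry("put a clean mug on the desk", "holding clean mug at desk",
                "place the mug on the desk"),
    MemoryEntry("heat an egg and put it on the table",
                "standing at fridge with nothing in hand",
                "open the fridge and take the egg"),
]

# Coarse-grained: keyed on the task, so both mug entries tie and the first wins.
coarse = retrieve(memory, "put a clean mug on the desk", "task")
# Fine-grained: keyed on the current state, so the state-aligned thought wins.
fine = retrieve(memory, "holding clean mug near desk", "state")

print("task-level :", coarse.thought)
print("state-level:", fine.thought)
```

In this toy setting the task-level query cannot distinguish the two mug entries and returns a thought recorded for a different state, while the state-level query returns the thought that matches the agent's current situation.
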

2025

Large Language Models (LLMs) have shown impressive reasoning capabilities in well-defined problems with clear solutions, such as mathematics and coding. However, they still struggle with complex real-world scenarios like business negotiations, which require strategic reasoning: the ability to navigate dynamic environments and align long-term goals amidst uncertainty. Existing methods for strategic reasoning face challenges in adaptability, scalability, and transferring strategies to new contexts. To address these issues, we propose explicit policy optimization (*EPO*) for strategic reasoning, featuring an LLM that provides strategies in an open-ended action space and can be plugged into arbitrary LLM agents to motivate goal-directed behavior. To improve adaptability and policy transferability, we train the strategic reasoning model via multi-turn reinforcement learning (RL), utilizing process rewards and iterative self-play. Experiments across social and physical domains demonstrate *EPO*’s capacity for long-term goal alignment through enhanced strategic reasoning, achieving state-of-the-art performance on social dialogue and web navigation tasks. Our findings reveal various collaborative reasoning mechanisms that emerge in *EPO* and its effectiveness in generating novel strategies, underscoring its potential for strategic reasoning in real-world applications. Code and data are available at [https://github.com/lxqpku/EPO](https://github.com/lxqpku/EPO).