Jintao Chen
2026
GFT: From Imitation to Reward Fine-Tuning with Unbiased Group Advantages and Dynamic Coefficient Rectification
Wangjie Gan | Miao Pan | Linbo Xi | Wenqi Zhang | Jintao Chen | Jianwei Yin | Xuhong Zhang
Findings of the Association for Computational Linguistics: ACL 2026
Large language models are typically post-trained using supervised fine-tuning (SFT) and reinforcement learning (RL), yet effectively unifying efficient knowledge injection with robust generalization remains challenging. In this work, we provide a training-dynamics analysis showing that SFT can be interpreted as a special case of policy gradient optimization with an extremely sparse implicit reward and unstable inverse-probability weighting, which together lead to single-path dependency, entropy collapse, and gradient explosion. Motivated by this diagnosis, we propose Group Fine-Tuning (GFT), a unified post-training framework that addresses these intrinsic limitations through two mechanisms: Group Advantage Learning, which constructs diverse response groups and derives normalized contrastive supervision to alleviate reward sparsity, and Dynamic Coefficient Rectification, which adaptively bounds inverse-probability weights to stabilize optimization while preserving efficient knowledge injection. Experiments demonstrate that GFT consistently surpasses SFT-based methods and yields policies that integrate more smoothly with subsequent RL training. Our code is publicly available at https://github.com/ZJU-OmniAI/GFT.
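The two mechanisms named in the abstract can be illustrated with a rough sketch. The abstract does not give formulas, so everything below is an assumption made for illustration: the mean/standard-deviation normalization, the fixed cap `max_weight`, and the function names `group_advantages` and `bounded_inverse_weight` are hypothetical and are not taken from the paper or its repository.

```python
import math


def group_advantages(rewards, eps=1e-8):
    """Normalize rewards within one response group (assumed form).

    Each reward is centered on the group mean and scaled by the group
    standard deviation, so responses are scored relative to each other.
    """
    n = len(rewards)
    mean = sum(rewards) / n
    var = sum((r - mean) ** 2 for r in rewards) / n
    std = math.sqrt(var)
    return [(r - mean) / (std + eps) for r in rewards]


def bounded_inverse_weight(prob, max_weight=10.0):
    """Cap the 1/p coefficient (assumed form, with a fixed cap).

    The gradient of log p equals (1/p) times the gradient of p, so a very
    small p produces a very large coefficient. Here it is simply clipped.
    """
    return min(1.0 / max(prob, 1e-12), max_weight)


if __name__ == "__main__":
    print(group_advantages([1.0, 0.0, 0.0, 1.0]))
    print(bounded_inverse_weight(0.5), bounded_inverse_weight(1e-6))
```

The fixed cap only stands in for the adaptive bounding the abstract describes; how the bound is actually chosen is not stated there.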
ToolGate: Contract-Grounded and Verified Tool Execution for LLMs
Yanming Liu | Xinyue Peng | Jiannan Cao | Xinyi Wang | Songhang Deng | Jintao Chen | Jianwei Yin | Xuhong Zhang
Findings of the Association for Computational Linguistics: ACL 2026
Large Language Models (LLMs) augmented with external tools have demonstrated remarkable capabilities in complex reasoning tasks. However, existing frameworks rely heavily on natural language reasoning to determine when tools can be invoked and whether their results should be committed, lacking formal guarantees for logical safety and verifiability. We present ToolGate, a forward execution framework that provides logical safety guarantees and verifiable state evolution for LLM tool calling. ToolGate maintains an explicit symbolic state space as a typed key-value mapping representing trusted world information throughout the reasoning process. Each tool is formalized as a Hoare-style contract consisting of a precondition and a postcondition, where the precondition gates tool invocation by checking whether the current state satisfies the required conditions, and the postcondition determines whether the tool’s result can be committed to update the state through runtime verification. Our approach guarantees that the symbolic state evolves only through verified tool executions, preventing invalid or hallucinated results from corrupting the world representation. Experimental validation demonstrates that ToolGate significantly improves the reliability and verifiability of tool-augmented LLM systems while maintaining competitive performance on complex multi-step reasoning tasks. This work establishes a foundation for building more trustworthy and debuggable AI systems that integrate language models with external tools.
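The contract-gated execution loop described in the abstract might look roughly like the sketch below. It is a minimal illustration under stated assumptions, not ToolGate's implementation: the names `ToolContract`, `execute`, and `output_key`, and the temperature-conversion tool, are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict

# Assumed representation of the symbolic state: a plain key-value mapping.
State = Dict[str, Any]


@dataclass
class ToolContract:
    """A tool paired with a Hoare-style precondition and postcondition."""
    name: str
    pre: Callable[[State], bool]         # must hold before the tool may run
    run: Callable[[State], Any]          # the tool call itself
    post: Callable[[State, Any], bool]   # must hold before the result is committed
    output_key: str                      # state key that receives the result


def execute(state: State, tool: ToolContract) -> State:
    """Return an updated state only if both contract checks pass."""
    if not tool.pre(state):
        return state                     # invocation is gated off
    try:
        result = tool.run(state)
    except Exception:
        return state                     # a failed call commits nothing
    if not tool.post(state, result):
        return state                     # an unverified result is discarded
    new_state = dict(state)              # copy so the old state stays intact
    new_state[tool.output_key] = result
    return new_state


to_fahrenheit = ToolContract(
    name="to_fahrenheit",
    pre=lambda s: isinstance(s.get("celsius"), float),
    run=lambda s: s["celsius"] * 9 / 5 + 32,
    post=lambda s, r: isinstance(r, float) and r >= -459.67,
    output_key="fahrenheit",
)

if __name__ == "__main__":
    print(execute({"celsius": 100.0}, to_fahrenheit))
    print(execute({}, to_fahrenheit))
```

In this sketch a tool whose precondition fails never runs, and a result that fails its postcondition never reaches the state, which mirrors the property that the state evolves only through verified executions.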