Xin Guo
Other people with similar names: Xin Guo
Unverified author pages with similar names: Xin Guo
2026
Counteracting the Matthew Effect in Self-Improvement of LVLMs through Head-Tail Re-balancing
Xin Guo | Zhiheng Xi | Yiwen Ding | Yitao Zhai | Xiaowei Shi | Xunliang Cai | Tao Gui | Qi Zhang | Xuanjing Huang
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Xin Guo | Zhiheng Xi | Yiwen Ding | Yitao Zhai | Xiaowei Shi | Xunliang Cai | Tao Gui | Qi Zhang | Xuanjing Huang
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Self-improvement has emerged as a mainstream paradigm for advancing the reasoning capabilities of large vision–language models (LVLMs), where models explore and learn from successful trajectories iteratively. However, we identify a critical imbalance during this process: the model readily generates high-quality trajectories for simple queries (i.e., head data) but struggles with complex ones (i.e., tail data). This bias drives the optimization to disproportionately prioritize simple reasoning skills, while inhibiting the acquisition of complex capabilities. As iterations progress, this imbalance becomes more acute—a dynamic we term the "Matthew effect", ultimately stalling performance gains. To mitigate this, we approach head-tail re-balance during the exploration-and-learning process from two perspectives: distribution-reshaping and trajectory-resampling. Extensive experiments on Qwen2-VL-7B-Instruct and InternVL2.5-4B models across visual reasoning tasks demonstrate that our methods consistently improve visual reasoning capabilities, outperforming vanilla self-improvement baselines by an average of 3.86 points.
AgentGym2: Benchmarking Large Language Model Agents in De-Idealized Real-World Environments
Zhiheng Xi | Dingwen Yang | Jiaqi Liu | Jixuan Huang | Honglin Guo | Baodai Huang | Tinggang Chen | Qi Zhang | Zhonghang Lu | Chenyu Liu | Jiajun Sun | Jiazheng Zhang | Dingwei Zhu | Xin Guo | Junzhe Wang | Zhihao Zhang | Yuming Yang | Junjie Ye | Minghe Gao | Dongrui Liu | Jiaming Ji | Guohao Li | Tao Gui | Qi Zhang | Xuanjing Huang
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Zhiheng Xi | Dingwen Yang | Jiaqi Liu | Jixuan Huang | Honglin Guo | Baodai Huang | Tinggang Chen | Qi Zhang | Zhonghang Lu | Chenyu Liu | Jiajun Sun | Jiazheng Zhang | Dingwei Zhu | Xin Guo | Junzhe Wang | Zhihao Zhang | Yuming Yang | Junjie Ye | Minghe Gao | Dongrui Liu | Jiaming Ji | Guohao Li | Tao Gui | Qi Zhang | Xuanjing Huang
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Language agents, i.e., LLM agents, progress rapidly and are increasingly deployed in production environments. This trend underscores the urgent need for rigorous and realistic evaluations. However, most existing benchmarks evaluate agents in simplified, idealized settings. They typically rely on pre-packaged tool interfaces, overlook critical steps, and assume inputs are clean and fully specified. Consequently, they understate the difficulty of real deployments, where uncertainty and noise are ubiquitous and agents must proactively explore the environment to uncover new tools. To bridge this gap, we present AgentGym2, a new evaluation framework with task instances grounded in real-world end-to-end working demands. Beyond reasoning and planning, it measures agents’ ability to execute end-to-end procedures, discover tools via exploration, compose tools for unseen tasks, and remain robust to noisy and underspecified information. Experiments on 15 proprietary and open-source models show that even SOTA systems like Gemini and GPT-5 struggle on AgentGym2, revealing a substantial gap between the capability of current agents and the demands of real-world applications.
Search
Fix author
Co-authors
- Tao Gui 2
- Xuan-Jing Huang (黄萱菁) 2
- Zhiheng Xi 2
- Qi Zhang 2
- Xunliang Cai 1
- Tinggang Chen 1
- Yiwen Ding 1
- Minghe Gao 1
- Honglin Guo 1
- Jixuan Huang 1
- Baodai Huang 1
- Jiaming Ji 1
- Guohao Li 1
- Jiaqi Liu 1
- Chenyu Liu 1
- Dongrui Liu 1
- Zhonghang Lu 1
- Xiaowei Shi 1
- Jiajun Sun 1
- Junzhe Wang 1
- Dingwen Yang 1
- Yuming Yang 1
- Junjie Ye (叶俊杰) 1
- Yitao Zhai 1
- Qi Zhang 1
- Jiazheng Zhang 1
- Zhihao Zhang 1
- Dingwei Zhu 1
Venues
- ACL2