Shousheng Jia
2026
ARC: Active and Reflection-driven Context Management for Long-Horizon Information Seeking Agents
Yilun Yao | Shan Huang | Elsie Dai | Zhewen Tan | Zhenyu Duan | Shousheng Jia | Yanbing Jiang | Tong Yang
Findings of the Association for Computational Linguistics: ACL 2026
Large language models are increasingly deployed as research agents for deep search and long-horizon information seeking, yet their performance often degrades as interaction histories grow. This degradation, known as context rot, reflects a failure to maintain coherent and task-relevant internal states over extended reasoning horizons. Existing approaches primarily manage context through raw accumulation or passive summarization, treating it as a static artifact and allowing early errors or misplaced emphasis to persist. Motivated by this perspective, we propose ARC, which is the first framework to systematically formulate context management as an active, reflection-driven process that treats context as a dynamic internal reasoning state during execution. ARC operationalizes this view through reflection-driven monitoring and revision, allowing agents to actively reorganize their working context when misalignment or degradation is detected. Experiments on challenging long-horizon information-seeking benchmarks show that ARC consistently outperforms passive context compression methods, achieving up to an 11% absolute improvement in accuracy on BrowseComp-ZH with Qwen2.5-32B-Instruct.
2025
Light-R1: Curriculum SFT, DPO and RL for Long COT from Scratch and Beyond
Liang Wen | Yunke Cai | Fenrui Xiao | Xin He | Qi An | Zhenyu Duan | Yimin Du | Junchen Liu | Lifu Tang | Xiaowei Lv | Haosheng Zou | Yongchao Deng | Shousheng Jia | Xiangzheng Zhang
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 6: Industry Track)
This paper introduces Light-R1, an open-source suite for training long reasoning models using a reproducible and cost-effective methodology. Given the proprietary nature of data used in the DeepSeek-R1 series, we develop an alternative approach leveraging exclusively public data and models. Our curriculum training progressively increases data difficulty, combined with multi-staged post-training. Our Light-R1-32B model, trained from Qwen2.5-32B-Instruct, outperforms DeepSeek-R1-Distill-Qwen-32B in math reasoning. Experimental results show that this curriculum approach becomes more effective when distinct, diverse datasets are available for different training stages: fine-tuning DeepSeek-R1-Distilled models (pre-tuned by the DeepSeek team on proprietary data) with 3,000 challenging examples from our curriculum dataset yielded state-of-the-art 7B and 14B models, while the 32B model, Light-R1-32B-DS, performed comparably to QwQ-32B and DeepSeek-R1. Furthermore, we extend our work by applying GRPO on long reasoning models. Our final Light-R1-14B-DS achieves SOTA performance among 14B models in math, with AIME24 & 25 scores of 74.0 and 60.2 respectively, surpassing many 32B models and DeepSeek-R1-Distill-Llama-70B. Despite math-focused training, Light-R1-14B-DS demonstrates strong cross-domain generalization. Light-R1 represents a significant advancement in making sophisticated reasoning models more accessible and implementable in real-world applications. Our models, training data and code have been made available.