Jifan Lin
2026
LongCLI-Bench: A Preliminary Benchmark and Study for Long-horizon Agentic Programming in Command-Line Interfaces
Yukang Feng | Jianwen Sun | Zelai Yang | Jiaxin Ai | Chuanhao Li | Zizhen Li | Fanrui Zhang | Kang He | Rui Ma | Jifan Lin | Jie Sun | Yang Xiao | Sizhuo Zhou | Wenxiao Wu | Yiming Liu | Pengfei Liu | Shenglin Zhang | Kaipeng Zhang
Findings of the Association for Computational Linguistics: ACL 2026
Recent advances in AI-assisted programming have empowered agents to execute complex workflows via command-line interfaces. However, existing benchmarks are limited by short task horizons, data contamination from GitHub scraping, and a lack of fine-grained evaluation metrics, and so fail to rigorously evaluate the long-horizon planning and execution capabilities essential for realistic software engineering. To address these gaps, we introduce LongCLI-Bench, a comprehensive benchmark designed to evaluate agentic capabilities across long-horizon, realistic, sequential engineering tasks. We curated 20 high-quality, long-horizon tasks from over 1,000 computer science assignments and real-world workflows, covering four engineering categories: from scratch, feature addition, bug fixing, and refactoring. LongCLI-Bench employs a dual-set testing protocol, which measures requirement fulfillment (fail→pass) and regression avoidance (pass→pass), and incorporates step-level scoring to pinpoint execution failures. Extensive experiments reveal that even state-of-the-art agents achieve pass rates below 20% on LongCLI-Bench. Step-level analysis further indicates that the majority of tasks stall at less than 30% completion, highlighting that critical failures often occur in the early stages. Although self-correction offers marginal gains, human-agent collaboration through plan injection and interactive guidance yields significantly higher improvements. These results highlight that future research must emphasize the development of synergistic human-agent workflows alongside advances in agents’ planning and execution capabilities to overcome key challenges in long-horizon task performance.
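The dual-set protocol and step-level scoring described in the abstract can be illustrated with a minimal sketch. This is one plausible reading, not the benchmark's actual harness: the `TaskResult` type, the rule that a task passes only when every test in both sets passes, and the "steps completed before the first failure" score are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class TaskResult:
    # Test name -> True (passed) or False (failed) after the agent's run.
    fail_to_pass: dict[str, bool]  # failed before the task; should pass now
    pass_to_pass: dict[str, bool]  # passed before the task; must still pass
    steps_done: list[bool]         # hypothetical per-step completion flags, in order


def rate(results: dict[str, bool]) -> float:
    """Fraction of tests that passed; an empty set counts as fully satisfied."""
    if not results:
        return 1.0
    return sum(results.values()) / len(results)


def task_passed(r: TaskResult) -> bool:
    """Assumed rule: requirements are met and nothing regressed."""
    return all(r.fail_to_pass.values()) and all(r.pass_to_pass.values())


def step_completion(r: TaskResult) -> float:
    """Assumed step-level score: share of steps completed before the first failure."""
    done = 0
    for ok in r.steps_done:
        if not ok:
            break
        done += 1
    return done / len(r.steps_done) if r.steps_done else 0.0


# Hypothetical run: one requirement test still fails, no regressions,
# and the agent stalls at the second of four steps.
result = TaskResult(
    fail_to_pass={"test_new_api": True, "test_export": False},
    pass_to_pass={"test_login": True},
    steps_done=[True, False, True, True],
)
print(rate(result.fail_to_pass), rate(result.pass_to_pass))
print(task_passed(result), step_completion(result))
```

Keeping the two rates separate shows whether a failed task missed its requirements, broke existing behaviour, or both, while the step score locates how far the run got before stalling.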
2024
OpenResearcher: Unleashing AI for Accelerated Scientific Research
Yuxiang Zheng | Shichao Sun | Lin Qiu | Dongyu Ru | Cheng Jiayang | Xuefeng Li | Jifan Lin | Binjie Wang | Yun Luo | Renjie Pan | Yang Xu | Qingkai Min | Zizhao Zhang | Yiwen Wang | Wenjie Li | Pengfei Liu
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: System Demonstrations
The rapid growth of scientific literature imposes significant challenges for researchers endeavoring to stay updated with the latest advancements in their fields and delve into new areas. We introduce OpenResearcher, an innovative platform that leverages Artificial Intelligence (AI) techniques to accelerate the research process by answering diverse questions from researchers. OpenResearcher is built on Retrieval-Augmented Generation (RAG) to integrate Large Language Models (LLMs) with up-to-date, domain-specific knowledge. Moreover, we develop various tools for OpenResearcher to understand researchers’ queries, search the scientific literature, filter retrieved information, provide accurate and comprehensive answers, and self-refine these answers. OpenResearcher can flexibly use these tools to balance efficiency and effectiveness. As a result, OpenResearcher enables researchers to save time and increase their potential to discover new insights and drive scientific breakthroughs. Demo, video, and code are available at: https://github.com/GAIR-NLP/OpenResearcher.
Co-authors
- Pengfei Liu 2
- Jiaxin Ai 1
- Yukang Feng 1
- Kang He 1
- Cheng Jiayang 1
- Xuefeng Li 1
- Wenjie Li 1
- Chuanhao Li 1
- Zizhen Li 1
- Yiming Liu 1
- Yun Luo 1
- Rui Ma 1
- Qingkai Min 1
- Renjie Pan 1
- Lin Qiu 1
- Dongyu Ru 1
- Shichao Sun 1
- Jianwen Sun 1
- Jie Sun 1
- Binjie Wang 1
- Yiwen Wang 1
- Wenxiao Wu 1
- Yang Xiao 1
- Yang Xu 1
- Zelai Yang 1
- Zizhao Zhang 1
- Fanrui Zhang 1
- Shenglin Zhang 1
- Kaipeng Zhang 1
- Yuxiang Zheng 1
- Sizhuo Zhou 1