Shukai Liu
2026
LoopCoder: Scaling Code Intelligence via Looped Language Models
Jian Yang | Wei Zhang | Shuyue Guo | Yizhi LI | Linzheng Chai | Zhengmao Ye | Shukai Liu | Yuyang Song | Jiajun Wu | Che Liu | Tianyu Zheng | Siwei Wu | Leo L | Xudong Ma | Chuan Hao | Ran Tao | Yan Xing | Jianzhou Wang | Mingjie Tang | Aishan Liu | Zhoujun Li | Xianglong Liu | Weifeng Lv | Bryan Dai
Findings of the Association for Computational Linguistics: ACL 2026
While large language models (LLMs) have mastered syntax-level code generation, complex algorithmic reasoning remains a challenge, typically addressed by scaling model depth and parameter count. Universal Transformers (UT) offer a compelling alternative by introducing a recurrent inductive bias that aligns with the recursive nature of programming logic. However, training looped architectures at scale has historically been hindered by severe instability and optimization difficulties associated with backpropagation through time (BPTT). We present LoopCoder (40B-A80B) pre-trained on 12T+ code and general tokens, along with LoopCoder-Thinking and LoopCoder-Instruct variants—the first large-scale looped transformer for code, achieving comparable performance to standard dense architectures with more parameters. Unlike prior approaches that restrict recurrence to small-scale tasks, we implement a comprehensive looped training protocol spanning both pre-training and post-training phases. We initiate the model via dense-to-loop transformation, folding a pre-trained dense checkpoint to initialize a recurrent block, followed by rigorous looped pre-training and specialized post-training for instruction following and reasoning. Our results establish a robust recipe for scaling coding intelligence via recurrent computation, proving that dense checkpoints serve as an optimal foundation for evolving into dynamic, looped reasoners.
MdEval: Massively Multilingual Code Debugging
Shukai Liu | Linzheng Chai | Jian Yang | Jiajun Shi | He Zhu | Liran Wang | Jin Ke | Wei Zhang | Hualei Zhu | Shuyue Guo | Tao Sun | Jiaheng Liu | Yunlong Duan | Yu Hao | Liqun Yang | Guanglin Niu | Ge Zhang | Zhoujun Li
Findings of the Association for Computational Linguistics: ACL 2026
Code large language models (LLMs) have made significant progress in code debugging by directly generating the correct code based on the buggy code snippet. Programming benchmarks, typically consisting of buggy code snippets and their associated test cases, are used to assess the debugging capabilities of LLMs. However, many existing benchmarks primarily focus on Python and are often limited in terms of language diversity (e.g., DebugBench and DebugEval). To advance the field of multilingual debugging with LLMs, we propose the first massively multilingual debugging benchmark, which includes 3.9K test samples of 20 programming languages and covers the automated program repair (APR) task, the bug localization (BL) task, and the bug identification (BI) task. In addition, we introduce the debugging instruction corpora MdEval-Instruct by injecting bugs into the correct multilingual queries and solutions (xDebugGen). Further, a multilingual debugger, xDebugCoder, is trained on MdEval-Instruct as a strong baseline specifically to handle bugs of a wide range of programming languages (e.g., “Missing Mut” in Rust and “Misused Macro Definition” in C). Our extensive experiments on MdEval reveal a notable performance gap between open-source and closed-source LLMs (e.g., the GPT and Claude series), highlighting substantial room for improvement in multilingual code debugging scenarios.
Context as a Tool: Context Management for Long-Horizon SWE-Agents
Shukai Liu | Bo Jiang | Jian Yang | Yizhi LI | Jinyang Guo | Xianglong Liu | Bryan Dai
Findings of the Association for Computational Linguistics: ACL 2026
Agents based on large language models have recently shown strong potential on real-world software engineering (SWE) tasks that require long-horizon interaction with repository-scale codebases. However, most existing agents rely on append-only context maintenance or passively triggered compression heuristics, which often lead to context explosion, semantic drift, and degraded reasoning in long-running interactions. We propose CaT, a new context management paradigm that elevates context maintenance to a callable tool integrated into the decision-making process of agents. CaT formalizes a structured context workspace consisting of stable task semantics, condensed long-term memory, and high-fidelity short-term interactions, and enables agents to proactively compress historical trajectories into actionable summaries at appropriate milestones. To support context management for SWE-agents, we propose a trajectory-level supervision framework, CaT-Generator, based on an offline data construction pipeline that injects context-management actions into complete interaction trajectories. Using this framework, we train a context-aware model, SWE-Compressor. Experiments on SWE-Bench-Verified demonstrate that SWE-Compressor reaches a 57.6% solved rate and significantly outperforms ReAct-based agents and static compression baselines, while maintaining stable and scalable long-horizon reasoning under a bounded context budget.
2025
M2RC-EVAL: Massively Multilingual Repository-level Code Completion Evaluation
Jiaheng Liu | Ken Deng | Congnan Liu | Jian Yang | Shukai Liu | He Zhu | Peng Zhao | Linzheng Chai | Yanan Wu | JinKe JinKe | Ge Zhang | Zekun Moore Wang | Guoan Zhang | Yingshui Tan | Bangyu Xiang | Zhaoxiang Zhang | Wenbo Su | Bo Zheng
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Repository-level code completion has drawn great attention in software engineering, and several benchmarks have been introduced. However, existing repository-level code completion benchmarks usually focus on a limited number of languages (fewer than 5), so they cannot evaluate the general code intelligence abilities of existing code Large Language Models (LLMs) across different languages. In addition, the existing benchmarks usually report overall average scores across languages, ignoring the fine-grained abilities in different completion scenarios. Therefore, to facilitate research on code LLMs in multilingual scenarios, we propose a massively multilingual repository-level code completion benchmark covering 18 programming languages (called M2RC-EVAL). It provides two types of fine-grained annotations (i.e., bucket-level and semantic-level) on different completion scenarios, which we obtain from the parsed abstract syntax tree. Moreover, we also curate a massively multilingual instruction corpus, M2RC-INSTRUCT, to improve the repository-level code completion abilities of existing code LLMs. Comprehensive experimental results demonstrate the effectiveness of our M2RC-EVAL and M2RC-INSTRUCT.
Co-authors
- Jian Yang 4
- Linzheng Chai 3
- Bryan Dai 2
- Shuyue Guo 2
- Yizhi Li 2
- Zhoujun Li 2
- Jiaheng Liu 2
- Xianglong Liu 2
- Ge Zhang 2
- Wei Zhang 2
- He Zhu 2
- Ken Deng 1
- Yunlong Duan 1
- Jinyang Guo 1
- Chuan Hao 1
- Yu Hao 1
- Bo Jiang 1
- JinKe JinKe 1
- Jin Ke 1
- Leo L 1
- Aishan Liu 1
- Che Liu 1
- Congnan Liu 1
- Weifeng Lv 1
- Xudong Ma 1
- Guanglin Niu 1
- Jiajun Shi 1
- Yuyang Song 1
- Wenbo Su 1
- Tao Sun 1
- Yingshui Tan 1
- Mingjie Tang 1
- Ran Tao 1
- Jianzhou Wang 1
- Liran Wang 1
- Zekun Moore Wang 1
- Jiajun Wu 1
- Siwei Wu 1
- Yanan Wu 1
- Bangyu Xiang 1
- Yan Xing 1
- Liqun Yang 1
- Zhengmao Ye 1
- Guoan Zhang 1
- Zhaoxiang Zhang 1
- Peng Zhao 1
- Bo Zheng 1
- Tianyu Zheng 1
- Hualei Zhu 1