Jingdi Lei
2026
LiveCANNBench: Benchmark SWE AI Coding for Ascend CANN
Sijie Wang | Kai Zhao | Wee Peng Tay | Shuo Zhang | Chengwen Liu | Quanjiang Guo | Ren Junhao | Xin Li | Heng Lian | Jingdi Lei | Rui She | Huacan Wang | Ronghao Chen
Findings of the Association for Computational Linguistics: ACL 2026
AI coding has emerged as a core application of large language models (LLMs), evolving from single-file coding tasks towards complex software engineering (SWE) scenarios. Recent advances in agents have enabled multi-file, multi-language, and dependency-aware AI coding, significantly expanding the scope of AI-assisted software development. While a variety of benchmarks have been proposed to evaluate coding capabilities in general-purpose or GPU coding ecosystems such as CUDA and ROCm, systematic evaluation for Huawei Ascend CANN remains largely underexplored. In this work, we propose LiveCANNBench, an SWE-level benchmark designed for AI coding in the CANN software stack. LiveCANNBench is constructed from real-world CANN repositories and consists of over 400 task instances spanning multi-file, multi-language, and execution-aware coding challenges. Unlike existing static benchmarks that primarily focus on kernel-level code generation, LiveCANNBench adopts a live benchmarking paradigm, effectively mitigating data leakage and enabling more reliable evaluation of modern coding agents.
2025
LLaMA-Berry: Pairwise Optimization for Olympiad-level Mathematical Reasoning via O1-like Monte Carlo Tree Search
Di Zhang | Jianbo Wu | Jingdi Lei | Tong Che | Jiatong Li | Tong Xie | Xiaoshui Huang | Shufei Zhang | Marco Pavone | Yuqiang Li | Wanli Ouyang | Dongzhan Zhou
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)
This paper presents LLaMA-Berry, an advanced mathematical reasoning framework to enhance the problem-solving ability of large language models (LLMs). The framework combines Monte Carlo Tree Search with Self-Refine (SR-MCTS) to optimize the reasoning paths and utilizes a pairwise reward model to evaluate different paths globally. By leveraging the self-critique and rewriting capabilities of LLMs, our SR-MCTS overcomes the inefficiencies and limitations of conventional step-wise and greedy search algorithms, enabling a more efficient exploration of solution spaces. To guide the search process, we propose the Pairwise Preference Reward Model (PPRM), which predicts pairwise preferences between solutions through instruction-following capabilities trained by Reinforcement Learning from Human Feedback (RLHF). Finally, the Enhanced Borda Count (EBC) method is adopted to synthesize pairwise preferences into global quantile scores for evaluations. This approach mitigates the challenges of scoring variability and non-independent distributions in mathematical reasoning tasks. The framework has been tested on general and advanced benchmarks, showing superior search efficiency and performance compared to existing open-source and closed-source methods, particularly in complex Olympiad-level benchmarks, including AIME24 and AMC23.
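The step of turning pairwise preferences into global scores can be illustrated with a plain Borda count. This is a minimal sketch under stated assumptions, not the paper's Enhanced Borda Count, whose details the abstract does not give. The function name `borda_quantile_scores`, the preference-matrix input format, and the 0.5 win threshold are all hypothetical choices made for illustration.

```python
from typing import List


def borda_quantile_scores(pref: List[List[float]]) -> List[float]:
    """Convert pairwise preferences into quantile-style global scores.

    Assumption (not from the paper): pref[i][j] is the predicted probability
    that solution i is preferred over solution j, as a pairwise reward model
    might output. Diagonal entries are ignored.
    """
    n = len(pref)
    if n == 0:
        return []
    if n == 1:
        return [1.0]

    # Plain Borda count: one point for each pairwise comparison a solution wins.
    wins = [
        sum(1 for j in range(n) if j != i and pref[i][j] > 0.5)
        for i in range(n)
    ]

    # Quantile-style score: fraction of the other solutions with strictly fewer wins.
    return [
        sum(1 for j in range(n) if j != i and wins[j] < wins[i]) / (n - 1)
        for i in range(n)
    ]


if __name__ == "__main__":
    # Three hypothetical candidate solutions: 0 beats 1 and 2, 1 beats 2.
    example = [
        [0.5, 0.8, 0.9],
        [0.2, 0.5, 0.7],
        [0.1, 0.3, 0.5],
    ]
    print(borda_quantile_scores(example))
```

Rank-based scores of this kind depend only on the outcomes of comparisons, not on the absolute scale of any single reward value, which is one way to read the abstract's point about scoring variability.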