Zhenhao Zhou
Also published as: ZhenHao Zhou
2026
Taming System Complexity: Demystifying Software Engineering Agents in Diagnosing Linux Kernel Faults
Zhenhao Zhou | Zhuochen Huang | Yike He | Chong Wang | Jiajun Wang | Yijian Wu | Xin Peng | Yiling Lou
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
The Linux kernel is a critical system, serving as the foundation for numerous systems. Bugs in the Linux kernel can cause serious consequences, affecting billions of users. Fault localization (FL), which aims at identifying the buggy code elements in software, plays an essential role in software quality assurance. While recent LLM agents have achieved promising accuracy in FL on recent benchmarks like SWE-bench, it remains unclear how well these methods perform in the Linux kernel, where FL is much more challenging due to the large-scale code base, limited observability, and diverse impact factors. In this paper, we introduce LinuxFLBench, an FL benchmark constructed from real-world Linux kernel bugs. We conduct an empirical study to assess the performance of state-of-the-art LLM agents on the Linux kernel. Our initial results reveal that existing agents struggle with this task, achieving a best top-1 accuracy of only 41.6% at file level. To address this challenge, we propose LinuxFL+, an enhancement framework designed to improve the FL effectiveness of LLM agents for the Linux kernel. LinuxFL+ substantially improves the FL accuracy of all studied agents (e.g., 7.2% - 11.2% accuracy increase) with minimal costs.
2025
ICLEval: Evaluating In-Context Learning Ability of Large Language Models
Wentong Chen | Yankai Lin | ZhenHao Zhou | HongYun Huang | YanTao Jia | Zhao Cao | Ji-Rong Wen
Proceedings of the 31st International Conference on Computational Linguistics
In-Context Learning (ICL) is a critical capability of Large Language Models (LLMs) as it empowers them to comprehend and reason across interconnected inputs. Evaluating the ICL ability of LLMs can enhance their utilization and deepen our understanding of how this ability is acquired at the training stage. However, existing evaluation frameworks primarily focus on language abilities and knowledge, often overlooking the assessment of ICL ability. In this work, we introduce the ICLEval benchmark to evaluate the ICL abilities of LLMs, which encompasses two key sub-abilities: exact copying and rule learning. Through the ICLEval benchmark, we demonstrate that ICL ability is universally present in different LLMs, and model size is not the sole determinant of ICL efficacy. Surprisingly, we observe that ICL abilities, particularly copying, develop early in the pretraining process and stabilize afterward.