Hierarchical Reward Modeling for Fault Localization in Large Code Repositories
Jiwei Zhang, Jianxun Lian, Haiming Qin, Mingyang Zhou, KeZhong Lu, Rui Mao, Hao Liao
Abstract
Large Language Models (LLMs) exhibit significant potential in complex software engineering tasks; however, their fault localization capabilities within large repositories are constrained by inherent limitations in maximum context length. Although Test-Time Scaling (TTS) can generate multiple candidate solutions, traditional selection strategies often fail to identify the optimal one. To address this problem, we introduce the Hierarchical Localization Reward Model (HiLoRM), which is specifically designed to evaluate and select the most accurate fault localization candidates (at the file, function, and line levels) from the multiple sampled outputs of LLMs, thereby enhancing localization accuracy. Furthermore, we constructed the HiFL-44k dataset, comprising approximately 44,000 fault localization instances, to train HiLoRM. Experimental results demonstrate that on the SWE-Bench-Lite dataset, HiLoRM improves final line-level localization recall by 12% compared to a baseline that does not use a reward model. HiLoRM also exhibits a strong capability to evaluate predictions from larger LLMs (e.g., 32B parameters) and demonstrates transferability and generalization potential when applied to other fault localization methods. This work provides an effective methodology and an accessible model for significantly improving the accuracy and reliability of LLMs in repository-level fault localization. Our code and datasets are available at https://github.com/SZU-ZJW/HiFL-Method.
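The selection step the abstract describes can be pictured as best-of-N reranking: the policy LLM samples several candidate localizations, and the reward model scores each candidate at the file, function, and line levels before the highest-scoring one is kept. The sketch below is a minimal, hypothetical illustration of that loop; the `reward_model.score(issue, prediction, level)` interface and the candidate layout are assumptions for illustration, not the authors' released API.

```python
# A minimal sketch of hierarchical best-of-N selection, assuming a trained
# reward model exposed as reward_model.score(issue, prediction, level).
# This interface is hypothetical, not the authors' released code.
from typing import Dict, List

LEVELS = ("file", "function", "line")  # hierarchy levels scored by the reward model


def select_best(issue: str,
                candidates: List[Dict[str, List[str]]],
                reward_model) -> Dict[str, List[str]]:
    """Pick the sampled localization whose level-wise reward sum is highest.

    Each candidate maps a level name ("file"/"function"/"line") to the
    locations predicted at that granularity.
    """
    def total_reward(cand: Dict[str, List[str]]) -> float:
        # Sum the reward model's scores across the three hierarchy levels.
        return sum(reward_model.score(issue, cand[level], level)
                   for level in LEVELS)

    return max(candidates, key=total_reward)
```

In this framing, any scalar-scoring reward model could be dropped in, which is consistent with the abstract's claim that the approach transfers to other fault localization methods.

- Anthology ID: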
- 2025.findings-emnlp.966
- Volume:
- Findings of the Association for Computational Linguistics: EMNLP 2025
- Month:
- November
- Year:
- 2025
- Address:
- Suzhou, China
- Editors:
- Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
- Venue:
- Findings
- Publisher:
- Association for Computational Linguistics
- Pages:
- 17782–17796
- URL:
- https://preview.aclanthology.org/author-page-yu-wang-polytechnic/2025.findings-emnlp.966/
- DOI:
- 10.18653/v1/2025.findings-emnlp.966
- Cite (ACL):
- Jiwei Zhang, Jianxun Lian, Haiming Qin, Mingyang Zhou, KeZhong Lu, Rui Mao, and Hao Liao. 2025. Hierarchical Reward Modeling for Fault Localization in Large Code Repositories. In Findings of the Association for Computational Linguistics: EMNLP 2025, pages 17782–17796, Suzhou, China. Association for Computational Linguistics.
- Cite (Informal):
- Hierarchical Reward Modeling for Fault Localization in Large Code Repositories (Zhang et al., Findings 2025)
- PDF:
- https://preview.aclanthology.org/author-page-yu-wang-polytechnic/2025.findings-emnlp.966.pdf