MLDebugging: Towards Benchmarking Code Debugging Across Multi-Library Scenarios

JinYang Huang; Xiachong Feng; Qiguang Chen; Hanjie Zhao; Zihui Cheng; Jiesong Bai; Jingxuan Zhou; Min Li; Libo Qin

Abstract

Code debugging is a crucial task in software engineering and is attracting increasing attention. Although large language models (LLMs) have brought remarkable progress, current research still focuses on simple no-library or single-library settings and overlooks the complex multi-library scenarios found in real-world applications. To address this limitation, we make the first attempt to introduce MLDebugging (Multi-Library Debugging), a comprehensive benchmark designed to assess debugging challenges in multi-library Python code. Specifically, MLDebugging encompasses 126 distinct Python libraries and covers a wide range of multi-library code issues, categorized into seven distinct types. Furthermore, we conduct a thorough evaluation on MLDebugging using both mainstream open-source and closed-source LLMs and find that current LLMs still struggle to debug code correctly in multi-library scenarios. We hope this work can uncover the potential of LLMs in multi-library debugging scenarios and offer insights for future research.

Anthology ID: 2025.findings-acl.305
Volume: Findings of the Association for Computational Linguistics: ACL 2025
Month: July
Year: 2025
Address: Vienna, Austria
Editors: Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 5866–5879
URL: https://preview.aclanthology.org/display_plenaries/2025.findings-acl.305/
Cite (ACL): JinYang Huang, Xiachong Feng, Qiguang Chen, Hanjie Zhao, Zihui Cheng, Jiesong Bai, Jingxuan Zhou, Min Li, and Libo Qin. 2025. MLDebugging: Towards Benchmarking Code Debugging Across Multi-Library Scenarios. In Findings of the Association for Computational Linguistics: ACL 2025, pages 5866–5879, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal): MLDebugging: Towards Benchmarking Code Debugging Across Multi-Library Scenarios (Huang et al., Findings 2025)
PDF: https://preview.aclanthology.org/display_plenaries/2025.findings-acl.305.pdf
