LibEvolutionEval: A Benchmark and Study for Version-Specific Code Generation

Sachit Kuhar; Wasi Uddin Ahmad; Zijian Wang; Nihal Jain; Haifeng Qian; Baishakhi Ray; Murali Krishna Ramanathan; Xiaofei Ma; Anoop Deoras

doi:10.18653/v1/2025.naacl-long.348

Abstract

Recent advancements in code completion models have primarily focused on local file contexts. However, these studies do not fully capture the complexity of real-world software development, which often requires the use of rapidly evolving public libraries. To address this gap, we introduce LibEvolutionEval, a comprehensive study that emphasizes the need to understand library evolution to perform accurate in-line code completions. LibEvolutionEval offers a version-specific code-completion task across eight libraries as they evolve over the years, along with an in-depth analysis of the evolution of two widely used and well-maintained public libraries: PyTorch and Matplotlib. We evaluate several popular models and find that public library evolution significantly affects their performance. To mitigate this, we explore how retrieving version-specific library documentation and prompt-based techniques can enhance model capability in dealing with these fast-evolving packages. This suggests a promising path forward for better handling fast-evolving libraries. Our tasks will be made publicly available upon acceptance.

Anthology ID: 2025.naacl-long.348
Volume: Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)
Month: April
Year: 2025
Address: Albuquerque, New Mexico
Editors: Luis Chiruzzo, Alan Ritter, Lu Wang
Venue: NAACL
Publisher: Association for Computational Linguistics
Pages: 6826–6840
URL: https://preview.aclanthology.org/Author-Pages-WenzhengZhang-ZhengyanShi-ShuYang/2025.naacl-long.348/
DOI: 10.18653/v1/2025.naacl-long.348
Cite (ACL): Sachit Kuhar, Wasi Uddin Ahmad, Zijian Wang, Nihal Jain, Haifeng Qian, Baishakhi Ray, Murali Krishna Ramanathan, Xiaofei Ma, and Anoop Deoras. 2025. LibEvolutionEval: A Benchmark and Study for Version-Specific Code Generation. In Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), pages 6826–6840, Albuquerque, New Mexico. Association for Computational Linguistics.
Cite (Informal): LibEvolutionEval: A Benchmark and Study for Version-Specific Code Generation (Kuhar et al., NAACL 2025)
PDF: https://preview.aclanthology.org/Author-Pages-WenzhengZhang-ZhengyanShi-ShuYang/2025.naacl-long.348.pdf
