2025
CODEMENV: Benchmarking Large Language Models on Code Migration
Keyuan Cheng | Xudong Shen | Yihao Yang | Tengyue Wang | Yang Cao | Muhammad Asif Ali | Hanbin Wang | Lijie Hu | Di Wang
Findings of the Association for Computational Linguistics: ACL 2025
Large language models (LLMs) have demonstrated remarkable proficiency in handling a wide range of tasks within the software engineering domain, but their ability to perform code migration—adapting code to different environments—remains underexplored. In this work, we propose a novel benchmark, CODEMENV: Code Migration Across Environment, designed to evaluate LLMs’ performance on code migration tasks. The benchmark comprises 922 data points across 19 Python and Java packages and offers three tasks to systematically evaluate code migration: identifying version-incompatible functions, determining function changes, and adapting code to target environments. Experimental evaluation of CODEMENV across seven LLMs revealed an average pass@1 rate of 26.50%, with GPT-4o performing best at 43.84%. We highlight our key findings as follows: (i) LLMs are more familiar with newer function versions, making them better at migrating legacy code, and (ii) LLMs sometimes exhibit a logical inconsistency, identifying function changes that are irrelevant to the target migration environment.
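Code migration here means revising code so that it runs under a different library or runtime version. As a minimal illustrative sketch—not an item drawn from CODEMENV itself—the example below shows the kind of adaptation the third task targets, using the real removal of `pandas.DataFrame.append` (deprecated in pandas 1.4, removed in pandas 2.0):

```python
# Illustrative sketch of an environment-migration task, in the spirit of
# CODEMENV's third task (adapting code to a target environment).
# This example is hypothetical and not taken from the benchmark.

import pandas as pd


def add_row_legacy(df, row):
    # Runs only under pandas < 2.0: DataFrame.append was deprecated
    # in 1.4 and removed in 2.0.
    return df.append(row, ignore_index=True)


def add_row_migrated(df, row):
    # Adapted for a target environment with pandas >= 2.0:
    # wrap the row in a one-row DataFrame and use pd.concat instead.
    return pd.concat([df, pd.DataFrame([row])], ignore_index=True)


if __name__ == "__main__":
    df = pd.DataFrame({"a": [1, 2]})
    print(add_row_migrated(df, {"a": 3}))
```

A model solving such a task must both recognize that `DataFrame.append` is incompatible with the target version (the benchmark's first task) and produce the functionally equivalent replacement (the third task).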