Across Programming Language Silos: A Study on Cross-Lingual Retrieval-Augmented Code Generation

Qiming Zhu, Jialun Cao, Xuanang Chen, Weili Zhang, Yaojie Lu, Hongyu Lin, Xianpei Han, Le Sun, Shing-Chi Cheung


Abstract
Current research on large language models (LLMs) with retrieval-augmented code generation (RACG) has largely focused on single-language settings, leaving their cross-lingual effectiveness underexplored. Multilingual RACG systems are increasingly important for migrating and reusing code across programming languages (PLs), a common yet challenging task in modern software development. To systematically study cross-lingual code knowledge transfer in RACG, we construct a dataset covering 13 PLs with nearly 14K instances. Our experiments reveal three key insights: (1) Knowledge transfer in RACG across PLs is non-trivial, even with direct injection. (2) RACG exhibits unequal cross-lingual knowledge transfer, and its efficacy depends on the linguistic affinity of the PL pair and the diversity of the LLM pretraining corpus. (3) RACG shows limited reliance on the natural language information embedded in code when equipped with a code-specific retriever. These findings provide practical guidance for designing effective multilingual RACG systems. https://github.com/icip-cas/Cross-Lingual-RACG
Anthology ID:
2026.findings-acl.1216
Volume:
Findings of the Association for Computational Linguistics: ACL 2026
Month:
July
Year:
2026
Address:
San Diego, California, United States
Editors:
Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
24283–24296
URL:
https://preview.aclanthology.org/ingest-acl/2026.findings-acl.1216/
Cite (ACL):
Qiming Zhu, Jialun Cao, Xuanang Chen, Weili Zhang, Yaojie Lu, Hongyu Lin, Xianpei Han, Le Sun, and Shing-Chi Cheung. 2026. Across Programming Language Silos: A Study on Cross-Lingual Retrieval-Augmented Code Generation. In Findings of the Association for Computational Linguistics: ACL 2026, pages 24283–24296, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal):
Across Programming Language Silos: A Study on Cross-Lingual Retrieval-Augmented Code Generation (Zhu et al., Findings 2026)
PDF:
https://preview.aclanthology.org/ingest-acl/2026.findings-acl.1216.pdf
Checklist:
 2026.findings-acl.1216.checklist.pdf