Introducing Compiler Semantics into Large Language Models as Programming Language Translators: A Case Study of C to x86 Assembly
Shuoming Zhang, Jiacheng Zhao, Chunwei Xia, Zheng Wang, Yunji Chen, Huimin Cui
Abstract
Compilers are complex software containing millions of lines of code, taking years to develop. This paper investigates to what extent Large Language Models (LLMs) can replace hand-crafted compilers in translating high-level programming languages to machine instructions, using C to x86 assembly as a case study. We identify two challenges of using LLMs for code translation and introduce two novel data pre-processing techniques to address the challenges: numerical value conversion and training data resampling. While only using a 13B model, our approach achieves a behavioral accuracy of over 91%, outperforming the much larger GPT-4 Turbo model by over 50%. Our results are encouraging, showing that LLMs have the potential to transform how compilation tools are constructed.
- Anthology ID: 2024.findings-emnlp.55
- Volume: Findings of the Association for Computational Linguistics: EMNLP 2024
- Month: November
- Year: 2024
- Address: Miami, Florida, USA
- Editors: Yaser Al-Onaizan, Mohit Bansal, Yun-Nung Chen
- Venue: Findings
- Publisher: Association for Computational Linguistics
- Pages: 996–1011
- URL: https://preview.aclanthology.org/add_missing_videos/2024.findings-emnlp.55/
- DOI: 10.18653/v1/2024.findings-emnlp.55
- Cite (ACL): Shuoming Zhang, Jiacheng Zhao, Chunwei Xia, Zheng Wang, Yunji Chen, and Huimin Cui. 2024. Introducing Compiler Semantics into Large Language Models as Programming Language Translators: A Case Study of C to x86 Assembly. In Findings of the Association for Computational Linguistics: EMNLP 2024, pages 996–1011, Miami, Florida, USA. Association for Computational Linguistics.
- Cite (Informal): Introducing Compiler Semantics into Large Language Models as Programming Language Translators: A Case Study of C to x86 Assembly (Zhang et al., Findings 2024)
- PDF: https://preview.aclanthology.org/add_missing_videos/2024.findings-emnlp.55.pdf