@inproceedings{to-etal-2023-better,
title = "Better Language Models of Code through Self-Improvement",
author = "To, Hung and
Bui, Nghi and
Guo, Jin L.C. and
Nguyen, Tien",
editor = "Rogers, Anna and
Boyd-Graber, Jordan and
Okazaki, Naoaki",
booktitle = "Findings of the Association for Computational Linguistics: ACL 2023",
month = jul,
year = "2023",
address = "Toronto, Canada",
publisher = "Association for Computational Linguistics",
url = "https://preview.aclanthology.org/jlcl-multiple-ingestion/2023.findings-acl.823/",
doi = "10.18653/v1/2023.findings-acl.823",
pages = "12994--13002",
abstract = "Pre-trained language models for code (PLMCs) have gained attention in recent research. These models are pre-trained on large-scale datasets using multi-modal objectives. However, fine-tuning them requires extensive supervision and is limited by the size of the dataset provided. We aim to improve this issue by proposing a data augmentation framework using knowledge distillation. Our framework utilizes knowledge gained during the pre-training and fine-tuning stage to augment training data, which is then used for the next step. We incorporate this framework into the state-of-the-art language models, such as CodeT5, CodeBERT, and UnixCoder. The results show that our framework significantly improves PLMCs' performance in sequence-generation tasks, such as code summarization and code generation in the CodeXGLUE benchmark."
}
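For context, the self-improvement idea summarized in the abstract can be read as a simple data-augmentation loop: fine-tune a code model, let it pseudo-label its own training inputs, and fine-tune again on the original plus pseudo-labeled data. The sketch below is a minimal illustration under that reading, not the authors' exact pipeline; `self_improve`, `fine_tune`, `predict`, and the toy stand-ins are hypothetical names introduced here for illustration.

```python
# Minimal sketch of a self-improvement loop as described in the abstract:
# fine-tune, pseudo-label the training inputs with the fine-tuned model,
# then fine-tune again on the augmented data. `fine_tune` and `predict`
# are hypothetical callables standing in for any seq2seq PLMC routine
# (e.g. CodeT5-style code summarization).

from typing import Callable, List, Tuple

Example = Tuple[str, str]  # (source code, target sequence)


def self_improve(
    train_set: List[Example],
    fine_tune: Callable[[List[Example]], object],
    predict: Callable[[object, str], str],
    rounds: int = 1,
) -> object:
    """Fine-tune, pseudo-label the training inputs, and fine-tune again."""
    model = fine_tune(train_set)  # standard supervised fine-tuning
    for _ in range(rounds):
        # Distillation-style step: the model labels its own training inputs.
        pseudo = [(src, predict(model, src)) for src, _ in train_set]
        # Augment the original data with the pseudo-labeled examples.
        model = fine_tune(train_set + pseudo)
    return model


if __name__ == "__main__":
    # Toy stand-ins so the sketch runs end to end; a real setup would plug in
    # an actual PLMC such as CodeT5, CodeBERT, or UniXcoder here.
    toy_data = [("def add(a, b): return a + b", "add two numbers")]

    def toy_fine_tune(data: List[Example]) -> dict:
        return {"examples_seen": len(data)}

    def toy_predict(model: dict, src: str) -> str:
        return "summary of: " + src[:20]

    print(self_improve(toy_data, toy_fine_tune, toy_predict))
```

In a real setup the two callables would wrap a sequence-to-sequence model's training loop and beam-search decoding; only the loop structure above is taken from the abstract.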