Ge Li

2024

pdf abs
Enhancing Code Generation Performance of Smaller Models by Distilling the Reasoning Ability of LLMs
Zhihong Sun | Chen Lyu | Bolun Li | Yao Wan | Hongyu Zhang | Ge Li | Zhi Jin
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Large Language Models (LLMs) have recently made significant advances in code generation through the ‘Chain-of-Thought’ prompting technique. This technique empowers the model to autonomously devise “solution plans” to tackle intricate programming challenges, thereby improving its performance in code generation. Nevertheless, smaller models have been struggling to keep up with LLMs in deducing these plans, adversely affecting their code generation capabilities. Given the considerable size and associated deployment costs, along with concerns about data security, many teams opt for deploying smaller models for code generation. Consequently, there arises a compelling need for transferring LLMs’ code generation reasoning abilities to the smaller models. In this paper, we propose the CodePLAN framework, which aims to transfer LLMs’ reasoning capabilities to smaller models through distillation. We adopt a multi-task learning approach, jointly undertaking code generation and solution plan generation tasks, to enhance the code generation capabilities of smaller model. To ensure the superior quality of the solution plans, we advocate for the utilization of backward reasoning and plan sampling strategies. Our experiments show that in comparison to the conventional fine-tuning approach, our approach improves the smaller model’s code generation performance (measured in pass@1 metric) by over 130% on the challenging APPS benchmark.

2023

pdf abs
Self-Edit: Fault-Aware Code Editor for Code Generation
Kechi Zhang | Zhuo Li | Jia Li | Ge Li | Zhi Jin
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Large language models (LLMs) have demonstrated an impressive ability to generate codes on competitive programming tasks. However, with limited sample numbers, LLMs still suffer from poor accuracy. Inspired by the process of human programming, we propose a generate-and-edit approach named Self-Edit that utilizes execution results of the generated code from LLMs to improve the code quality on the competitive programming task. We execute the generated code on the example test case provided in the question and wrap execution results into a supplementary comment. Utilizing this comment as guidance, our fault-aware code editor is employed to correct errors in the generated code. We perform extensive evaluations across two competitive programming datasets with nine different LLMs. Compared to directly generating from LLMs, our approach can improve the average of pass@1 by 89% on APPS-dev, 31% on APPS-test, and 48% on HumanEval over nine popular code generation LLMs with parameter sizes ranging from 110M to 175B. Compared to other post-processing methods, our method demonstrates superior accuracy and efficiency.

2022

pdf abs
Rethinking Positional Encoding in Tree Transformer for Code Representation
Han Peng | Ge Li | Yunfei Zhao | Zhi Jin
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing

Transformers are now widely used in code representation, and several recent works further develop tree Transformers to capture the syntactic structure in source code. Specifically, novel tree positional encodings have been proposed to incorporate inductive bias into Transformer.In this work, we propose a novel tree Transformer encoding node positions based on our new description method for tree structures.Technically, local and global soft bias shown in previous works is both introduced as positional encodings of our Transformer model.Our model finally outperforms strong baselines on code summarization and completion tasks across two languages, demonstrating our model’s effectiveness.Besides, extensive experiments and ablation study shows that combining both local and global paradigms is still helpful in improving model performance. We release our code at https://github.com/AwdHanPeng/TreeTransformer.

2016

pdf
Compressing Neural Language Models by Sparse Word Representations
Yunchuan Chen | Lili Mou | Yan Xu | Ge Li | Zhi Jin
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Nowadays, neural networks play an important role in the task of relation classification. By designing different neural architectures, researchers have improved the performance to a large extent in comparison with traditional methods. However, existing neural networks for relation classification are usually of shallow architectures (e.g., one-layer convolutional neural networks or recurrent networks). They may fail to explore the potential representation space in different abstraction levels. In this paper, we propose deep recurrent neural networks (DRNNs) for relation classification to tackle this challenge. Further, we propose a data augmentation method by leveraging the directionality of relations. We evaluated our DRNNs on the SemEval-2010 Task 8, and achieve an F1-score of 86.1%, outperforming previous state-of-the-art recorded results.

pdf abs
Sequence to Backward and Forward Sequences: A Content-Introducing Approach to Generative Short-Text Conversation
Lili Mou | Yiping Song | Rui Yan | Ge Li | Lu Zhang | Zhi Jin
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

Using neural networks to generate replies in human-computer dialogue systems is attracting increasing attention over the past few years. However, the performance is not satisfactory: the neural network tends to generate safe, universally relevant replies which carry little meaning. In this paper, we propose a content-introducing approach to neural network-based generative dialogue systems. We first use pointwise mutual information (PMI) to predict a noun as a keyword, reflecting the main gist of the reply. We then propose seq2BF, a “sequence to backward and forward sequences” model, which generates a reply containing the given keyword. Experimental results show that our approach significantly outperforms traditional sequence-to-sequence models in terms of human evaluation and the entropy measure, and that the predicted keyword can appear at an appropriate position in the reply.