Zeqi Lin


2022

pdf
When Language Model Meets Private Library
Daoguang Zan | Bei Chen | Zeqi Lin | Bei Guan | Wang Yongji | Jian-Guang Lou
Findings of the Association for Computational Linguistics: EMNLP 2022

With the rapid development of pre-training techniques, a number of language models have been pre-trained on large-scale code corpora and perform well in code generation. In this paper, we investigate how to equip pre-trained language models with the ability of code generation for private libraries. In practice, it is common for programmers to write code using private libraries. However, this is a challenge for language models since they have never seen private APIs during training. Motivated by the fact that private libraries usually come with elaborate API documentation, we propose a novel framework with two modules: the APIRetriever finds useful APIs, and then the APICoder generates code using these APIs. For APIRetriever, we present a dense retrieval system and also design a friendly interaction to involve uses. For APICoder, we can directly use off-the-shelf language models, or continually pre-train the base model on a code corpus containing API information. Both modules are trained with data from public libraries and can be generalized to private ones. Furthermore, we craft three benchmarks for private libraries, named TorchDataEval, MonkeyEval, and BeatNumEval. Experimental results demonstrate the impressive performance of our framework.

pdf
Reasoning Like Program Executors
Xinyu Pi | Qian Liu | Bei Chen | Morteza Ziyadi | Zeqi Lin | Qiang Fu | Yan Gao | Jian-Guang Lou | Weizhu Chen
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing

Reasoning over natural language is a long-standing goal for the research community. However, studies have shown that existing language models are inadequate in reasoning. To address the issue, we present POET, a novel reasoning pre-training paradigm. Through pre-training language models with programs and their execution results, POET empowers language models to harvest the reasoning knowledge possessed by program executors via a data-driven approach. POET is conceptually simple and can be instantiated by different kinds of program executors. In this paper, we showcase two simple instances POET-Math and POET-Logic, in addition to a complex instance, POET-SQL. Experimental results on six benchmarks demonstrate that POET can significantly boost model performance in natural language reasoning, such as numerical reasoning, logical reasoning, and multi-hop reasoning. POET opens a new gate on reasoning-enhancement pre-training, and we hope our analysis would shed light on the future research of reasoning like program executors.

2021

pdf
Learning Algebraic Recombination for Compositional Generalization
Chenyao Liu | Shengnan An | Zeqi Lin | Qian Liu | Bei Chen | Jian-Guang Lou | Lijie Wen | Nanning Zheng | Dongmei Zhang
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021