Xiaoyu Hu


2022

pdf
Code Generation From Flowcharts with Texts: A Benchmark Dataset and An Approach
Zejie Liu | Xiaoyu Hu | Deyu Zhou | Lin Li | Xu Zhang | Yanzheng Xiang
Findings of the Association for Computational Linguistics: EMNLP 2022

Currently, researchers focus on generating codes from the requirement documents. However, current approaches still perform poorly on some requirements needing complex problem-solving skills. In reality, to tackle such complex requirements, instead of directly translating requirement documents into codes, software engineers write codes via unified modeling language diagrams, such as flowcharts, an intermediate tool to analyze and visualize the system. Therefore, we propose a new source code generation task, that is, to generate source code from flowcharts with texts. We manually construct a benchmark dataset containing 320 flowcharts with their corresponding source codes. Obviously, it is not straightforward to employ the current approaches for the new source code generation task since (1) the flowchart is a graph that contains various structures, including loop, selection, and others which is different from texts; (2) the connections between nodes in the flowchart are abundant and diverse which need to be carefully handled. To solve the above problems, we propose a two-stage code generation model. In the first stage, a structure recognition algorithm is employed to transform the flowchart into pseudo-code containing the structural conventions of a typical programming language such as while, if. In the second stage, a code generation model is employed to convert the pseudo-code into code. Experimental results show that the proposed approach can achieve some improvement over the baselines.