The impact of lexical and grammatical processing on generating code from natural language

Nathanaël Beau, Benoit Crabbé


Abstract
Considering the seq2seq architecture of Yin and Neubig (2018) for natural language to code translation, we identify four key components: grammatical constraints, lexical preprocessing, input representations, and copy mechanisms. To study the impact of these components, we use a state-of-the-art architecture that relies on a BERT encoder and a grammar-based decoder, for which a formalization is provided. The paper highlights the importance of the lexical substitution component in current natural language to code systems.
Anthology ID:
2022.findings-acl.173
Volume:
Findings of the Association for Computational Linguistics: ACL 2022
Month:
May
Year:
2022
Address:
Dublin, Ireland
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
2204–2214
URL:
https://aclanthology.org/2022.findings-acl.173
DOI:
10.18653/v1/2022.findings-acl.173
Cite (ACL):
Nathanaël Beau and Benoit Crabbé. 2022. The impact of lexical and grammatical processing on generating code from natural language. In Findings of the Association for Computational Linguistics: ACL 2022, pages 2204–2214, Dublin, Ireland. Association for Computational Linguistics.
Cite (Informal):
The impact of lexical and grammatical processing on generating code from natural language (Beau & Crabbé, Findings 2022)
PDF:
https://preview.aclanthology.org/auto-file-uploads/2022.findings-acl.173.pdf
Video:
https://preview.aclanthology.org/auto-file-uploads/2022.findings-acl.173.mp4
Code:
codegenfact/BertranX
Data:
CoNaLa, Django