Abstract
Considering the seq2seq architecture of Yin and Neubig (2018) for natural language to code translation, we identify four key components of importance: grammatical constraints, lexical preprocessing, input representations, and copy mechanisms. To study the impact of these components, we use a state-of-the-art architecture that relies on a BERT encoder and a grammar-based decoder for which a formalization is provided. The paper highlights the importance of the lexical substitution component in current natural language to code systems.
- Anthology ID:
- 2022.findings-acl.173
- Volume:
- Findings of the Association for Computational Linguistics: ACL 2022
- Month:
- May
- Year:
- 2022
- Address:
- Dublin, Ireland
- Editors:
- Smaranda Muresan, Preslav Nakov, Aline Villavicencio
- Venue:
- Findings
- Publisher:
- Association for Computational Linguistics
- Pages:
- 2204–2214
- URL:
- https://aclanthology.org/2022.findings-acl.173
- DOI:
- 10.18653/v1/2022.findings-acl.173
- Cite (ACL):
- Nathanaël Beau and Benoit Crabbé. 2022. The impact of lexical and grammatical processing on generating code from natural language. In Findings of the Association for Computational Linguistics: ACL 2022, pages 2204–2214, Dublin, Ireland. Association for Computational Linguistics.
- Cite (Informal):
- The impact of lexical and grammatical processing on generating code from natural language (Beau & Crabbé, Findings 2022)
- PDF:
- https://preview.aclanthology.org/proper-vol2-ingestion/2022.findings-acl.173.pdf
- Code:
- codegenfactors/BertranX
- Data:
- CoNaLa, Django