Abstract
Traditional approaches to semantic parsing (SP) work by training individual models for each available parallel dataset of text-meaning pairs. In this paper, we explore the idea of polyglot semantic translation, or learning semantic parsing models that are trained on multiple datasets and natural languages. In particular, we focus on translating text to code signature representations using the software component datasets of Richardson and Kuhn (2017b,a). The advantage of such models is that they can be used for parsing a wide variety of input natural languages and output programming languages, or mixed input languages, using a single unified model. To facilitate modeling of this type, we develop a novel graph-based decoding framework that achieves state-of-the-art performance on the above datasets, and apply this method to two other benchmark SP tasks.
- Anthology ID:
- N18-1066
- Volume:
- Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)
- Month:
- June
- Year:
- 2018
- Address:
- New Orleans, Louisiana
- Editors:
- Marilyn Walker, Heng Ji, Amanda Stent
- Venue:
- NAACL
- Publisher:
- Association for Computational Linguistics
- Pages:
- 720–730
- URL:
- https://aclanthology.org/N18-1066
- DOI:
- 10.18653/v1/N18-1066
- Cite (ACL):
- Kyle Richardson, Jonathan Berant, and Jonas Kuhn. 2018. Polyglot Semantic Parsing in APIs. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), pages 720–730, New Orleans, Louisiana. Association for Computational Linguistics.
- Cite (Informal):
- Polyglot Semantic Parsing in APIs (Richardson et al., NAACL 2018)
- PDF:
- https://preview.aclanthology.org/nschneid-patch-1/N18-1066.pdf
- Code:
- yakazimir/Code-Datasets
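The abstract mentions a graph-based decoding framework but does not describe how it works. As a rough illustration only, one simple reading of graph-constrained decoding goes like this. Assume the well-formed output signatures are encoded as a directed acyclic graph whose edges emit output tokens and carry a cost (for example a negative log-probability from some translation model). Decoding then amounts to finding the cheapest path from a start node to an accept node. The sketch below makes that assumption. The graph, node names, tokens, costs, and the helper names `best_path` and `topological_order` are all hypothetical and are not taken from the paper.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Illustrative sketch only, not the decoding procedure from the paper.
# An edge is (source_node, target_node, output_token, cost); the cost is a
# hypothetical stand-in for a model score such as -log p(token | input text).
Edge = Tuple[str, str, str, float]


def topological_order(nodes: List[str], edges: List[Edge]) -> List[str]:
    """Order nodes so every edge points forward (Kahn's algorithm)."""
    indegree: Dict[str, int] = {n: 0 for n in nodes}
    successors: Dict[str, List[str]] = defaultdict(list)
    for src, dst, _, _ in edges:
        indegree[dst] += 1
        successors[src].append(dst)
    ready = [n for n in nodes if indegree[n] == 0]
    order: List[str] = []
    while ready:
        node = ready.pop()
        order.append(node)
        for nxt in successors[node]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)
    if len(order) != len(nodes):
        raise ValueError("graph contains a cycle")
    return order


def best_path(nodes: List[str], edges: List[Edge],
              start: str, accept: str) -> Tuple[float, List[str]]:
    """Return the cost and token sequence of the cheapest start->accept path."""
    outgoing: Dict[str, List[Edge]] = defaultdict(list)
    for edge in edges:
        outgoing[edge[0]].append(edge)
    dist = {n: float("inf") for n in nodes}
    back: Dict[str, Tuple[str, str]] = {}  # node -> (predecessor, token)
    dist[start] = 0.0
    # Relax edges in topological order: all predecessors of a node are
    # processed before it, so its distance is final when it is visited.
    for node in topological_order(nodes, edges):
        if dist[node] == float("inf"):
            continue
        for src, dst, token, cost in outgoing[node]:
            if dist[src] + cost < dist[dst]:
                dist[dst] = dist[src] + cost
                back[dst] = (src, token)
    if dist[accept] == float("inf"):
        raise ValueError("accept node is unreachable from start")
    # Follow back-pointers from the accept node, then reverse the tokens.
    tokens: List[str] = []
    node = accept
    while node != start:
        node, token = back[node]
        tokens.append(token)
    return dist[accept], tokens[::-1]


# Hypothetical toy graph: two candidate signatures share start and accept nodes.
NODES = ["start", "a", "b", "c", "d", "end"]
EDGES: List[Edge] = [
    ("start", "a", "String", 0.4), ("a", "c", "toUpperCase", 0.2),
    ("c", "end", "Locale", 0.1),
    ("start", "b", "int", 0.9), ("b", "d", "indexOf", 0.3),
    ("d", "end", "String", 0.2),
]

if __name__ == "__main__":
    cost, tokens = best_path(NODES, EDGES, "start", "end")
    print(round(cost, 2), tokens)  # 0.7 ['String', 'toUpperCase', 'Locale']
```

Restricting the search to paths in the graph guarantees that every output is a well-formed signature by construction. Relaxing edges in topological order keeps the search linear in the number of edges. A real parser would compute edge costs from a trained model conditioned on the input text rather than use fixed numbers.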