SemRegex: A Semantics-Based Approach for Generating Regular Expressions from Natural Language Specifications
Zexuan Zhong, Jiaqi Guo, Wei Yang, Jian Peng, Tao Xie, Jian-Guang Lou, Ting Liu, Dongmei Zhang
Abstract
Recent research proposes syntax-based approaches to address the problem of generating programs from natural language specifications. These approaches typically train a sequence-to-sequence learning model using a syntax-based objective: maximum likelihood estimation (MLE). Such syntax-based approaches do not effectively address the goal of generating semantically correct programs, because these approaches fail to handle Program Aliasing, i.e., semantically equivalent programs may have many syntactically different forms. To address this issue, in this paper, we propose a semantics-based approach named SemRegex. SemRegex provides solutions for a subtask of the program-synthesis problem: generating regular expressions from natural language. Different from the existing syntax-based approaches, SemRegex trains the model by maximizing the expected semantic correctness of the generated regular expressions. The semantic correctness is measured using the DFA-equivalence oracle, random test cases, and distinguishing test cases. The experiments on three public datasets demonstrate the superiority of SemRegex over the existing state-of-the-art approaches.- Anthology ID:
- D18-1189
- Volume:
- Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing
- Month:
- October-November
- Year:
- 2018
- Address:
- Brussels, Belgium
- Editors:
- Ellen Riloff, David Chiang, Julia Hockenmaier, Jun’ichi Tsujii
- Venue:
- EMNLP
- SIG:
- SIGDAT
- Publisher:
- Association for Computational Linguistics
- Note:
- Pages:
- 1608–1618
- Language:
- URL:
- https://aclanthology.org/D18-1189
- DOI:
- 10.18653/v1/D18-1189
- Cite (ACL):
- Zexuan Zhong, Jiaqi Guo, Wei Yang, Jian Peng, Tao Xie, Jian-Guang Lou, Ting Liu, and Dongmei Zhang. 2018. SemRegex: A Semantics-Based Approach for Generating Regular Expressions from Natural Language Specifications. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 1608–1618, Brussels, Belgium. Association for Computational Linguistics.
- Cite (Informal):
- SemRegex: A Semantics-Based Approach for Generating Regular Expressions from Natural Language Specifications (Zhong et al., EMNLP 2018)
- PDF:
- https://preview.aclanthology.org/nschneid-patch-4/D18-1189.pdf