Enhancing Language Model with Unit Test Techniques for Efficient Regular Expression Generation

Chenhui Mao, Xiexiong Lin, Xin Jin, Xin Zhang


Abstract
Recent research has investigated the use of generative language models to produce regular expressions with semantic-based approaches. However, these approaches have shown shortcomings in practical applications, particularly in terms of functional correctness, that is, the ability of a generated expression to reproduce the function intended by the user. To address this issue, we present a novel method called Unit-Test Driven Reinforcement Learning (UTD-RL). Our approach differs from previous methods by taking into account the crucial aspect of functional correctness and transforming it into differentiable gradient feedback using policy gradient techniques. Functional correctness is evaluated through unit tests, a testing method that checks that a regular expression meets its design and performs as intended. Experiments conducted on three public datasets demonstrate the effectiveness of the proposed method in generating regular expressions. This method has been employed in a regulatory scenario where regular expressions can be utilized to ensure that all online content is free from non-compliant elements, thereby significantly reducing the workload of relevant personnel.
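The idea of scoring a generated regular expression with unit tests and feeding that score back through a policy gradient can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the function names `unit_test_reward` and `reinforce_loss`, the pass-rate reward, and the REINFORCE-style surrogate loss are all hypothetical choices, since the abstract does not specify the reward definition or the exact policy gradient variant.

```python
import re


def unit_test_reward(pattern, should_match, should_reject):
    """Return the fraction of unit tests a candidate regex passes.

    Assumption: a test case is a string that the regex must either fully
    match (should_match) or must not fully match (should_reject).
    A pattern that fails to compile receives a reward of 0.0.
    """
    try:
        compiled = re.compile(pattern)
    except re.error:
        return 0.0
    total = len(should_match) + len(should_reject)
    if total == 0:
        return 0.0
    passed = sum(1 for s in should_match if compiled.fullmatch(s))
    passed += sum(1 for s in should_reject if not compiled.fullmatch(s))
    return passed / total


def reinforce_loss(log_prob, reward, baseline=0.0):
    """REINFORCE-style surrogate loss for one sampled regex.

    log_prob is the summed log-probability the language model assigned to
    the sampled token sequence. Minimizing this loss raises the probability
    of samples whose reward exceeds the baseline.
    """
    return -(reward - baseline) * log_prob


# Hypothetical usage: score one sampled pattern and compute its loss.
reward = unit_test_reward(r"\d{3}", ["123", "456"], ["12", "abcd"])
loss = reinforce_loss(log_prob=-2.0, reward=reward, baseline=0.5)
```

In a real training loop `log_prob` would be a differentiable tensor produced by the language model, so the non-differentiable test outcome only enters as a scalar weight on the gradient.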
Anthology ID:
2023.emnlp-industry.2
Volume:
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing: Industry Track
Month:
December
Year:
2023
Address:
Singapore
Editors:
Mingxuan Wang, Imed Zitouni
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
12–19
URL:
https://aclanthology.org/2023.emnlp-industry.2
DOI:
10.18653/v1/2023.emnlp-industry.2
Cite (ACL):
Chenhui Mao, Xiexiong Lin, Xin Jin, and Xin Zhang. 2023. Enhancing Language Model with Unit Test Techniques for Efficient Regular Expression Generation. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing: Industry Track, pages 12–19, Singapore. Association for Computational Linguistics.
Cite (Informal):
Enhancing Language Model with Unit Test Techniques for Efficient Regular Expression Generation (Mao et al., EMNLP 2023)
PDF:
https://preview.aclanthology.org/naacl-24-ws-corrections/2023.emnlp-industry.2.pdf
Video:
https://preview.aclanthology.org/naacl-24-ws-corrections/2023.emnlp-industry.2.mp4