Query4Regex: Verifiable Regex Transformation through Formal Operations from NL and DSL Queries

Joonghyuk Hahn, Yo-Sub Han


Abstract
While large language models (LLMs) excel at generating structured data such as code, their ability to manipulate it precisely according to instructions remains under-explored. Regular expressions (regexes) are critical in practice and challenging to manipulate. Crucially, the correctness of regex transformations can be verified mathematically, which makes them exceptionally well suited for measuring the symbolic reasoning of LLMs. We introduce Query4Regex, a new benchmark for evaluating verifiable transformations on regexes. Our benchmark tests two query formats: natural language instructions and a program-like domain-specific language (DSL) that specifies the sequence of operations. We evaluate a range of LLMs and verify semantic correctness through rigorous deterministic finite automaton (DFA) equivalence testing. Our empirical studies reveal three findings: (1) the formal DSL significantly outperforms natural language, with average accuracy gains of up to 6.74 percentage points; (2) performance in both formats degrades sharply as compositional complexity increases, highlighting a core challenge in multi-step reasoning; and (3) models often generate plausible but unparsable outputs, and even among parsable outputs semantic errors remain common, so failures are difficult to detect without formal verification. Query4Regex provides a robust framework for analyzing the gap between LLMs’ linguistic fluency and their symbolic reasoning, paving the way for more reliable and verifiable manipulation of formal languages. Our code is available at https://github.com/peer0/Query4Regex.
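The abstract says semantic correctness is checked by DFA equivalence testing. The Python block below is a minimal sketch of that kind of check. It is not the Query4Regex implementation, which lives in the linked repository. It assumes a hypothetical toy regex syntax: any character other than `(`, `)`, `|`, `*` is a literal symbol, `|` is union, `*` is Kleene star, juxtaposition is concatenation, and parentheses group. For each regex it builds a Thompson NFA, determinizes it lazily by subset construction, and searches the product of the two DFAs breadth-first for a state pair where exactly one side accepts. The names `Parser`, `to_nfa`, `eps_closure`, and `equivalent` are illustrative only.

```python
from collections import deque

class Parser:  # recursive descent over the hypothetical toy syntax described above
    def __init__(self, text):
        self.text, self.pos = text, 0

    def peek(self):
        return self.text[self.pos] if self.pos < len(self.text) else None

    def parse(self):
        node = self.union()
        if self.peek() is not None:
            raise ValueError(f"unexpected {self.peek()!r} at {self.pos}")
        return node

    def union(self):  # union := concat ('|' concat)*
        node = self.concat()
        while self.peek() == "|":
            self.pos += 1
            node = ("|", node, self.concat())
        return node

    def concat(self):  # concat := factor*, built as ε·f1·f2·... since ε·x = x
        node = ("eps",)
        while self.peek() not in (None, "|", ")"):
            node = (".", node, self.factor())
        return node

    def factor(self):  # factor := (symbol | '(' union ')') '*'*
        ch = self.peek()
        self.pos += 1
        if ch == "*":
            raise ValueError("'*' needs an operand")
        node = ("sym", ch)
        if ch == "(":
            node = self.union()
            if self.peek() != ")":
                raise ValueError("missing ')'")
            self.pos += 1
        while self.peek() == "*":
            self.pos += 1
            node = ("*", node)
        return node

def to_nfa(node, nfa):
    """Thompson's construction; nfa[q] lists (symbol or None for ε, target)."""
    if node[0] == ".":
        s1, f1 = to_nfa(node[1], nfa)
        s2, f2 = to_nfa(node[2], nfa)
        nfa[f1].append((None, s2))
        return s1, f2
    nfa += [[], []]  # allocate a fresh start state s and accept state f
    s, f = len(nfa) - 2, len(nfa) - 1
    if node[0] in ("sym", "eps"):
        nfa[s].append((node[1] if node[0] == "sym" else None, f))
    elif node[0] == "|":
        for child in node[1:]:
            cs, cf = to_nfa(child, nfa)
            nfa[s].append((None, cs))
            nfa[cf].append((None, f))
    else:  # Kleene star: s may skip to f, cf may loop back to cs
        cs, cf = to_nfa(node[1], nfa)
        nfa[s] += [(None, cs), (None, f)]
        nfa[cf] += [(None, cs), (None, f)]
    return s, f

def eps_closure(nfa, states):
    stack, seen = list(states), set(states)
    while stack:
        for sym, t in nfa[stack.pop()]:
            if sym is None and t not in seen:
                seen.add(t)
                stack.append(t)
    return frozenset(seen)

def equivalent(r1, r2):
    """BFS over pairs of subset-construction DFA states, built on the fly.
    Returns (True, None) or (False, a shortest distinguishing string)."""
    def build(r):
        nfa = []
        start, accept = to_nfa(Parser(r).parse(), nfa)
        return nfa, eps_closure(nfa, {start}), accept
    def move(nfa, qs, c):  # DFA transition: ε-closure of the c-successors
        return eps_closure(nfa, {t for q in qs for a, t in nfa[q] if a == c})
    (n1, q1, f1), (n2, q2, f2) = build(r1), build(r2)
    sigma = sorted(set(r1 + r2) - set("()|*"))  # symbols occurring in either regex
    queue, seen = deque([(q1, q2, "")]), {(q1, q2)}
    while queue:
        a, b, word = queue.popleft()
        if (f1 in a) != (f2 in b):  # exactly one DFA accepts `word`
            return False, word
        for c in sigma:
            pair = (move(n1, a, c), move(n2, b, c))
            if pair not in seen:
                seen.add(pair)
                queue.append((*pair, word + c))
    return True, None
```

For example, `equivalent("a(ba)*", "(ab)*a")` returns `(True, None)`. `equivalent("(a|b)*", "a*b*")` returns `(False, 'ba')`: the two regexes look alike, but only the first accepts `ba`. Returning a shortest witness instead of a bare boolean makes a wrong transformation easier to diagnose. This matters because, as the abstract notes, semantic errors in parsable outputs are hard to spot by inspection. Restricting the alphabet to symbols that occur in either regex is sufficient: any string containing another symbol is rejected by both.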
Anthology ID: 2026.findings-eacl.331
Volume: Findings of the Association for Computational Linguistics: EACL 2026
Month: March
Year: 2026
Address: Rabat, Morocco
Editors: Vera Demberg, Kentaro Inui, Lluís Marquez
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 6297–6305
URL: https://preview.aclanthology.org/manual-author-scripts/2026.findings-eacl.331/
Cite (ACL): Joonghyuk Hahn and Yo-Sub Han. 2026. Query4Regex: Verifiable Regex Transformation through Formal Operations from NL and DSL Queries. In Findings of the Association for Computational Linguistics: EACL 2026, pages 6297–6305, Rabat, Morocco. Association for Computational Linguistics.
Cite (Informal): Query4Regex: Verifiable Regex Transformation through Formal Operations from NL and DSL Queries (Hahn & Han, Findings 2026)
PDF: https://preview.aclanthology.org/manual-author-scripts/2026.findings-eacl.331.pdf
Checklist: 2026.findings-eacl.331.checklist.pdf