Improving LLM Code Reasoning via Semantic Equivalence Self-Play with Formal Verification

Poon Tsz Nok, Antonio Valerio Miceli Barone


Abstract
We introduce a self-play framework for semantic equivalence in Haskell, utilizing formal verification to guide adversarial training between a generator and an evaluator. The framework leverages Liquid Haskell proofs for validating equivalence and execution-based counterexamples for inequivalence, organized via a difficulty-aware curriculum. To facilitate this, we release OpInstruct-HSx, a synthetic dataset of 28k validated Haskell programs. Empirical experiments show that our evaluator transfers effectively to downstream tasks, achieving up to 13.3pp accuracy gain on EquiBench and consistent gains on PySecDB. Ablation studies on the SEQ-SINQ regimes indicate that while inequivalence supervision provides data volume, equivalence proofs are uniquely responsible for the model’s reasoning capabilities. The entire training pipeline and dataset are publicly released on GitHub and Hugging Face respectively.
Anthology ID:
2026.findings-acl.1615
Volume:
Findings of the Association for Computational Linguistics: ACL 2026
Month:
July
Year:
2026
Address:
San Diego, California, United States
Editors:
Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue:
Findings
SIG:
Publisher:
Association for Computational Linguistics
Note:
Pages:
32273–32292
Language:
URL:
https://preview.aclanthology.org/ingest-acl/2026.findings-acl.1615/
DOI:
Bibkey:
Cite (ACL):
Poon Tsz Nok and Antonio Valerio Miceli Barone. 2026. Improving LLM Code Reasoning via Semantic Equivalence Self-Play with Formal Verification. In Findings of the Association for Computational Linguistics: ACL 2026, pages 32273–32292, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal):
Improving LLM Code Reasoning via Semantic Equivalence Self-Play with Formal Verification (Nok & Barone, Findings 2026)
Copy Citation:
PDF:
https://preview.aclanthology.org/ingest-acl/2026.findings-acl.1615.pdf
Checklist:
 2026.findings-acl.1615.checklist.pdf