Think Like You Execute: Verifiable Chain of Thought from Program Traces

Shailja Thakur, Vaibhav Saxena, Rohan Kulkarni, Shivdeep Singh, Parameswaran Selvam, Hiroshi Kanayama, Hima Patel


Abstract
Teaching language models to reason about code execution remains an open problem. Current synthetic Chain-of-Thought (CoT) training data often consists of plausible-sounding explanations generated by teacher models rather than verifiable accounts of actual program behavior. As a result, models learn reasoning patterns that are syntactically correct but logically flawed. We address this by grounding CoT generation directly in program execution traces. Our pipeline instruments code to capture dynamic behavior, narrates the resulting execution traces in natural language, and actively verifies each rationale against its trace. We systematically create 54,000 execution-verified, bi-directional rationales that teach models to reason both forward (input→output) and backward (output→input). Models fine-tuned on our verified data achieve substantial improvements, with gains of +24.2 on LiveCodeBench-Exec, +22.3 on CruxEval-Output, and +21.1 on CruxEval-Input, demonstrating that verification quality directly determines both reasoning and code generation capabilities.
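The three pipeline stages named in the abstract (instrument, narrate, verify) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the helper names `trace_lines`, `narrate`, and `verify`, the toy function `running_sum`, and the representation of a rationale as a list of (variable, value) claims are all assumptions made for illustration, built on the standard `sys.settrace` hook.

```python
import sys


def running_sum(xs):
    total = 0
    for x in xs:
        total += x
    return total


def trace_lines(func, *args):
    """Instrument: run func and record (line offset, locals) before each line executes."""
    events = []
    code = func.__code__

    def tracer(frame, event, arg):
        if frame.f_code is not code:
            return None  # do not trace frames of other functions
        if event == "line":
            offset = frame.f_lineno - code.co_firstlineno
            events.append((offset, dict(frame.f_locals)))
        return tracer

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        result = func(*args)
    finally:
        sys.settrace(previous)  # always restore the earlier trace hook
    return result, events


def narrate(events):
    """Narrate: turn each recorded state into one plain-language sentence."""
    return [
        f"At line +{offset}, "
        + ", ".join(f"{name}={value!r}" for name, value in sorted(state.items()))
        for offset, state in events
    ]


def verify(claims, events, result, claimed_output):
    """Verify: accept only if every claimed (variable, value) pair was observed
    in the trace and the claimed output equals the real return value."""
    observed = {
        (name, repr(value)) for _, state in events for name, value in state.items()
    }
    unsupported = [c for c in claims if (c[0], repr(c[1])) not in observed]
    return not unsupported and claimed_output == result


result, events = trace_lines(running_sum, [1, 2, 3])
steps = narrate(events)
ok = verify([("total", 3), ("x", 2)], events, result, 6)
```

In this sketch a rationale claiming `total` equals 4 at some point would be rejected, since that state never occurs in the recorded trace. A real pipeline would need a much richer claim format than variable-value pairs.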
Anthology ID:
2026.acl-industry.53
Volume:
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (ACL 2026)
Month:
July
Year:
2026
Address:
San Diego, California, USA
Editors:
Yunyao Li, Georg Rehm, Mei Tu
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
775–795
URL:
https://preview.aclanthology.org/ingest-acl/2026.acl-industry.53/
Cite (ACL):
Shailja Thakur, Vaibhav Saxena, Rohan Kulkarni, Shivdeep Singh, Parameswaran Selvam, Hiroshi Kanayama, and Hima Patel. 2026. Think Like You Execute: Verifiable Chain of Thought from Program Traces. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (ACL 2026), pages 775–795, San Diego, California, USA. Association for Computational Linguistics.
Cite (Informal):
Think Like You Execute: Verifiable Chain of Thought from Program Traces (Thakur et al., ACL 2026)
PDF:
https://preview.aclanthology.org/ingest-acl/2026.acl-industry.53.pdf