@inproceedings{lee-huang-2026-solidcoder,
title = "{S}olid{C}oder: Bridging the Mental-Reality Gap in {LLM} Code Generation through Concrete Execution",
author = "Lee, Woojin and
Huang, Jin-Xia",
editor = "Liakata, Maria and
Moreira, Viviane P. and
Zhang, Jiajun and
Jurgens, David",
booktitle = "Findings of the {A}ssociation for {C}omputational {L}inguistics: {ACL} 2026",
month = jul,
year = "2026",
address = "San Diego, California, United States",
publisher = "Association for Computational Linguistics",
url = "https://preview.aclanthology.org/ingest-acl/2026.findings-acl.361/",
pages = "7294--7316",
ISBN = "979-8-89176-395-1",
abstract = "State-of-the-art code generation frameworks rely on mental simulation, where LLMs internally trace execution to verify correctness. We expose a fundamental limitation: the Mental-Reality Gap{---}where models hallucinate execution traces and confidently validate buggy code. This gap manifests along two orthogonal dimensions: the Specification Gap (overlooking edge cases during planning) and the Verification Gap (hallucinating correct behavior for flawed code). We propose SolidCoder with a simple principle: don{'}t imagine{---}execute. The S.O.L.I.D. architecture addresses both dimensions by forcing edge-case awareness before algorithm design and replacing imagined traces with sandboxed execution using property-based oracles. With GPT-4o, SolidCoder achieves state-of-the-art pass@1 performance: 95.7{\%} on HumanEval (+0.6{\%}p), 77.0{\%} on CodeContests (+4.3{\%}p), and 26.7{\%} on APPS (+3.4{\%}p). Ablation reveals that edge-case awareness provides the largest individual gain, while execution grounding catches categorically different errors that specification improvements cannot address. These gains generalize to RL post-trained models, validating that bridging both gap dimensions is essential for robust code synthesis. We release our code and framework to facilitate future research."
}