Program-of-Thought Reveals LLM Abstraction Ceilings

Mike Zhou, Fenil Bardoliya, Vivek Gupta, Dan Roth

Abstract
Large language models (LLMs) are often claimed to exhibit reasoning ability when supervised with chain-of-thought (CoT) traces. True reasoning, however, requires invariance: isomorphic problems should yield identical solutions regardless of superficial variation. We test this property by evaluating base and reasoning-optimized models—including LLaMA, Mistral, Qwen, GPT-OSS, and DeepSeek—on isomorphic variants of GSM8K and MATH problems. All models exhibit substantial accuracy drops under perturbation. To assess whether training can induce invariance, we fine-tune models with Program-of-Thought (PoT) supervision under concrete and masked formulations. PoT fine-tuning increases behavioral cross-variant consistency but does not significantly reduce the accuracy gap, and these gains fail to transfer across prompting formats and domains. Our central finding is that models converge toward stable but systematically incorrect behaviors: consistency without correctness. This dissociation suggests that current reasoning supervision teaches models to reproduce solution templates rather than to abstract mathematical structure.
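
Illustrative sketch (not from the paper): a minimal Python rendering of the invariance property the abstract tests. The example problem, the perturbation mapping, and the names make_isomorphic_variant and solve are hypothetical stand-ins for the authors' actual pipeline.

# Minimal invariance check: an isomorphic variant swaps surface tokens
# (names, objects) while preserving the problem's mathematical structure.
def make_isomorphic_variant(question: str, mapping: dict[str, str]) -> str:
    for original, replacement in mapping.items():
        question = question.replace(original, replacement)
    return question

def solve(question: str) -> str:
    """Placeholder for an LLM call; a real harness would prompt the model
    under CoT or PoT and extract the final numeric answer."""
    raise NotImplementedError

base = "Alice has 3 apples and buys 4 more. How many apples does she have?"
variant = make_isomorphic_variant(base, {"Alice": "Bob", "apples": "pears"})

# True reasoning requires solve(base) == solve(variant) == "7"; the drop
# under such perturbations is the accuracy gap the paper measures.

Under PoT supervision, solve would instead emit an executable program (e.g., print(3 + 4)) whose interpreter output is taken as the answer, which makes cross-variant consistency directly measurable.
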
Anthology ID:
2026.findings-eacl.257
Volume:
Findings of the Association for Computational Linguistics: EACL 2026
Month:
March
Year:
2026
Address:
Rabat, Morocco
Editors:
Vera Demberg, Kentaro Inui, Lluís Màrquez
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
4911–4919
URL:
https://preview.aclanthology.org/ingest-eacl/2026.findings-eacl.257/
Cite (ACL):
Mike Zhou, Fenil Bardoliya, Vivek Gupta, and Dan Roth. 2026. Program-of-Thought Reveals LLM Abstraction Ceilings. In Findings of the Association for Computational Linguistics: EACL 2026, pages 4911–4919, Rabat, Morocco. Association for Computational Linguistics.
Cite (Informal):
Program-of-Thought Reveals LLM Abstraction Ceilings (Zhou et al., Findings 2026)
PDF:
https://preview.aclanthology.org/ingest-eacl/2026.findings-eacl.257.pdf
Checklist:
2026.findings-eacl.257.checklist.pdf