Beyond Memorization: Testing LLM Reasoning on Unseen Theory of Computation Tasks

Shlok Shelat, Jay Raval, Souvik Roy, Manas Gaur


Abstract
Large language models (LLMs) have demonstrated strong performance on formal language tasks, yet whether this reflects genuine symbolic reasoning or pattern matching on familiar constructions remains unclear. We introduce a benchmark for deterministic finite automata (DFA) construction from regular languages, comprising factual knowledge questions, seen construction problems from public sources, and two types of unseen problems: hand-crafted instances with multiple interacting constraints and systematically generated problems via Arden’s theorem. Models achieve perfect accuracy on factual questions and 84-90% on seen tasks. However, accuracy drops sharply on unseen problems (by 30-64%), with failures stemming from systematic misinterpretation of language constraints, incorrect handling of Kleene-star semantics, and a failure to preserve global consistency. We evaluate a three-stage hint protocol that enables correction of shallow errors but does not reliably resolve globally inconsistent or structurally flawed automata. Our analysis across multiple prompting strategies (direct, Chain-of-Thought, Tree-of-Thought) reveals that errors persist regardless of prompting approach, exposing a fundamental gap between LLMs’ ability to generate syntactically plausible DFAs and their capacity for semantically correct formal reasoning.
Anthology ID:
2026.findings-acl.1495
Volume:
Findings of the Association for Computational Linguistics: ACL 2026
Month:
July
Year:
2026
Address:
San Diego, California, United States
Editors:
Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue:
Findings
SIG:
Publisher:
Association for Computational Linguistics
Note:
Pages:
29904–29934
Language:
URL:
https://preview.aclanthology.org/ingest-acl/2026.findings-acl.1495/
DOI:
Bibkey:
Cite (ACL):
Shlok Shelat, Jay Raval, Souvik Roy, and Manas Gaur. 2026. Beyond Memorization: Testing LLM Reasoning on Unseen Theory of Computation Tasks. In Findings of the Association for Computational Linguistics: ACL 2026, pages 29904–29934, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal):
Beyond Memorization: Testing LLM Reasoning on Unseen Theory of Computation Tasks (Shelat et al., Findings 2026)
Copy Citation:
PDF:
https://preview.aclanthology.org/ingest-acl/2026.findings-acl.1495.pdf
Checklist:
 2026.findings-acl.1495.checklist.pdf