A Knapsack by Any Other Name: Presentation impacts LLM performance on NP-hard problems

Alex Duchnowski; Ellie Pavlick; Alexander Koller

doi:10.18653/v1/2025.findings-emnlp.352

Abstract

To investigate the effect of problem presentation on LLMs’ ability to solve optimization problems, we introduce the dataset of Everyday Hard Optimization Problems (EHOP), a collection of NP-hard problems expressed in natural language. EHOP includes problem formulations that could be found in computer science textbooks (e.g., graph coloring), versions that are dressed up as problems that could arise in real life (e.g., party planning), and variants with inverted rules. We find that state-of-the-art LLMs, across multiple prompting strategies, systematically solve textbook problems more accurately than their real-life and inverted counterparts. While reasoning models are more capable, they nonetheless show high variance across problem presentations, suggesting they lack a truly robust reasoning mechanism. We argue that this constitutes evidence that LLMs are still heavily dependent on what was seen in training and struggle to generalize to novel problems.

Anthology ID: 2025.findings-emnlp.352
Volume: Findings of the Association for Computational Linguistics: EMNLP 2025
Month: November
Year: 2025
Address: Suzhou, China
Editors: Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 6628–6651
URL: https://preview.aclanthology.org/name-variant-enfa-fane/2025.findings-emnlp.352/
DOI: 10.18653/v1/2025.findings-emnlp.352
Cite (ACL): Alex Duchnowski, Ellie Pavlick, and Alexander Koller. 2025. A Knapsack by Any Other Name: Presentation impacts LLM performance on NP-hard problems. In Findings of the Association for Computational Linguistics: EMNLP 2025, pages 6628–6651, Suzhou, China. Association for Computational Linguistics.
Cite (Informal): A Knapsack by Any Other Name: Presentation impacts LLM performance on NP-hard problems (Duchnowski et al., Findings 2025)
PDF: https://preview.aclanthology.org/name-variant-enfa-fane/2025.findings-emnlp.352.pdf
Checklist: 2025.findings-emnlp.352.checklist.pdf
