Think Earlier, Not Longer: Prompt Optimization via Reducing Unhealthy Exploration

Ling-I Wu, Minyu Chen, Jingyang Li, Xi Chang, Guoqiang Li


Abstract
While large language models exhibit strong reasoning capabilities, prior work shows that their performance can be further enhanced by encouraging greater exploration. However, existing approaches overlook unhealthy exploration, which increases exploration-related token usage without contributing to effective problem-solving. In this work, we show that prompt ambiguity can artificially prolong early-stage exploration, manifested as an elevated and delayed early-stage entropy peak. Although this uncertainty may be gradually resolved as reasoning progresses, as reflected in the eventual convergence of the late-stage entropy peak, it does not meaningfully improve accuracy or self-consistency and instead substantially reduces reasoning efficiency. Motivated by these observations, we propose an entropy-dynamics-aware prompt optimization framework that trains a lightweight optimizer to generate concise clarifications. These clarifications aim to reduce ambiguity-induced early-stage uncertainty while preserving the model’s reasoning capabilities. Extensive experiments across multiple models, reasoning budgets, and benchmarks demonstrate that our approach improves reasoning efficiency by up to 52% by reducing unhealthy exploration, without sacrificing accuracy.
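The abstract refers to an "elevated and delayed early-stage entropy peak" without defining how it is measured. As a rough illustration only, the sketch below assumes per-token Shannon entropy over the next-token distribution and treats the first fraction of the reasoning trace as the "early stage". The function names, the `early_fraction` parameter, and its default value are hypothetical and are not taken from the paper.

```python
import math
from typing import Sequence, Tuple


def token_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of a single next-token distribution."""
    # Zero-probability entries contribute nothing to the sum.
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def early_stage_peak(
    entropies: Sequence[float], early_fraction: float = 0.3
) -> Tuple[int, float]:
    """Locate the highest entropy value within the early part of a trace.

    `early_fraction` is an assumed cutoff for what counts as "early";
    the paper may define the early stage differently.
    Returns (position, height) of the peak.
    """
    if not entropies:
        raise ValueError("entropy trace is empty")
    window = max(1, int(len(entropies) * early_fraction))
    position = max(range(window), key=lambda i: entropies[i])
    return position, entropies[position]


# Toy traces (made-up numbers, for illustration only).
clear_prompt = [0.9, 0.4, 0.3, 0.2, 0.2, 0.1, 0.1, 0.1, 0.1, 0.1]
ambiguous_prompt = [0.6, 1.1, 1.8, 0.9, 0.4, 0.2, 0.1, 0.1, 0.1, 0.1]

clear_peak = early_stage_peak(clear_prompt, early_fraction=0.5)
ambiguous_peak = early_stage_peak(ambiguous_prompt, early_fraction=0.5)
```

Under these assumptions, a higher and later peak for the ambiguous trace than for the clear one would correspond to the pattern the abstract describes as prolonged early-stage exploration.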
Anthology ID:
2026.findings-acl.817
Volume:
Findings of the Association for Computational Linguistics: ACL 2026
Month:
July
Year:
2026
Address:
San Diego, California, United States
Editors:
Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
16577–16592
URL:
https://preview.aclanthology.org/ingest-acl/2026.findings-acl.817/
Cite (ACL):
Ling-I Wu, Minyu Chen, Jingyang Li, Xi Chang, and Guoqiang Li. 2026. Think Earlier, Not Longer: Prompt Optimization via Reducing Unhealthy Exploration. In Findings of the Association for Computational Linguistics: ACL 2026, pages 16577–16592, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal):
Think Earlier, Not Longer: Prompt Optimization via Reducing Unhealthy Exploration (Wu et al., Findings 2026)
PDF:
https://preview.aclanthology.org/ingest-acl/2026.findings-acl.817.pdf
Checklist:
 2026.findings-acl.817.checklist.pdf